First, a bit of terminology and concepts.
According to the documentation, an iterator is any object that implements the methods __next__ and __iter__ (the latter must return the object itself, while the former returns the next item of the iterator, or raises StopIteration when there are no more elements).
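For example, a quick sketch of that contract in action (a list is iterable, and iter returns an iterator over it):

it = iter([1, 2])
print(it.__iter__() is it)  # True: __iter__ returns the iterator itself
print(next(it))             # 1: next calls __next__ under the hood
print(next(it))             # 2
try:
    next(it)                # no more elements
except StopIteration:
    print('no more elements, StopIteration was raised')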
A generator, in turn, is a function (therefore also called a generator function) that returns a generator iterator (which the documentation defines as "an object created by a generator function"). A generator function is created using yield, which is already explained in detail here and here.
There is also the generator expression, which is an expression that returns a generator (like the one in the question: (x**2 for x in range(50))).
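Just to make the relationship between the two concrete, a small sketch (the function name squares is mine, only for illustration):

# generator function: uses yield
def squares(n):
    for x in range(n):
        yield x ** 2

# generator expression: produces an equivalent generator in a single expression
gen1 = squares(5)
gen2 = (x ** 2 for x in range(5))
print(list(gen1))  # [0, 1, 4, 9, 16]
print(list(gen2))  # [0, 1, 4, 9, 16]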
Every generator iterator (whether created by a generator function or a generator expression) is also an iterator. But not every iterator is a generator.
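One way to check this (a quick sketch using collections.abc.Iterator and types.GeneratorType, only for illustration):

from collections.abc import Iterator
from types import GeneratorType

gen = (x ** 2 for x in range(3))  # created by a generator expression
it = iter([1, 2, 3])              # an iterator that is not a generator

print(isinstance(gen, Iterator))       # True: every generator is an iterator
print(isinstance(gen, GeneratorType))  # True
print(isinstance(it, Iterator))        # True
print(isinstance(it, GeneratorType))   # False: not every iterator is a generator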
A generator is lazy in the sense that it only produces the next value when the method __next__ is called (either explicitly through the builtin next, or implicitly, when you walk through it with a for, for example). Internally it stores its state (basically, "where it stopped" and the value of each variable at that point; see the link already mentioned for more details), so that on the next call to next it can continue execution from where it left off.
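A small example of this laziness (the prints are there only to show when each value is actually computed):

def lazy():
    print('computing 1')
    yield 1
    print('computing 2')
    yield 2

g = lazy()      # nothing is executed yet
print(next(g))  # prints 'computing 1' and then 1
print(next(g))  # prints 'computing 2' and then 2: execution resumed where it stopped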
In general a generator does not keep all the values in memory at once, generating them on demand. But nothing prevents you from doing this:
# iterates over the elements of the iterable one or more times
def ciclo(iteravel, repetir=1):
    # stores the values of the iterable in a list
    valores = list(iteravel)
    for _ in range(repetir):
        for n in valores:
            yield n

# prints the values of the generator expression twice
for n in ciclo((x ** 2 for x in range(3)), 2):
    print(n)
That is, the generator function ciclo creates a generator that iterates several times over the given iterable. The values are stored in a list because iteravel could be another generator (which can only be iterated once), so by storing the values in a list I guarantee that I can iterate several times without problems. In the example above, if I did the second for directly on iteravel, the values would only be returned once, since the generator expression creates a generator, and it can only be iterated once.
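A quick demonstration of this "only once" behaviour:

g = (x ** 2 for x in range(3))
print(list(g))  # [0, 1, 4]
print(list(g))  # []: the generator is already exhausted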
In other words, I created a generator that keeps all the values in memory at once (just out of curiosity, that is pretty much what the function itertools.cycle does).
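For comparison, a quick sketch with itertools.cycle (note that cycle repeats indefinitely, so I use itertools.islice here just to limit the output):

from itertools import cycle, islice

# cycle also stores the elements internally and repeats them forever
for n in islice(cycle(x ** 2 for x in range(3)), 6):
    print(n)  # 0, 1, 4, 0, 1, 4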
Of course that is a corner case; in general, generators do not store all the values in memory (unless I force it, as I did above), computing them only when necessary. An iterator that is not a generator, on the other hand, will not necessarily be lazy.
When you do iter([1, 2, 3]), it returns an iterator that is not a generator, as can be verified with the module inspect:
import inspect

# iter creates an iterator that is not a generator
x = iter([1, 30, 99, 42])
print(inspect.isgenerator(x))  # False

# a generator expression creates a generator
x = (x * 2 for x in range(10))
print(inspect.isgenerator(x))  # True
In addition, the list [1, 2, 3] has already been created, and internally the iterator keeps a reference to it. What the iterator does is advance to the next element when the method __next__ is called, but as the list has already been created, you do not get the "memory saving". An iterator only saves memory if its values are computed on demand, as in the generator example above. For example:
# iterator that generates the values on demand
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self): return self

    # the next element is computed only when next is called
    def __next__(self):
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

iterator = Squares(3, 10)
print(next(iterator))  # 9
print(next(iterator))  # 16
If I use inspect.isgenerator on the iterator above, the result will be False.
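Something like this (reusing the Squares class above; the check with collections.abc.Iterator is only to show that it is still an iterator, and still lazy):

from collections.abc import Iterator
import inspect

it = Squares(3, 10)
print(inspect.isgenerator(it))   # False: it is not a generator...
print(isinstance(it, Iterator))  # True: ...but it is an iterator, and a lazy one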
But if you create an iterator based on the values of a list, and that list has already been created, there is no saving at all. So your question "how can it retrieve the items without them being stored in memory?" starts from a wrong premise, because the values are in memory, and that is how the iterator can retrieve them.
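A quick way to see that the iterator keeps a reference to the list (and not a copy of the values):

lista = [1, 2, 3]
it = iter(lista)
print(next(it))  # 1

# the iterator references the same list object, so changes are visible through it
lista[1] = 99
print(next(it))  # 99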
Going a little deeper, I took a look at the source code of CPython (which is the reference implementation of the language, and most likely the one you are using). I consulted the code today (11/Jun/2021):
The builtin iter is defined here:
static PyObject *
builtin_iter(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    PyObject *v;

    if (!_PyArg_CheckPositional("iter", nargs, 1, 2))
        return NULL;
    v = args[0];
    if (nargs == 1)
        return PyObject_GetIter(v);
    if (!PyCallable_Check(v)) {
        PyErr_SetString(PyExc_TypeError,
                        "iter(v, w): v must be callable");
        return NULL;
    }
    PyObject *sentinel = args[1];
    return PyCallIter_New(v, sentinel);
}
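The sentinel branch at the end (PyCallIter_New) corresponds to the two-argument form of iter, which can be illustrated from Python like this (count_up is just a name I made up for the example):

counter = 0

def count_up():
    global counter
    counter += 1
    return counter

# iter(callable, sentinel): keeps calling the callable until it returns the sentinel
for n in iter(count_up, 4):
    print(n)  # 1, 2, 3 - stops when count_up returns 4 (the sentinel is not yielded)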
In this case, when we pass only one argument (for example, a list), it falls into the if (nargs == 1) and calls PyObject_GetIter, which in turn is defined here:
PyObject *
PyObject_GetIter(PyObject *o)
{
    PyTypeObject *t = Py_TYPE(o);
    getiterfunc f;

    f = t->tp_iter;
    if (f == NULL) {
        if (PySequence_Check(o))
            return PySeqIter_New(o);
        return type_error("'%.200s' object is not iterable", o);
    }
    else {
        PyObject *res = (*f)(o);
        if (res != NULL && !PyIter_Check(res)) {
            PyErr_Format(PyExc_TypeError,
                         "iter() returned non-iterator "
                         "of type '%.100s'",
                         Py_TYPE(res)->tp_name);
            Py_DECREF(res);
            res = NULL;
        }
        return res;
    }
}
In this case, o is the list, and its Py_TYPE(o) is PyList_Type, which in turn is defined here:
PyTypeObject PyList_Type = {
    // lots of lines...
    list_iter,                                  /* tp_iter */
    // lots more lines...
Note that the value of the tp_iter field is the function list_iter. Since this function is not null, PyObject_GetIter falls into the else branch and calls list_iter, passing the list as argument. And this function (defined in the same file that defines PyList_Type) does the following:
static PyObject *
list_iter(PyObject *seq)
{
    listiterobject *it;

    if (!PyList_Check(seq)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    it = PyObject_GC_New(listiterobject, &PyListIter_Type);
    if (it == NULL)
        return NULL;
    it->it_index = 0;
    Py_INCREF(seq);
    it->it_seq = (PyListObject *)seq;
    _PyObject_GC_TRACK(it);
    return (PyObject *)it;
}
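From the Python side, the object created here is what shows up as a "list_iterator" (a quick check, in CPython):

it = iter([1, 2, 3])
print(type(it).__name__)  # 'list_iterator' - the listiterobject created by list_iter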
And it is on the line it->it_seq = (PyListObject *)seq; that we see the list (seq) being assigned to the it_seq field of the iterator. That is, the iterator keeps a reference to the list. This is the definition of listiterobject (also in the same file that defines list_iter):
typedef struct {
    PyObject_HEAD
    Py_ssize_t it_index;
    PyListObject *it_seq; /* Set to NULL when iterator is exhausted */
} listiterobject;
And this is the implementation of its __next__ (also in the same file that defines list_iter):
static PyObject *
listiter_next(listiterobject *it)
{
    PyListObject *seq;
    PyObject *item;

    assert(it != NULL);
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;
    assert(PyList_Check(seq));

    if (it->it_index < PyList_GET_SIZE(seq)) {
        item = PyList_GET_ITEM(seq, it->it_index);
        ++it->it_index;
        Py_INCREF(item);
        return item;
    }

    it->it_seq = NULL;
    Py_DECREF(seq);
    return NULL;
}
This shows that the iterator walks through the elements of the list one by one, advancing it_index at each call.
That is, the iterator keeps a reference to the list, and that is how it manages to return its elements. As the list has already been created, there is no lazy (on-demand) generation, as there is with generators (or with the Squares iterator from the example above), because the values were already created earlier, when you created the list.
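Putting the C code above in Python terms, a rough sketch of what the list iterator does (just an approximation, not the real implementation):

class ListIter:
    # approximation of listiterobject: a reference to the list plus an index
    def __init__(self, seq):
        self.it_seq = seq    # keeps a reference to the already-existing list
        self.it_index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.it_seq is None:
            raise StopIteration
        if self.it_index < len(self.it_seq):
            item = self.it_seq[self.it_index]
            self.it_index += 1
            return item
        self.it_seq = None   # exhausted: drop the reference, like the real iterator does
        raise StopIteration

for n in ListIter([1, 30, 99, 42]):
    print(n)  # 1, 30, 99, 42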
I don't know the internals of Python in depth, but both iter and generators need to store the values (their state) in memory. In the case of iter, you passed a list (since it is delimited by [ ]), and it probably creates an object that keeps a reference to that list. I can't imagine any other way for it to iterate over the values, other than keeping them internally as part of the created object. – hkotsubo

It's the only way I can imagine too, but in that case the values would be stored in memory, which seems a little contradictory to me, because if I understood the concept correctly, iterators do not store all the data in memory, and that is why they are lighter and faster than lists, for example. – Fernando

But when you do iter([1, 2, 3]), the list [1, 2, 3] has already been created, so there is nothing left to save, because that memory has already been "spent"... There is nothing contradictory. – hkotsubo

I poked around in the CPython source code and posted an answer giving more details... – hkotsubo

TL;DR: when you pass a list to iter, the list's iterator keeps a reference to it. The list is only destroyed when the iterator is too (when it goes out of scope, or with a del). There are no surprises there: the same rule as always applies: an object exists as long as there are references to it. – jsbueno