First, a little terminology.

According to the documentation, an iterator is any object that implements the methods __next__ and __iter__ (the latter must return the object itself, while the former returns the next item of the iterator, or raises StopIteration when there are no more elements).
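To make the protocol concrete, here is a quick sketch (mine, not from the documentation) driving a list's iterator by hand:

```python
it = iter([1, 2])           # lists are iterable; iter() returns their iterator
print(it.__iter__() is it)  # True: __iter__ returns the iterator itself
print(next(it))             # 1 (next(it) calls it.__next__())
print(next(it))             # 2
try:
    next(it)                # exhausted: raises StopIteration
except StopIteration:
    print("no more elements")
```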
A generator function is a function that returns a generator iterator (which the documentation defines as "an object created by a generator function"). A generator function is created using the yield keyword, which is already explained in detail here and here.

There is also the generator expression, an expression that returns a generator (like the one in the question: (x**2 for x in range(50))).
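As a side-by-side sketch (hypothetical function name, just for illustration), a generator function and a generator expression produce equivalent generators:

```python
# generator function: calling it returns a generator iterator
def quadrados(n):
    for x in range(n):
        yield x ** 2

gen_f = quadrados(5)                # created by a generator function
gen_e = (x ** 2 for x in range(5))  # created by a generator expression

print(list(gen_f))  # [0, 1, 4, 9, 16]
print(list(gen_e))  # [0, 1, 4, 9, 16]
```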
Every generator iterator (whether created by a generator function or a generator expression) is also an iterator. But not every iterator is a generator.
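This relationship can be checked with the abstract base classes from collections.abc (a quick sketch):

```python
from collections.abc import Generator, Iterator

gen = (x ** 2 for x in range(3))  # generator expression -> generator
it = iter([1, 2, 3])              # list iterator -> iterator, but not a generator

print(isinstance(gen, Iterator), isinstance(gen, Generator))  # True True
print(isinstance(it, Iterator), isinstance(it, Generator))    # True False
```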
A generator is lazy in the sense that it only returns the next value when the method __next__ is called (either explicitly through the builtin next, or implicitly, when you iterate over it with a for, for example). Internally it stores its state (basically, "where it stopped" and the value of each variable at that point; see the links above for more details), so that on the next call to next it can resume execution from where it left off.
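This state keeping can be seen by putting prints inside a generator function (a sketch of mine, hypothetical names):

```python
def contador():
    print("started")
    yield 1
    print("resumed after the first yield")
    yield 2

gen = contador()  # nothing is executed yet: the body only runs on demand
print(next(gen))  # prints "started", then 1
print(next(gen))  # prints "resumed after the first yield", then 2
```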
In general, a generator does not keep all its values in memory at once, generating them on demand. But nothing prevents you from doing this:
# iterates over the elements of the iterable one or more times
def ciclo(iteravel, repetir=1):
    # stores the values of the iterable in a list
    valores = list(iteravel)
    for _ in range(repetir):
        for n in valores:
            yield n

# prints the values of the generator expression twice
for n in ciclo((x ** 2 for x in range(3)), 2):
    print(n)
That is, the generator function ciclo creates a generator that iterates several times over the given iterable. The values are stored in a list because iteravel could itself be another generator (which can only be iterated once); by storing the values in a list, I guarantee that I can iterate several times without problems. In the example above, if I ran the second for directly on iteravel, the values would be returned only once, since the generator expression creates a generator, and that can only be iterated once.

In other words, I created a generator that keeps all the values in memory at once (just out of curiosity, that is essentially what itertools.cycle does).
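For the record, itertools.cycle repeats indefinitely (unlike ciclo, which takes a repeat count), so islice is used in this quick sketch to limit the output:

```python
from itertools import cycle, islice

# cycle also stores the values it has seen internally so it can repeat
# them; since it never ends, islice limits it to 7 items here
print(list(islice(cycle([1, 2, 3]), 7)))  # [1, 2, 3, 1, 2, 3, 1]
```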
Of course that is a corner case; in general, generators do not store all values in memory (unless forced to, as I did above), computing them only when necessary. Conversely, an iterator that is not a generator will not necessarily be lazy.
When you do iter([1, 2, 3]), it returns an iterator that is not a generator, as can be verified with the inspect module:
import inspect

# iter creates an iterator that is not a generator
x = iter([1, 30, 99, 42])
print(inspect.isgenerator(x)) # False

# a generator expression creates a generator
x = (x * 2 for x in range(10))
print(inspect.isgenerator(x)) # True
In addition, the list [1, 2, 3] has already been created, and internally the iterator keeps a reference to it. What the iterator does is advance to the next element when __next__ is called, but since the list already exists, there is no "memory saving". An iterator only saves memory if its values are computed on demand, as in the generator example above. For example:
# iterator that generates its values on demand
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self

    # the next element is computed only when next is called
    def __next__(self):
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

iterator = Squares(3, 10)
print(next(iterator)) # 9
print(next(iterator)) # 16
If I use inspect.isgenerator on the iterator above, the result will be False.

But if you create an iterator based on the values of a list, and that list has already been created, there is no saving at all. So your question "how can it retrieve the items without them being saved in memory?" starts from a wrong premise, because the values are in memory, and that is precisely how the iterator can retrieve them.
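An easy way to see that the iterator reads from the list that is already in memory (rather than from a copy) is to mutate the list mid-iteration; a quick sketch:

```python
data = [1, 2, 3]
it = iter(data)
print(next(it))  # 1
data[1] = 99     # the iterator sees the change: it reads from the live list
print(next(it))  # 99
```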
Going a little deeper, I took a look at the source code of CPython (which is the reference implementation of the language, and most likely the one you are using). I consulted the code today (11/jun/2021):
The builtin iter is defined here:
static PyObject *
builtin_iter(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    PyObject *v;

    if (!_PyArg_CheckPositional("iter", nargs, 1, 2))
        return NULL;
    v = args[0];
    if (nargs == 1)
        return PyObject_GetIter(v);
    if (!PyCallable_Check(v)) {
        PyErr_SetString(PyExc_TypeError,
                        "iter(v, w): v must be callable");
        return NULL;
    }
    PyObject *sentinel = args[1];
    return PyCallIter_New(v, sentinel);
}
In this case, when we pass only one argument (for example, a list), it falls into if (nargs == 1) and calls PyObject_GetIter, which in turn is defined here:
PyObject *
PyObject_GetIter(PyObject *o)
{
    PyTypeObject *t = Py_TYPE(o);
    getiterfunc f;

    f = t->tp_iter;
    if (f == NULL) {
        if (PySequence_Check(o))
            return PySeqIter_New(o);
        return type_error("'%.200s' object is not iterable", o);
    }
    else {
        PyObject *res = (*f)(o);
        if (res != NULL && !PyIter_Check(res)) {
            PyErr_Format(PyExc_TypeError,
                         "iter() returned non-iterator "
                         "of type '%.100s'",
                         Py_TYPE(res)->tp_name);
            Py_DECREF(res);
            res = NULL;
        }
        return res;
    }
}
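As an aside, the PySeqIter_New branch (taken when tp_iter is NULL) can be observed from Python with a class that defines only __getitem__ (a sketch of mine, hypothetical class name):

```python
# no __iter__ here: iter() falls back to the old sequence protocol,
# which calls __getitem__ with 0, 1, 2, ... until IndexError is raised
class Seq:
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError
        return i * 10

it = iter(Seq())
print(type(it).__name__)  # 'iterator' (a PySeqIter, not a generator)
print(list(it))           # [0, 10, 20]
```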
In this case, o is the list, whose Py_TYPE(o) is PyList_Type, which in turn is defined here:
PyTypeObject PyList_Type = {
    // lots of lines...
    list_iter,                                  /* tp_iter */
    // lots more lines...
Note that the value of the field tp_iter is the function list_iter. Since this function is not null, PyObject_GetIter falls into the else and calls list_iter, passing the list as argument. And this function (defined in the same file that defines PyList_Type) does the following:
static PyObject *
list_iter(PyObject *seq)
{
    listiterobject *it;

    if (!PyList_Check(seq)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    it = PyObject_GC_New(listiterobject, &PyListIter_Type);
    if (it == NULL)
        return NULL;
    it->it_index = 0;
    Py_INCREF(seq);
    it->it_seq = (PyListObject *)seq;
    _PyObject_GC_TRACK(it);
    return (PyObject *)it;
}
And it is on the line it->it_seq = (PyListObject *)seq; that we see the list (seq) being assigned to the field it_seq of the iterator. That is, the iterator keeps a reference to the list. This is the definition of listiterobject (also in the same file that defines list_iter):
typedef struct {
    PyObject_HEAD
    Py_ssize_t it_index;
    PyListObject *it_seq; /* Set to NULL when iterator is exhausted */
} listiterobject;
And this is the implementation of the iterator's __next__ (also in the same file that defines list_iter):
static PyObject *
listiter_next(listiterobject *it)
{
    PyListObject *seq;
    PyObject *item;

    assert(it != NULL);
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;
    assert(PyList_Check(seq));

    if (it->it_index < PyList_GET_SIZE(seq)) {
        item = PyList_GET_ITEM(seq, it->it_index);
        ++it->it_index;
        Py_INCREF(item);
        return item;
    }

    it->it_seq = NULL;
    Py_DECREF(seq);
    return NULL;
}
This shows that the iterator walks through the elements of the list one by one.

That is, the iterator keeps a reference to the list, and this is how it manages to return its elements. As the list has already been created, there is no lazy (on-demand) generation, as there is with generators (or with the Squares iterator from the example above), because the values were already created earlier, when you created the list.
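The C structure above can be sketched in pure Python (a hypothetical class of mine, just to illustrate the same idea: an index plus a reference to the list):

```python
class ListIter:
    def __init__(self, seq):
        self.it_index = 0
        self.it_seq = seq  # same idea as it->it_seq = (PyListObject *)seq

    def __iter__(self):
        return self

    def __next__(self):
        if self.it_seq is None:          # already exhausted
            raise StopIteration
        if self.it_index < len(self.it_seq):
            item = self.it_seq[self.it_index]
            self.it_index += 1
            return item
        self.it_seq = None               # drop the reference, like listiter_next
        raise StopIteration

print(list(ListIter([1, 2, 3])))  # [1, 2, 3]
```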
I don't know the internals of Python at all, but both iter and generators have to store the values (their state) in memory. In the case of iter, you passed a list (because it is delimited by [ ]), and it probably creates an object that keeps a reference to this list. I can't imagine any other way for it to iterate over the values, other than keeping them internally as part of the created object. – hkotsubo
That's the only way I can imagine too. But in this case the values would be kept in memory, which seems a little contradictory to me, because if I understood the concept well, iterators do not store all the data in memory, and so are lighter and faster than lists, for example. – Fernando
But when you do iter([1, 2, 3]), the list [1, 2, 3] has already been created, so there is nothing to save there, because the memory has already been "spent"... There is nothing contradictory – hkotsubo
I poked around in the CPython source code and posted an answer giving more details... – hkotsubo
TL;DR: when you pass a list to iter, the list iterator keeps a reference to it. The list is then only destroyed when the iterator is too (when it goes out of scope, or with a del). There are no surprises there: the same rule as always applies: an object exists as long as there are references to it. – jsbueno
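That rule can be verified directly: deleting the name bound to the list does not destroy it while the iterator still references it (a quick sketch):

```python
data = [10, 20, 30]
it = iter(data)
del data         # removes our reference, but the iterator keeps its own
print(list(it))  # [10, 20, 30]: the list is still alive inside the iterator
```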