Kent's Korner
by Kent S Johnson

2007-04-23 19:32:34

Iterators and generators

What is an iterator?

Though you may not know it, you have probably used iterators many times in your Python programs. When you iterate over a sequence using a for loop, under the hood Python is using an iterator.

The loop

for x in seq:
  f(x)

is roughly equivalent to this code:

it = iter(seq):
while True:
  try:
    x = it.next()
    f(x)
  except StopIteration:
    break

The first thing to notice is the iter() function. When iter() is passed an iterable argument, it returns an iterator.

What is an iterable argument? Anything that produces an iterator when passed to iter() :-). Lists, dicts, tuples and files are all iterable, and you can define your own iterable objects as well.

What is an iterator? Almost anything with a next() method. Calling next() on an iterator gets the next value from the iterator, or raises a StopIteration exception.

Making it explicit

Explicitly creating the iterator gives you more control over the loop. You have access to the next() method and can advance the iterator yourself. For example, suppose you want to process every other item in a sequence. Here is one way to do it:

it = iter(seq)
for x in it:
  process(x)
  it.next()

This technique is very useful for writing simple parsers that need to process input in groups. Many simple file formats contain multi-line records that can be processed with this technique. For example suppose you have a file containing alternating lines of first name, last name and you would like to print out the names, one line per name. This will do it:

f = open('names.txt')
for line in f:
  first = line.strip()
  last = f.next().strip()
  print first, last

Note that the file object is its own iterator, there is no need to call iter() on it.

Writing your own iterators

It's not too hard to write your own iterator. An iterator is an instance of a class that implements two methods, __iter__() and next(). __iter__() should return self - an iterator is its own iterator - and next() returns the next item in the sequence, or raises StopIteration. I could give a simple example, but I will just refer you to PEP234 because a much easier way to write an iterator is to use...

Generator functions

A generator function is a handy way to create an iterator that saves state between values. For example here is a generator function that yields every other value from an input sequence. This is a generalization of the first example above:

def skipper(seq):
  it = iter(seq)
  for x in it:
    yield x
    it.next()

To use a generator function, call the function and iterate over the returned value. The return value is an iterator which can be used in a for loop. For example:

for x in skipper(range(10)):
  print x

will print out every other number in the input range.

How does this work? Calling the generator function does not return a value in the usual way; instead it produces an generator object, a kind of iterator. The presence of the yield keyword triggers this behaviour. The use of yield also determines the behaviour of the generator; when execution comes to the yield, the generator produces a value in its output sequence. The value is returned from a call to next(). The next call to next() resumes execution of the generator from the yield. The internal state of the generator - the values of all local variables and the location of the yield - are saved and restored across each yield.

Using generator functions

Generator functions are extremely useful when the logic for generating a sequence is complex or where the simplest implementation of the generator is recursive. For example os.walk() is a generator function and tokenizers can be implemented using generators.

The advantage of using a generator is that the complex sequence logic is encapsulated and separated from the rest of the loop code. Notice in the example above how the logic and state for creating the sequence is separated from the logic and state of the loop that uses the sequence. This can dramatically simplify a loop. If you have stateful logic in the rest of the loop, or you are generating multiple complex sequences, or you want to reuse a complex sequence, generators are a life saver. For example, using os.walk() requires only a simple loop:

for dirname, dirs, files in os.walk(top):
  # do something with dirs or files

Compare this to os.path.walk(), which predates the use of generators. os.path.walk() takes a callback function, making it significantly more difficult to understand and use than os.walk().

Note: If your generator calls another generator (maybe itself) and wants to return the sequence of the sub-generator, you must explicitly loop over the sub-generator and yield its values. There is no shortcut for this. For example, here is the recusive inner loop from os.walk():

for name in dirs:
    path = join(top, name)
    if not islink(path):
        for x in walk(path, topdown, onerror):
            yield x

Iteration with a sentinel

A second form of iter() takes two arguments - a callable object and a sentinel value. The iterator will call the object repeatedly, yielding the result of the call until the sentinel value is reached, then raise StopIteration. For example, a file iterator equivalent to the one supplied by the file itself can be written as iter(f.readline, ''). To iterate over the results set of a DB-API cursor, use iter(cursor.fetchone, None). Notice that there are no parentheses after the function names! The function itself is passed to iter().

Notes on advanced topics

You can have more than one yield in a generator function. Execution will always resume after the yield that was last executed.

Generators are a type of co-routine. One use for generators is for light-weight multithreading. The SimPy discrete simulation package uses generators to maintain the state of objects in the simulation and to advance them in time.

Python 2.5 added the ability to pass values back into generators by passing a parameter to next(). yield is now an expression whose value is the parameter given to next(). See the document What's New in Python 2.5 and PEP342 for details.

Historical note

Iterators and generators were introduced in Python 2.2. In earlier versions of Python, iteration over a sequence was done by calling the __getitem__() method of the sequence with increasing index values. This was a bit of a hack because it required adding __getitem__() methods to objects that were not really subscriptable.

 
© Kent S Johnson Creative Commons License

Short, introductory essays about Python.

kentsjohnson.com

Kent's Korner home

Weblog

Other Essays