Boxes and Glue: Santiago Gala and his Symbols
Santiago Gala (sgala@apache.org), http://memojo.com/~sgala/blog/
2010-07-25T18:31:26+02:00

Iterator protocol change in python 3k

I got a bit angry when I noticed that the iterator protocol has been changed for python 3k. I didn’t like it at first, and I still think that it will be the worst compatibility nit of the whole transition, but at least I found a nice positive side of it.

The python iterator protocol used to be simple: an object is iterable if iter(object) returns an iterator. An iterator has an __iter__ method, which returns self, and a next method, which returns the next value or raises StopIteration.

What is the change? In python 3.0 the next method will be renamed to __next__, and there will be a new builtin called next(iterator, default) that calls __next__ on the iterator. When the iterator is exhausted, it returns default, or raises StopIteration if default is missing.
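
To make the protocol concrete, here is a minimal sketch of an iterator class. The Countdown class is hypothetical, made up for illustration only; the next = __next__ alias is one common way to satisfy both the old and the new method name, not something the change itself requires.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # New (python 3) name of the protocol method.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Alias so the class also answers to the old method name.
    next = __next__


it = Countdown(2)
first = next(it)           # 2
second = next(it)          # 1
fallback = next(it, None)  # exhausted: the default is returned, no StopIteration
```

Because __iter__ returns self, the same object also works directly in a for loop or in list().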

While testing python3k I kept getting errors because of the change, and I got more and more angry: it looked quite gratuitous. At the time I was trying to understand the concept of a generator, exploring a series of variations on syntax as well as old articles. One of the papers, General ways to traverse collections, has a nice section on generators in Icon, Python and Scheme. I was trying to reproduce the first Icon example:

sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)

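It is supposed to print the first value of i that matches, i.e., the first position of "or" beyond 5. Icon counts positions from 1, so it writes 23; with python's 0-based indices the equivalent answer is 22. I couldn’t find a good way to do this in python until I noticed the new builtin. Then it was obvious (still a bit less readable, as regular expressions have no dedicated syntax in python):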

>>> import re
>>> sentence = "Store it in the neighboring harbor"
>>> next( (i.start() for i in re.finditer("or",sentence) if i.start()>5) )
22
>>> next( (i.start() for i in re.finditer("or",sentence) if i.start()>32),-1)
-1

We can even use the default value to avoid the exception and return a sentinel or an empty value. With this addition, python gets one idiom I was missing: get me the first value that fulfills a condition. The generator expression (expr(i) for i in iterable if condition(i)) is a filter. Applied to it:

  • all will tell me if all the expr(i) are true (filtered by condition(i))
  • any will tell me if at least one of the expr(i) is true
  • list will build a list of the results (set and dict versions too)
  • next will return the first one, and evaluate only those needed to find it. If nothing matches, it either raises StopIteration or returns the value passed as second argument to next
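
The four uses above can be sketched side by side on the "or" example. The or_positions helper is hypothetical, introduced here only because a generator expression can be consumed once, so each call needs a fresh one.

```python
import re

sentence = "Store it in the neighboring harbor"


def or_positions(minimum):
    """Hypothetical helper: a fresh filter over the 0-based start
    offsets of "or" that are greater than `minimum`."""
    return (m.start() for m in re.finditer("or", sentence)
            if m.start() > minimum)


every_over_20 = all(pos > 20 for pos in or_positions(5))  # True
some_over_30 = any(pos > 30 for pos in or_positions(5))   # True
positions = list(or_positions(5))                         # [22, 32]
first = next(or_positions(5))                             # 22
missing = next(or_positions(32), -1)                      # -1, nothing matches
```

Only next stops at the first hit; list always runs the filter to the end, which is why it cannot be used on an infinite generator.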

Borrowing the generator example from Wikipedia, slightly changed to avoid the list there:

>>> def primes():
...     n = 2
...     p = []
...     while True:
...         if not any( n % f == 0 for f in p ):
...             yield n
...             p.append( n )
...         n += 1
... 
>>> next(i for i in primes() if i>10000)
10007

This will return the first prime greater than 10000. Dealing with infinite generators is a bit tricky, but at least now python will have a nice, readable idiom for getting just the first value.
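
When more than the first value is wanted from an infinite generator, one possible approach is itertools.islice from the standard library, which stops after a fixed number of items. This is a sketch under that assumption, not part of the iterator protocol change; first_primes_above is a hypothetical helper name.

```python
from itertools import islice


def primes():
    """Infinite generator of primes, by trial division against earlier primes."""
    n = 2
    p = []
    while True:
        if not any(n % f == 0 for f in p):
            yield n
            p.append(n)
        n += 1


def first_primes_above(limit, count):
    """Hypothetical helper: the first `count` primes greater than `limit`."""
    # islice pulls only `count` items, so the infinite generator is safe here.
    return list(islice((i for i in primes() if i > limit), count))


first = next(i for i in primes() if i > 10000)  # 10007
first_three = first_primes_above(10000, 3)      # [10007, 10009, 10037]
```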

2008-01-11T21:54:05+01:00
Comment by Bob Miller (kbob@jogger-egg.com) on "Iterator protocol change in python 3k":

In Python 2, you can write the last line as

  (i for i in primes() if i > 10000).next()

2008-03-10T19:44:11+01:00