Map/Reduce: Map and mind twisting in programming
One extremely useful operation is to apply some transformation to each element in a list and generate the list of results. (...) We can abstract this general idea and capture it as a common pattern expressed as a higher-order procedure (...). The higher-order procedure here is called map. Map takes as arguments a procedure of one argument and a list, and returns a list of the results produced by applying the procedure to each element in the list. (...)
They illustrate the concept with a function scale_list that multiplies all elements of a list by a scale factor.
I transpose the code to python:
def scale_list(items, factor): if not items: return [] return [items[0]*factor] + scale_list(items[1:],factor)
This makes me think about how to manage lists or arrays, and the scale concept is lost completely. Still, this is too lispish, a literal translation from scheme to python. For a cleaner definition, we can use for instead of the list construction and recursive style in scheme:
def scale_list(items, factor):
result = []
for item in items:
result.append(item*factor)
return result
But we are still thinking in terms of traversing a sequence, and what we do for each element. We can do much better.
Let us call the concept doing something with all the elements of a sequence and collecting the results and call it map. Our solution would look something like:
def scale_list(items, factor):
def scale(n):
return n*factor
return map(scale, items)
This example defines a private function inside scale_list, and then uses it as an argument to map. This is done to illustrate that the operation mapped to the collection can be arbitrarily complex, not just a simple operator. In a somehow object oriented way (_mul_ is how python exposed the method called for multiplication), we could use:
def scale_list(items, factor): return map(factor.__mul__, items)
There is a mind twist in both cases: we are no longer thinking about how to traverse a sequence doing something, we have somehow extended the concept of “scaling” from numbers to collections of numbers.
The concept of mapping a sequence gets completely exposed, inside out, in the list comprehension version:
def scale_list(items, factor): return [i*factor for i in items]
we could have used the scale definition and said [scale(i) for i in items]. List comprehensions, and their cousin generator expressions, are naked map calls. They are much like SQL statements SELECT i*factor AS scaled_item FROM items; in fact, a SQL SELECT is a mapping operation that turns a table into a different one.
Going back to Abelson and Sussman:
The difference between the two definitions is not that the computer is performing a different process (it isn’t) but that we think about the process differently.
Coda: What are generator expressions? They are delayed (lazy) lists. In python they are called generators. I have already talked about generators, in the context of webapp state and will talk again for this series trying to explain more about how map/reduce is typically used for “web scale” tasks like building Google index.

Add your comment