List Comprehensions for Fun and Profit

List comprehensions are awesome. They’re an incredibly intuitive and readable way to create, filter, join and otherwise modify lists. In the interest of helping young pyrates understand how they can use this simple and elegant tool to their advantage, this tutorial will start with the very basics of what list comprehensions are, and then gradually scale up to some of their cooler and more esoteric applications. If you’re already well acquainted with the concept, skip down a few sections for the more exciting stuff, or jump right to the end for the really juicy bits.

A list comprehension is a concise way of generating a list from an existing list or another iterable object. The syntax for one of the most basic list comprehensions is as follows:

animal_names = [animal.name for animal in list_of_animals]

The idea, as I’m sure is evident or familiar to many of you, is that the new list will be the set of items as specified at the beginning of the list comprehension (before the ‘for’ clause), extracted from the list at the end of the comprehension (after the ‘in’ clause). The item in the middle, our ‘animal’ variable in the example above, is the variable that represents each item in the list being processed, and it may be acted upon the same as any other object. In Python 3 it is scoped to the comprehension; in Python 2 it leaks into the enclosing scope and stays bound after the comprehension finishes. The example above extracts a list of animal names from an original list of animals and could also be written as:

animal_names = []
for animal in list_of_animals:
  animal_names.append(animal.name)
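
To make the equivalence concrete, here is a minimal self-contained sketch of both forms side by side. The Animal class and the sample names are hypothetical stand-ins for whatever objects list_of_animals actually holds.

```python
# Hypothetical stand-in for the objects stored in list_of_animals.
class Animal(object):
    def __init__(self, name):
        self.name = name

list_of_animals = [Animal('Rex'), Animal('Tom'), Animal('Polly')]

# Comprehension form.
animal_names = [animal.name for animal in list_of_animals]

# Equivalent explicit loop.
loop_names = []
for animal in list_of_animals:
    loop_names.append(animal.name)

print(animal_names == loop_names)  # True
```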

Another popular example to introduce list comprehensions is the list of perfect squares of the numbers in a range. It’s easy enough to compute with a few lines of code, but with a list comprehension we can do it in one and, I’d argue, make the statement even more readable:

squares = [x*x for x in range(15)]
squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]

Predicates and Generator Expressions

The first way we’re going to make list comprehensions more useful is by adding an if-clause to dictate which of the terms extracted from the source list are included in the resulting list. This if-clause, known as the predicate, is added at the end of the comprehension, and maintains the almost English-like readable syntax. To continue our mathematical examples (list comprehensions in fact have their roots in traditional mathematical notation), what follows is the set of numbers between zero and one hundred that are divisible by seven:

div_by_seven = [x for x in range(100) if x%7 == 0]
div_by_seven
[0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98]

This allows us to easily filter out list items that are undesirable. (Note that the above example can also be accomplished with range(0, 100, 7). This example is illustrative only.)
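
A predicate can be any expression that evaluates to true or false, including method calls and combined conditions. The short sketch below uses a made-up list of words to show that, and also checks the range() shortcut mentioned above.

```python
# Hypothetical sample data.
words = ['apple', 'Banana', 'cherry', 'date', 'Elderberry']

# Keep only the all-lowercase words longer than four characters.
long_lower = [w for w in words if w.islower() and len(w) > 4]
print(long_lower)  # ['apple', 'cherry']

# The filtered comprehension and the stepped range produce the same values.
div_by_seven = [x for x in range(100) if x % 7 == 0]
print(div_by_seven == list(range(0, 100, 7)))  # True
```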

I’d also like to touch on the fact that the value at the start of the list comprehension, the value that will be included in your final list, can be the result of an expression as well as a static value. Much like lambda functions there can only be a single expression at the start of your comprehension, which restricts you from writing large blocks of logic within the comprehension. I’m personally in favor of this limitation, as it enforces the terse and readable structure of the comprehension and prevents wild misuse. This functionality allows us to take advantage of other bits of python’s syntactic sugar such as ternary operators:

moods = ['sad' if person.is_alone() else 'happy' 
  for person in people_at_party]

For those of you unsure of what exactly expressions and statements are in python, a brief (and only mostly accurate) definition would be that all expressions evaluate to a value, whereas statements are a superset which describe pretty much any piece of python code and therefore include the set of expressions. For quick reference I use the ‘does it look like it belongs on the right-hand side of an assignment’ check to evaluate whether a statement is an expression and therefore can be used in a comprehension. A few of the most common examples of these are variables, ternary operators, and function calls. For a better understanding of this concept I recommend checking out these links.
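
As a rough illustration of the right-hand-side check, each comprehension below leads with one of the three kinds of expression just mentioned. The shout function and the word list are invented for this sketch.

```python
# Hypothetical helper used only for this illustration.
def shout(word):
    return word.upper() + '!'

words = ['spam', 'ham']

# Each leading expression evaluates to a value, so each is allowed.
as_is = [w for w in words]                                  # a variable
shouted = [shout(w) for w in words]                         # a function call
sized = ['long' if len(w) > 3 else 'short' for w in words]  # a ternary

# A statement such as an assignment does not evaluate to a value,
# so something like "[x = w for w in words]" is a SyntaxError.
print(shouted)  # ['SPAM!', 'HAM!']
print(sized)    # ['long', 'short']
```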

Nested and Chained Comprehensions

List comprehensions are very flexible, and can be extended in a number of ways. The first way, and perhaps the simplest, is by using nested comprehensions. Nested comprehensions are a single-line equivalent of nested for-loops (as you may have been able to infer), and as such extend the functionality we’ve established in exactly the way you’d expect. A neat little nested comprehension can be used to generate the identity matrix that will be familiar to anyone who’s dabbled in matrix algebra. For those who haven’t, this produces a 4×4 matrix of ones and zeroes such that the ones form a diagonal line from the top left to the bottom right:

ident_matrix = [[1 if col_index==row_index else 0 for col_index
  in range(4)] for row_index in range(4)]
ident_matrix
[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

You’ll note that the value generators in these two comprehensions are slightly more complex than what we’ve seen previously. For the inner expression we take advantage of python’s ternary syntax, while for the outer comprehension its generator expression is the nested comprehension, which is what gives us the list-of-lists structure that emulates the matrix.
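
For comparison, here is one way the same matrix could be built with ordinary nested for-loops. The outer loop plays the role of the outer comprehension and the inner loop the role of the nested one; the size and row names are introduced only for this sketch.

```python
size = 4

ident_matrix = []
for row_index in range(size):
    # The inner loop builds one row, just like the nested comprehension.
    row = []
    for col_index in range(size):
        row.append(1 if col_index == row_index else 0)
    ident_matrix.append(row)

print(ident_matrix)
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```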

While by nesting list comprehensions we can easily build lists of lists, we can deconstruct the same by using chained comprehensions. Chained list comprehensions similarly iterate over multiple lists just like nested for loops, but produce a single array instead of a multidimensional one. I recommend using chained comprehensions sparingly, as their syntax is somewhat less intuitive and they can quickly become un-pythonic in their complexity. Given that matrix we generated above, here’s a quick example of how we could flatten it back out:

flattened = [num for row in ident_matrix for num in row]
flattened
[1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]

In more traditional syntax, this could be written:

flattened = []
for row in ident_matrix:
  for num in row:
    flattened.append(num)

To make this a little more complex, let’s flatten the identity matrix and get only the zeros, ignoring the ones entirely.

flat_zeroes = [num for row in ident_matrix for num in row if num==0]
flat_zeroes
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Or how about only the zeroes from the even numbered rows?

even_zeroes = [num for row in ident_matrix if 
  ident_matrix.index(row)%2==0 for num in row if num==0]
even_zeroes
[0, 0, 0, 0, 0, 0]

You can see how the awkward alternations between our assignments and our predicates quickly make our list comprehension much harder to read and follow. This is where list comprehensions begin to approach the limit of their usefulness. As much as it’s nice to have everything on one line, list comprehensions are meant as a convenience tool; if they’re making your code more complex and unreadable, they should be replaced by a more traditional structure. If your code looks indecipherable and fits on one line but could easily be replaced by a few more lines that could be written by a coder with only minimal experience with python, you’re trying too hard to impress someone.

# Perhaps overly simplified, but straightforward and unambiguous.
even_zeroes = []
for row in ident_matrix[::2]:
  for val in row:
    if val == 0:
      even_zeroes.append(val)
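
If you do want to keep this as a comprehension, one possible tidy-up is to let enumerate() supply the row number instead of calling index(). Besides being cheaper, it sidesteps a subtle problem: index() returns the position of the first equal row, so a matrix with duplicate rows would report the wrong positions. The repeated-row matrix below is a made-up example of that.

```python
ident_matrix = [[1 if c == r else 0 for c in range(4)] for r in range(4)]

# enumerate() pairs each row with its position in the outer list.
even_zeroes = [num
               for i, row in enumerate(ident_matrix) if i % 2 == 0
               for num in row if num == 0]
print(even_zeroes)  # [0, 0, 0, 0, 0, 0]

# With duplicate rows, index() always finds the first copy (position 0),
# so every row is treated as even-numbered.
repeated = [[0, 1], [0, 1], [0, 1]]
by_index = [num for row in repeated
            if repeated.index(row) % 2 == 0 for num in row if num == 0]
by_enumerate = [num for i, row in enumerate(repeated)
                if i % 2 == 0 for num in row if num == 0]
print(len(by_index), len(by_enumerate))  # 3 2
```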

Dict and Set Comprehensions

Python 2.7 added dictionary comprehensions, a long sought after feature. Sadly, as much of my work is still done on projects that rely on python 2.4-6, these are not available to me. That being said, by cleverly combining list comprehensions and python’s built in dict() function we can create our own simple dictionary comprehensions in a pinch. The dict() function can construct a dictionary from a number of different inputs, but the one that’ll be the most useful to us is the list of tuples. Calling the following:

dict([('cat', 'meow'), ('dog', 'woof')])
{'dog': 'woof', 'cat': 'meow'}

gives us a dictionary that maps the keys ‘cat’ and ‘dog’ to their respective animal noises. As I’m sure many of you can see, this allows us to dynamically create such a list of tuples by using a similar expression within a comprehension like this:

inverted_dict = dict([(val, key) for key, val in
   old_dict.items()])

Python 2.7+ dict comprehensions are semantically equivalent, but more intuitive and less expensive due to not having to create an intermediate list to pass to the dict constructor. To do the same as we did above in later versions of python, we can use:

inverted_dict = {val : key for key, val in old_dict.items()}

Much simpler, and much more pythonic.
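
Here is a small sketch that runs both forms on the animal-noise dictionary from earlier and confirms they agree. One caveat worth keeping in mind when inverting: the old values become keys, so they must be hashable, and if two keys share a value only one of them survives.

```python
old_dict = {'cat': 'meow', 'dog': 'woof'}

# Pre-2.7 style: build a list of (value, key) tuples and hand it to dict().
via_constructor = dict([(val, key) for key, val in old_dict.items()])

# 2.7+ style: a dict comprehension, with no intermediate list.
via_comprehension = {val: key for key, val in old_dict.items()}

print(via_constructor == via_comprehension)  # True

# Duplicate values collapse into a single key when inverted.
noisy = {'cat': 'meow', 'kitten': 'meow'}
print(len({val: key for key, val in noisy.items()}))  # 1
```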

Set comprehensions are also a new addition as of python 2.7. Rather than generating a list or dict, they create a set (a collection that contains only unique items). The syntax is similar to dict comprehensions (in that it uses the ‘{’ symbol), but instead of giving it a pair of items, we provide just one. This can be emulated in python 2.6 and earlier by calling set() on the generated list.

unique_names = {name for name in all_names} 
# Is equivalent to
unique_names = set(all_names)
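
A set comprehension earns its keep once the leading expression does some work, since plain set() can only copy items as they are. The sketch below, using an invented list of names, de-duplicates case-insensitively and shows the pre-2.7 emulation next to it.

```python
# Hypothetical sample data with case-only duplicates.
all_names = ['Rex', 'rex', 'Tom', 'TOM', 'Polly']

# 2.7+ set comprehension: normalise each name, keep the unique results.
unique_names = {name.lower() for name in all_names}

# Emulation for older versions: set() around a list comprehension.
legacy_names = set([name.lower() for name in all_names])

print(sorted(unique_names))          # ['polly', 'rex', 'tom']
print(unique_names == legacy_names)  # True
```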

Generator Objects

A slight variation on list comprehensions is the generator expression, which is syntactically and functionally nearly identical but produces a generator object: rather than building a list when the generator object is instantiated, it pulls from the source list when the generator object is iterated upon. This removes the need for intermediate storage and provides a not insignificant performance gain in many cases. Additionally, if the list being generated from is altered after the instantiation of the generator object but before iterating over the generator’s product, then the change will be reflected in the values it produces. Finally, because we instantiate a generator instead of a list we can’t access the members like we would usually be able to; instead the object is effectively opaque until it is iterated over, and then we have access to only a single member at a time. Because this is an easy concept to get tripped up on I’m going to do my best to illustrate it below. Please note that the generator is instantiated by using ‘(…)’ instead of ‘[…]’ notation.

source = [1, 2] #list from which we'll generate 
comp_list = [num + 1 for num in source] #Stores complete result
gen_obj = (num + 1 for num in source) #Stores procedure for generating list

comp_list[0] #Regular list access
2
gen_obj[0] #Doesn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is unsubscriptable

source.append(3) #alter list

# Comprehension yields the product of unmodified list
for num in comp_list:
  print(num)
2
3

# Generator yields product of augmented list
for num in gen_obj:
  print(num)
2
3
4

Obviously, as with all things in life and coding, using list comprehensions vs generator objects is a tradeoff, in this case between flexibility and performance. It’s up to you to apply them judiciously.
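
One part of that tradeoff that catches people out is that a generator can only be walked once. The sketch below shows a generator being exhausted after a single pass, and also the case where a generator is the natural fit: feeding an aggregate such as sum() without ever building the list.

```python
squares = (x * x for x in range(5))

first_pass = list(squares)   # consumes the generator
second_pass = list(squares)  # nothing left to yield

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# No intermediate list is built here; sum() pulls one value at a time.
total = sum(x * x for x in range(10))
print(total)  # 285
```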

Esoterica (Or, The Good, The Bad, and The Ugly)

Let’s start with the good:

almost_csv = [line.split('::') for line in open('my_report.csv')]

I’ve seen mixed opinions on this comprehension but I’ve done enough not-comma-delimited file handling to find this quick and dirty solution useful. Because CPython closes a file when the corresponding file object is deallocated, and the file object here is referenced only within the list comprehension, this neatly opens a file, iterates over its lines, and splits each line on ‘::’ before the file is closed again. All in the space of just a few characters. Admittedly it means you’re loading the entire file into memory, but hey, sometimes that’s okay. Alternately, if you used a generator instead, you’d get the same concise syntax with none of the memory overhead! Please note that for any case where your delimiter is a single character, the csv module should be used.
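
If relying on deallocation to close the file makes you uneasy (other Python implementations may not close it promptly), a with-block keeps the comprehension while making the close explicit. This is only a sketch: it writes its own small ‘::’-delimited temporary file so that it can run anywhere, and the sample contents are made up.

```python
import os
import tempfile

# Create a throwaway '::'-delimited file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as handle:
    handle.write('name::sound\ncat::meow\ndog::woof\n')

# The with-block closes the file as soon as the comprehension is done.
# rstrip('\n') drops the newline that would otherwise cling to the last field.
with open(path) as report:
    almost_csv = [line.rstrip('\n').split('::') for line in report]

os.remove(path)
print(almost_csv)
# [['name', 'sound'], ['cat', 'meow'], ['dog', 'woof']]
```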

Here are a couple more neat little perks, annotated with comments as needed:

# Tuple unpacking works in list comprehensions!
unpacked = ['%s + %s' % (a, b) for a, b in enumerate(range(100, 0, -1))]

# Comprehensions play well with python's built in functions
sorted_owners = sorted(getOwner(dog) for dog in dogs) 
# Note that the above creates a generator object and shares parentheses with the sorted function

As for the bad, I’m going to unashamedly steal the form of this one from Chris Leary’s blog post which you should definitely read if you want a deeper and more entertaining analysis of why you shouldn’t try this:

ord_lists = [2,3,4,5,3,4,23,24,25,26,45,46,9,13,14]
sub_seqs = []
[sub_seqs[-1].append(x) if sub_seqs and x == sub_seqs[-1][-1] 
  + 1 else sub_seqs.append([x]) for x in ord_lists]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
sub_seqs
[[2, 3, 4, 5], [3, 4], [23, 24, 25, 26], [45, 46], [9], [13, 14]]

While this works, and perhaps gets some points for being concise, it’s going strictly against the designated purpose of list comprehensions. List comprehensions are for list construction, not shorthand for loops. All this is doing is saving the developer the inconvenience of writing the proper loop syntax. Because append() does not return a value we’re generating a garbage list of Nones the length of the list that’s being modified; and more importantly any developer who reads this will be forced to re-comprehend the meaning of the code after realizing that the safe and familiar comprehensions that they’re used to are being bent to a completely foreign purpose. List comprehensions are nice, be nice to them too.
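
For contrast, this is roughly what the proper loop looks like. It uses the same condition as the comprehension above, costs only a few more lines, and doesn’t leave a throwaway list of Nones behind.

```python
ord_lists = [2, 3, 4, 5, 3, 4, 23, 24, 25, 26, 45, 46, 9, 13, 14]

sub_seqs = []
for x in ord_lists:
    if sub_seqs and x == sub_seqs[-1][-1] + 1:
        # x continues the current run of consecutive numbers.
        sub_seqs[-1].append(x)
    else:
        # x starts a new run.
        sub_seqs.append([x])

print(sub_seqs)
# [[2, 3, 4, 5], [3, 4], [23, 24, 25, 26], [45, 46], [9], [13, 14]]
```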

The ugly award goes to a list comprehension I saw that used a neat trick that I suppose could be super useful in certain situations, but if you find yourself using it you better have a pretty excellent argument for your case. Python lets you access a dictionary of all local variables, even the unnamed ones, and in older CPython 2 releases (the name is gone as of 2.7) the list being built by a comprehension is assigned the designator ‘_[1]’. This gives us the pretty powerful ability to access and pass to functions the generated list as it’s being generated. For example, here are two comprehensions, the first of which uses an isPrime function that we define to generate the list of prime numbers below fifty, and the latter of which is a concise way of getting the first N numbers in the Fibonacci sequence:

def isPrime(num, primes):
  for prime in primes:
    if prime == 0 or prime == 1:
      pass
    elif num % prime == 0:
      return False
  return True

[num for num in range(2, 50) if isPrime(num, locals()['_[1]'])]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

# Fibonacci Sequence
[locals()['_[1]'][-1] + locals()['_[1]'][-2] if 
  len(locals()['_[1]']) > 1 else 1 for i in range(15)]
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

However once again, this is pretty unreadable, and relies on an undocumented feature that has no guarantee of consistency or maintenance. Watch out.
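
If you need a list that refers back to itself while it is being built, a plain loop with a named list does the same job on any Python version. The following is one possible sketch; the function names primes_below and fibonacci are invented here.

```python
def primes_below(limit):
    """Trial division against the primes found so far."""
    primes = []
    for num in range(2, limit):
        if all(num % prime != 0 for prime in primes):
            primes.append(num)
    return primes

def fibonacci(count):
    """First `count` Fibonacci numbers, starting 1, 1."""
    sequence = []
    for _ in range(count):
        if len(sequence) > 1:
            sequence.append(sequence[-1] + sequence[-2])
        else:
            sequence.append(1)
    return sequence

print(primes_below(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(fibonacci(15))
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
```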

Edits

There were a couple of mistakes and a totally untested (and wrong) example. Thank you to all who pointed them out, and I’ll try to continually improve this tutorial as I learn more.

Blog of Brynne

Trying to learn from my mistakes