Practical usage of advanced Python language constructs, part 1

I would assume that you are here because you like Python or at least are interested in it. I love Python as a language because of its expressive power, friendly learning curve, coding style and rich functionality. Many of its constructs read like natural language or close to it, and its straightforwardness and general lack of hidden tricks make it easy for people of different experience levels and backgrounds to start using Python. Not to mention the extensive standard library and myriads of external modules for everything. Python seems to be equally appealing as a first language for those who are starting to learn programming and as a new language for experienced programmers. The low entry threshold makes it really easy to start writing Python if you already have programming experience. Practice shows that you don't need to invest much time before you start doing useful things.

I used to develop for .NET in C# for 5 years before I got into my first Python project. Before that my sole experience with Python was a few small scripts. I jumped into the Python project without any formal training, or even reading a book. So my approach to Python was initially exclusively practical: I only cared how to get my task done, without going too deep into the theory of Python as a language. On the other hand, I believe there are people who studied Python formally (in school, for example) without having much real-world experience, and they can get stuck in theory.

In this series of posts I would like to talk about practical usage of some of Python's advanced language constructs. I want to focus on practical aspects, not just tutorial material. I hope it will be useful for programmers of different levels: those with a more theoretical background can pick up practical ideas, and experienced developers who are new to Python can see how to do familiar things in Python.

I decided to write these posts because the majority of books, tutorials and other posts on these subjects usually discuss examples like generating Fibonacci numbers or other not very practical cases. All examples that I provide come either from projects I worked on or from popular open source projects; I simplified or changed the code to remove irrelevant parts and focus on the subject.

All definitions are taken from the official Python glossary.

Part 1: Iteration-related constructs

Definition: iterable – An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method.

Practically speaking, an iterable is something that you can use in the following places:

  • Loops: for x in iterable:
  • List comprehensions: [f(x) for x in iterable]
  • Generator expressions: (f(x) for x in iterable)
  • Functions that expect an iterable: all, any, sum, filter, map, itertools functions, etc.

So an iterable is an object that has the method __iter__ and that's it. Simple. Alternatively, it can implement __getitem__, but that only works for integer-indexable objects like lists, which is a very narrow case, while the __iter__ method provides a general mechanism. So what exactly does __iter__ do? It returns an iterator object.

Definition: iterator – an object representing a stream of data. Repeated calls to the iterator's __next__() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted. Iterators are required to have an __iter__() method that returns the iterator object itself, so every iterator is also iterable and may be used in most places where other iterables are accepted.

All you need to know for practical usage is that:

  • An iterator remembers the state of iteration (for example, the position in a collection).
  • An iterator is iterable as well.
  • The only two things an iterator does: give the next value, or raise StopIteration when there are no more values.
  • Iterators are not reusable.
  • An iterator may be infinite.
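All of these properties are easy to observe with the built-in iter() and next() functions:

```python
numbers = [10, 20, 30]
it = iter(numbers)   # calls numbers.__iter__() and returns a list iterator

print(next(it))  # 10
print(next(it))  # 20
print(next(it))  # 30
# one more next(it) raises StopIteration: the iterator is exhausted
# and cannot be reused; iter(numbers) would create a fresh one
```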

There are different ways to create an iterator:

  • Write a class that implements the iterator protocol (__iter__ and __next__)
  • Write a generator expression
  • Write a generator

Writing your own iterator class is almost always impractical. So much so that Python 2.2, the first version to introduce iterators, also included an alternative mechanism: generators. Python contributors quickly realized that writing iterator classes becomes very inconvenient for complex tasks; you can read the discussion in PEP-255. I couldn't find any good case for doing it, so I will not cover it in this post.

A little bit about generator expressions

A generator expression is an expression based on the following syntax:

(expr for x in iterable if condition)

Looks suspiciously similar to list comprehensions, right? Yes, the syntax is actually the same except for the brackets ((…) instead of […]). But there is a very significant difference between generator expressions and list comprehensions. I would assume that you write list comprehensions often; they are a beloved feature of Python. My strong opinion is that generator expressions should be preferred over list comprehensions in many cases. Let me ask you a question before I present my arguments.

You have some function that takes a number and produces a number. You want to find the maximum value of the function on 2 given ranges. Question: what will be the memory consumption in these two variants?

max([f(x) for x in xrange(1000)])
max([f(x) for x in xrange(1000000)])


max(f(x) for x in xrange(1000))
max(f(x) for x in xrange(1000000))

In the first case the amount of used memory will be proportional to 1000 and 1000000 respectively, O(n), while in the second it will not depend on the count and will be minimal, O(1). The reason lies in how these constructs are executed: Python executes a list comprehension by immediately producing a list containing all generated elements, while for a generator expression Python iterates nothing and only creates an iterator. The actual generation of values happens only when you iterate over this iterator, and the values are not stored anywhere unless you store them yourself. What this practically means:

  • List comprehensions consume memory proportional to the number of elements.
  • Generator expressions consume only the memory needed to store the iterator object.
  • List comprehensions are eager, generator expressions are lazy.
  • You can stop iterating over an iterator and just drop it, and new values will never be produced (for example, the function any will stop iterating once it hits True), while a list comprehension produces all values even if you don't need them.
  • A generator expression allows only one pass over itself.
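The memory difference is easy to see with sys.getsizeof (the exact numbers vary by Python version and platform, so treat them as illustrative):

```python
import sys

squares_list = [x * x for x in range(1_000_000)]  # eager: all values stored
squares_gen = (x * x for x in range(1_000_000))   # lazy: nothing computed yet

print(sys.getsizeof(squares_list))  # megabytes, grows with the range
print(sys.getsizeof(squares_gen))   # around a hundred bytes, independent of the range
```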

Practical conclusion: use generator expressions where you would use list comprehensions, unless you really need to store the whole result set or you want to iterate over it multiple times. Examples of when you don't need to store it: when you filter or map your result set and pass it straight to any, all, sum, min, max, ''.join, or when you just count elements. Counting is a little tricky, because len(iterable) doesn't work. This is a popular question on StackOverflow, which means people struggle with it. The following snippet is the best way to count values produced by an iterator:

sum(1 for _ in iterable)

Be careful with infinite iterators: this will never finish and will hang your program.
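For example (note that the counting itself consumes the iterator, so a second count gives 0):

```python
evens = (x for x in range(10) if x % 2 == 0)

print(sum(1 for _ in evens))  # 5
print(sum(1 for _ in evens))  # 0 — the generator allows only one pass
```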

Examples from Ansible:

choices = ', '.join(str(i) for i in opt['choices'])
displace = max(len(x) for x in module_list)
if 'DOCUMENTATION' in (t.id for t in child.targets):

Bad example from WTForms:

def check_ipv4(value):
   parts = value.split('.')
   if len(parts) == 4 and all(x.isdigit() for x in parts):
      numbers = list(int(x) for x in parts)
      return all(num >= 0 and num < 256 for num in numbers)

Why bad? Because there is no need to create the list on the fourth line; a generator expression alone would work fine.
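A minimal rewrite without the intermediate list might look like this (keeping the original behavior of returning a falsy value for malformed input):

```python
def check_ipv4(value):
    parts = value.split('.')
    if len(parts) == 4 and all(x.isdigit() for x in parts):
        # int(x) is evaluated lazily, one part at a time, no list is built
        return all(0 <= int(x) < 256 for x in parts)
```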

Now to next practical way of creating your own iterators: generators.

Definition: generator – a function which returns an iterator. It looks like a normal function except that it contains yield statements for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function. Each yield temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the generator resumes, it picks up where it left off (in contrast to functions which start fresh on every invocation).

OK, this seems confusing, and what is even more confusing is what the practical application of this construct could be. There are many questions about generators and the yield statement (which has actually been an expression since PEP-342 and Python 2.5).

Simple example, sorry, not really practical:

def generator():
   print('Step 1')
   yield 0
   print('Step 2')
   yield 1

gen = generator()
for x in gen:
   print(x)

The presence of the keyword yield turns a function into a generator and completely changes the way the function is executed. Merely calling generator() will not even run the function body! Execution only starts when the for loop requests the first value: 'Step 1' is printed and 0 is yielded, the loop body prints it, then the loop requests the next value, 'Step 2' is printed and 1 is yielded, the body prints it again, and when the function body finishes the loop exits. So yield works as some sort of return, but on the next iteration the function continues execution after the yield, until the next yield, a return, or the end of the function is reached.

In this way a generator shares all the properties of a generator expression: it is lazy, nothing is actually done until it is iterated over, and it doesn't store the result set.

For many people this still may be confusing so I will stop theorizing here and jump to examples.

First use case: creating a view over another collection, when you want to abstract away filtering and mapping. The simplest example, from one of the projects I worked on:

def get_dynamic_fields_names(self):
   for field in self.fields:
      if field.is_dynamic:  # the exact filtering condition is simplified
         yield field.name
If you iterate over this, it will go over the list of fields and return the names of the dynamic fields. This can be useful for two reasons: first, you don't want to copy-paste this code everywhere you need to loop over those dynamic fields; second, you could write this as a generator expression, but it would not fit on 1 line, and multiline generator expressions look ugly.
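Here is a self-contained sketch of this pattern (the Form class and the field structure are made up for illustration):

```python
class Form:
    def __init__(self, fields):
        self.fields = fields

    def get_dynamic_fields_names(self):
        # the filtering and mapping logic lives in one place
        for field in self.fields:
            if field['dynamic']:
                yield field['name']

form = Form([{'name': 'id', 'dynamic': False},
             {'name': 'tags', 'dynamic': True},
             {'name': 'extra', 'dynamic': True}])
print(list(form.get_dynamic_fields_names()))  # ['tags', 'extra']
```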

Next use case: flattening lists, producing [1, 2, 3, 4] from [[1, 2], [3, 4]]. From the Django source. They have a list of lists and want to iterate over the actual elements linearly.

def __iter__(self):
   for node in self.nodelist:
       for subnode in node:
           yield subnode

This is actually not the best example, because there is a much better way to flatten iterables: itertools.chain.from_iterable from the itertools module. This is also a popular question on StackOverflow, so the task is rather common. But itertools will not help you if you have more complex flattening logic, like the authors of Jinja had:
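For the simple case, chain.from_iterable does the whole job, and it is lazy just like a generator:

```python
from itertools import chain

nested = [[1, 2], [3, 4]]
flat = chain.from_iterable(nested)  # an iterator, nothing is copied yet
print(list(flat))  # [1, 2, 3, 4]
```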

def iter_child_nodes(...):
   for item in self.items:
      if isinstance(item, Node):
         yield item
      elif isinstance(item, list):
         for n in item:
            if isinstance(n, Node):
               yield n

So here they have a mixture of single items and lists in their list. If you iterate over this generator, you will get a flat stream of nodes.
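A standalone sketch of the same logic (the Node class here is a stand-in for Jinja's real one):

```python
class Node:
    def __init__(self, name):
        self.name = name

def iter_child_nodes(items):
    # yield top-level nodes, and nodes found inside nested lists
    for item in items:
        if isinstance(item, Node):
            yield item
        elif isinstance(item, list):
            for n in item:
                if isinstance(n, Node):
                    yield n

items = [Node('a'), [Node('b'), Node('c')], 'not a node']
print([node.name for node in iter_child_nodes(items)])  # ['a', 'b', 'c']
```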

Here are two really good related examples from Requests library.

def iter_chunks(self, chunk_size=1024):
   while True:
       chunk = self.raw.read(chunk_size)
       if not chunk:
           break
       yield chunk

This code is from the part that parses HTTP responses from the server. self.raw is a file-like object, probably a socket or some wrapper over a socket. It can read a chunk of fixed size from the network. If you are processing some large response, you might not want to load it into memory completely, but process it in chunks as well. So this function makes it possible to iterate over chunks as they arrive and save memory.

It can be used, for example, to save a response directly to a file. This will work fast and use a minimal amount of memory.

f = open(...)
for chunk in self.iter_chunks():
    f.write(chunk)
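The same pattern works for any file-like object; here is a runnable sketch using io.BytesIO as a stand-in for the network stream:

```python
import io

def iter_chunks(raw, chunk_size=4):
    # read fixed-size chunks until the stream is exhausted
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        yield chunk

stream = io.BytesIO(b'abcdefghij')
print(list(iter_chunks(stream)))  # [b'abcd', b'efgh', b'ij']
```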

The next example is also from Requests. So far we have looked at generators that are stateless and don't carry data between yields. This generator converts a stream of chunks into a stream of lines. Lines are separated by line delimiters. Since lines can be of arbitrary length, a chunk can cut a line, and the last portion of the current chunk can end with an unfinished line. This piece of a line needs to be preserved until the next chunk is received and then joined with the rest of the line.

def iter_lines(self):
   pending = None
   for chunk in self.iter_chunks():
       if pending is not None:
           chunk = pending + chunk
       lines = chunk.split(delimiter)
       # Figure out if the last line is complete;
       # if not - store it and wait for the next chunk
       # (simplified check compared to the real Requests code)
       if chunk.endswith(delimiter):
           lines.pop()  # drop the empty piece after the final delimiter
           pending = None
       else:
           pending = lines.pop()
       for line in lines:
           yield line

   if pending is not None:
       yield pending

The variable pending constitutes the state, and its value is preserved between the function exiting with yield and re-entering on the next iteration. If there are no more chunks, we assume that the pending line was finished and yield it as well. Here too, only a minimal amount of memory is used, just enough to preserve the state.
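Here is a runnable standalone version of the pattern, with a simple (assumed) rule for detecting an unfinished line:

```python
def iter_lines(chunks, delimiter='\n'):
    pending = None
    for chunk in chunks:
        if pending is not None:
            chunk = pending + chunk
        lines = chunk.split(delimiter)
        if chunk.endswith(delimiter):
            lines.pop()            # drop the empty piece after the last delimiter
            pending = None
        else:
            pending = lines.pop()  # unfinished line, keep it for the next chunk
        for line in lines:
            yield line
    if pending is not None:
        yield pending

print(list(iter_lines(['ab\ncd', 'ef\ngh\n', 'ij'])))  # ['ab', 'cdef', 'gh', 'ij']
```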

Another use case for generators is traversal of complex data structures. Let's look at the implementation of the standard function os.walk. This generator iterates over all files recursively in a given directory, so it can serve as an example of traversing a tree structure.

def walk(top):
   # Sort items in current dir into dirs and files
   # (listdir, isdir and join come from os / os.path; simplified)
   dirs, files = [], []
   for name in listdir(top):
       if isdir(join(top, name)):
           dirs.append(name)
       else:
           files.append(name)
   yield top, dirs, files

   for name in dirs:
       new_path = join(top, name)
       for x in walk(new_path):
           yield x

On every yield it produces a tuple of the current path, the list of directories in it and the list of files. Then it calls walk recursively for every directory. The state of the iterator here is the call stack. Without generators you would have to emulate the stack yourself or do other tricks that worsen readability. Then we can just linearly iterate over walk as if the directory structure were a flat list:

for path, dirs, files in walk('/path/'):
    # process files
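The same recursive-generator trick works for any tree-shaped data, not just directories; here is a sketch with a hypothetical nested-dict tree:

```python
def iter_tree(node):
    # yield the current node's value, then recurse into its children
    yield node['value']
    for child in node.get('children', []):
        for value in iter_tree(child):
            yield value

tree = {'value': 1,
        'children': [{'value': 2},
                     {'value': 3, 'children': [{'value': 4}]}]}
print(list(iter_tree(tree)))  # [1, 2, 3, 4]
```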

I mentioned a couple of times that iterators can be infinite. You might be wondering what the sane use cases for that are. itertools provides a number of useful infinite iterators; let's look at cycle. If you want to create some alternation in a loop in Django templates, Django provides the cycle tag. For those of you who are unfamiliar with the Django templating language, {% for ... %} basically repeatedly copies the HTML code within its body.

{% for o in some_list %}

<tr class="{% cycle 'row1' 'row2' %}">

{% endfor %}

This can be used to create alternating backgrounds for rows in a table. On every loop iteration cycle will produce the next value.
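The underlying itertools.cycle is simple to try on its own:

```python
from itertools import cycle

row_classes = cycle(['row1', 'row2'])
print(next(row_classes))  # row1
print(next(row_classes))  # row2
print(next(row_classes))  # row1 — the iterator never ends
```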

And this is how corresponding tag is implemented:

class CycleNode(template.Node):
   def __init__(self, cyclevars):
       self.cycle_iter = itertools.cycle(cyclevars)

   def render(self, context):
       return next(self.cycle_iter)

Every time the tag is encountered in the template, Django calls render. It just asks the cycle iterator for the next value, and the iterator keeps in its state which value it gave previously. render will be called as many times as the loop iterates, so the fact that the iterator has no end is not a problem.

That's it. I hope you found my post useful, that it gave you a better understanding of the mentioned constructs and inspired you to use them practically. But please don't be overly enthusiastic: use them only for actual benefit and not just because you want to. I didn't cover every aspect of iteration-related constructs, so here are some suggestions for further research:

  • Learn itertools, it provides a lot of useful functions for working with iterators. There is a great tutorial about this module.
  • I mentioned that yield is actually an expression. Calling code can pass values back to the generator with it: x = yield and iterator.send. Here are some links: Python docs, SO question.
  • yield from, docs, SO
  • Coroutines in Python: overview.
  • asyncio (ex Project Tulip)