Forwards compatibility with Python generators


The other day I was fixing a variety of issues in the Persephone library. One such issue was setting up the CI environment to actually test with Python 3.7 in the test matrix. This turned out to be much more annoying than I estimated, so much so that I wrote a post about it. The reason for setting up these tests was that the codebase didn't support Python 3.7, due to a section of code that raised StopIteration directly to signal that there was nothing left to generate. So having the CI working for Python 3.7 was something I wanted before I started changing the codebase and advertising Python 3.7 support.

First, a bit of history

Generators were introduced in Python 2.2 with PEP 255 back in 2001. A generator lets you yield elements one at a time, which in some cases can be very beneficial. Internally, a StopIteration is raised to handle the exhaustion of a generator, that is, when no more items are to be yielded. From the beginning a generator has also been able to contain a return statement; you can see this in the original PEP document here. But some people (such as when I wrote a quick fix for an issue in Persephone) were raising StopIteration directly from their code to end a generator.
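
As a quick refresher, here's a minimal standalone example (not from Persephone; the names are just for illustration) of how this works: the return statement ends the generator, and the StopIteration is raised by Python itself when the consumer asks for another item:

    def count_up_to(limit):
        """Yield the integers from 1 up to limit, one at a time."""
        n = 1
        while n <= limit:
            yield n
            n += 1
        return  # ends the generator; Python raises StopIteration internally

    gen = count_up_to(2)
    print(next(gen))  # 1
    print(next(gen))  # 2
    next(gen)         # raises StopIteration: the generator is exhausted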

There are some issues with raising a StopIteration from your own code (as opposed to the internals of Python), so PEP 479 deprecated this in Python 3.5 and made it an error from Python 3.7 onwards. The PEP itself has a great example of the potential issues in its rationale section.
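
To give a flavor of the problem (a contrived sketch in the same spirit as the PEP's rationale, not the PEP's exact code), a StopIteration that escapes from inside a generator used to silently end it, masking the real bug:

    def first_items(iterables):
        """Yield the first item of each iterable passed in."""
        for it in iterables:
            # If `it` is empty, next() raises StopIteration here.
            # Before PEP 479 that exception leaked out and silently ended
            # this generator; with generator_stop it becomes a RuntimeError.
            yield next(iter(it))

    print(list(first_items([[1, 2], [], [3]])))
    # Pre-PEP 479: prints [1], silently truncated at the empty list
    # Python 3.7+: RuntimeError: generator raised StopIteration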

A practical example

So let's have a look at what this does in a real project. Persephone has some code that creates batches to be sent to the machine learning portion of the code, which used to look like this:

    def train_batch_gen(self) -> Iterator:
        """ Returns a generator that outputs batches in the training data."""
        if len(self.train_fns) == 0:
            raise PersephoneException("""No training data available; cannot
                                       generate training batches.""")
        # Create batches of batch_size and shuffle them.
        fn_batches = self.make_batches(self.train_fns)
        if self.rand:
            random.shuffle(fn_batches)
        for fn_batch in fn_batches:
            logger.debug("Batch of training filenames: %s",
                          pprint.pformat(fn_batch))
            yield self.load_batch(fn_batch)

There was an issue, however, if fn_batches was empty, which caused problems at call sites and led to this change:

    def train_batch_gen(self) -> Iterator:
        """ Returns a generator that outputs batches in the training data."""
        if len(self.train_fns) == 0:
            raise PersephoneException("""No training data available; cannot
                                       generate training batches.""")
        # Create batches of batch_size and shuffle them.
        fn_batches = self.make_batches(self.train_fns)
        if self.rand:
            random.shuffle(fn_batches)
        for fn_batch in fn_batches:
            logger.debug("Batch of training filenames: %s",
                          pprint.pformat(fn_batch))
            yield self.load_batch(fn_batch)
        else:
            raise StopIteration

This fixed the issue that was being encountered, but at the cost of breaking on Python 3.7, where a StopIteration raised inside a generator is turned into a RuntimeError, causing the following failure:

                    batch_gen = self.corpus_reader.train_batch_gen()

                    train_ler_total = 0
                    print("\tBatch...", end="")
>                   for batch_i, batch in enumerate(batch_gen):
E                   RuntimeError: generator raised StopIteration
persephone/model.py:382: RuntimeError

To get this into the more modern form that's compatible with Python 3.7+, we have to change it to this:

    def train_batch_gen(self) -> Iterator:
        """ Returns a generator that outputs batches in the training data."""
        if len(self.train_fns) == 0:
            raise PersephoneException("""No training data available; cannot
                                       generate training batches.""")
        # Create batches of batch_size and shuffle them.
        fn_batches = self.make_batches(self.train_fns)
        if self.rand:
            random.shuffle(fn_batches)
        for fn_batch in fn_batches:
            logger.debug("Batch of training filenames: %s",
                          pprint.pformat(fn_batch))
            yield self.load_batch(fn_batch)
        else:
            # Python 3.7 compatible way to mark generator as exhausted
            return

However, we do want to continue to maintain support for Python 3.5, so this change would need to be accompanied by adding this future statement at the top of the file:

    from __future__ import generator_stop

What this does is make the Python 3.5 behavior consistent with the newer 3.7 way of handling things.
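
As a quick sanity check (a minimal sketch, not from the Persephone codebase), with this future statement Python 3.5 and 3.6 treat a stray StopIteration inside a generator the same way Python 3.7 does:

    from __future__ import generator_stop  # must come before other statements

    def broken():
        yield 1
        raise StopIteration  # previously treated as a silent "end of generator"

    # Without the future import, Python 3.5/3.6 would print [1] here;
    # with it (and on Python 3.7+ regardless) this raises
    # RuntimeError: generator raised StopIteration
    print(list(broken()))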

Published: Wed 03 July 2019
By Janis Lesinskis
In Software-engineering
Tags: Python generators tutorial software-engineering
