Django inheritance

A core principle of Django is Don't Repeat Yourself.

Any time you find yourself writing identical lines of code in a bunch of places, you should stop and rethink, because there is likely a better way to do it. Repeated code is generally a code smell in Django, as the framework has a number of ways to help you write terse, maintainable code.

While it might initially seem difficult to reduce duplication within your models, we don't need to ignore the DRY principle when using Django's ORM. Django provides a number of helpful tools for reducing repetition in defining models.

A practical example

Lately I have been working on the Footbag Database website. Part of the site is a video catalogue where people can search for information on various different techniques and get relevant results complete with video links.

There were a number of different models, such as DemonstrationVideo and TutorialVideo, which essentially just linked some object with a video. Functionally speaking, these models did almost exactly the same thing.

The code initially looked something like this [1]:

from django.db import models

# Move, VIDEO_TYPES and URL_VIDEO_TYPE are defined elsewhere in the project.


class MoveDemonstrationVideo(models.Model):
    """ This is to keep track of move demonstration videos. """
    # on_delete is required from Django 2.0; CASCADE matches the older implicit default.
    move = models.ForeignKey(Move, on_delete=models.CASCADE)
    video_type = models.CharField(max_length=1, choices=VIDEO_TYPES, default=URL_VIDEO_TYPE)
    URL = models.URLField()
    use_start = models.BooleanField(default=False)
    start_time = models.PositiveSmallIntegerField()
    use_end = models.BooleanField(default=False)
    end_time = models.PositiveSmallIntegerField()


class MoveTutorialVideo(models.Model):
    """ This is to keep track of move tutorial videos. """
    move = models.ForeignKey(Move, on_delete=models.CASCADE)
    video_type = models.CharField(max_length=1, choices=VIDEO_TYPES, default=URL_VIDEO_TYPE)
    URL = models.URLField()
    use_start = models.BooleanField(default=False)
    start_time = models.PositiveSmallIntegerField()
    use_end = models.BooleanField(default=False)
    end_time = models.PositiveSmallIntegerField()

As you can see there's a big chunk of duplicated code here, so is there a better way?

Before I knew about Django's model inheritance features, I might have added another CharField similar to video_type to differentiate between the objects without needing to duplicate the rest of the code. The main downside is that we lose the type information, which in turn makes the code that queries the database less clear. Done naively like that, the various distinct types would all be stored in the same database table, which would be a bit nasty for querying: every query would have to go through that one shared table and filter on the extra column to get the rows we really wanted.

Sure, we could create index tables to speed that up, but essentially we would end up back in the situation where we had multiple tables, except now with added complexity.

Thankfully Django gives us a cleaner way of handling this via model inheritance.

There are two different ways that we can refactor this code, and they have different implications for what happens at the actual database layer.

Abstract inheritance

Possibly the most conceptually straightforward method is abstract inheritance. This is very similar to a macro in other languages, as it just "copy-and-pastes" the fields from the base class into the child classes.

So now we can write our models like this:

class VideoAsset(models.Model):
    """This is a video asset, specifies the type of the video along with its location
    and stores some other information about the timestamps for the relevant parts of the video."""
    video_type = models.CharField(max_length=1, choices=VIDEO_TYPES, default=URL_VIDEO_TYPE)
    URL = models.URLField()
    video_id = models.CharField(max_length=20)
    use_start = models.BooleanField(default=False)
    start_time = models.PositiveSmallIntegerField()
    use_end = models.BooleanField(default=False)
    end_time = models.PositiveSmallIntegerField()

    class Meta:
        abstract = True # <--- VideoAsset now abstract


# The child models must inherit from VideoAsset to pick up its fields.
class MoveDemonstrationVideo(VideoAsset):
    """ This is to keep track of move demonstration videos. """
    move = models.ForeignKey(Move, on_delete=models.CASCADE)

class MoveTutorialVideo(VideoAsset):
    """ This is to keep track of move tutorial videos. """
    move = models.ForeignKey(Move, on_delete=models.CASCADE)

Essentially we have just created Django's equivalent of a mixin and we are using this for creating our models.

In the database this is represented in essentially the same way as the original duplicated models: each child model gets its own table containing all of the fields. It's just a shortcut that saves typing out all the fields, and hence reduces the overall maintenance burden. We have to mark VideoAsset as an abstract class in Meta, otherwise Django would create a database table for VideoAsset as if it were any other model. Writing abstract = True tells Django not to create a table for the class.

In this example we have all these classes in the same file, but there's no restriction on the location of the code containing the abstract base class. You can put the base class anywhere that you can import Python code from.

Multi-table model inheritance

The other way we can refactor the code is as follows:

class VideoAsset(models.Model):
    """This is a video asset, specifies the type of the video along with its location
    and stores some other information about the timestamps for the relevant parts of the video."""
    video_type = models.CharField(max_length=1, choices=VIDEO_TYPES, default=URL_VIDEO_TYPE)
    URL = models.URLField()
    video_id = models.CharField(max_length=20)
    use_start = models.BooleanField(default=False)
    start_time = models.PositiveSmallIntegerField()
    use_end = models.BooleanField(default=False)
    end_time = models.PositiveSmallIntegerField()

class MoveDemonstrationVideo(VideoAsset):
    """ This is to keep track of move demonstration videos. """
    move = models.ForeignKey(Move, on_delete=models.CASCADE)

class MoveTutorialVideo(VideoAsset):
    """ This is to keep track of move tutorial videos. """
    move = models.ForeignKey(Move, on_delete=models.CASCADE)

While the idea is similar, this produces a different database structure from the other examples. The shared fields now live in a table for VideoAsset, and MoveDemonstrationVideo and MoveTutorialVideo each get a table of their own that links back to it. Django implements that link as an automatically created one-to-one field (named videoasset_ptr here) that points to a VideoAsset row.

The child tables now hold only that link, which doubles as their primary key, along with their own fields such as move. When you query for these types, the Django ORM is essentially doing a join on the database to return your Python objects.

In this particular project I chose this approach because the video assets code is reused in other unrelated areas without any derived classes. I also needed to be able to query all types of objects that contained a video at the same time for moderation/admin purposes. That would have required database joins anyway, which nullifies the performance benefit of keeping everything in separate tables. If we never needed to query the base model objects separately, we would want to question this approach because of the performance overhead from the "behind-the-scenes" database joins needed to create our derived class Python objects.

[1] You can see the actual source code this example is based on over on GitHub. Here's the file from the commit just before the refactor: models.py. And the files afterwards: models.py and video_assets_models.py.
