Creating an effective Python web development environment

It's important to arrange your web development so that your projects can succeed as easily as possible over the long term.

Why bother?

If you already know why you'd spend the effort to set up your project properly, skip on to the next section.

If you aren't sold on why you would want to invest the time to set your project up, as opposed to just getting out your favourite editor and hacking away, then let me try to explain. Essentially, creating a development environment is an exercise in forward thinking and planning. Planning is a trade-off: you are trading some time and effort thinking about a problem in advance in order to make the problem easier to tackle. Programming is no different. By spending a bit of time setting up your project correctly the first time around you can save a ton of time later. Getting the balance correct is quite tricky [1]. Planning too much can be detrimental, but the vast majority of people do not put in enough effort. This is particularly the case with setting up development environments correctly.

The value in setting things up properly is less tangible than what you see when you have put a page up and you are looking at it from the browser. The abstract nature of the value in planning can lead it to be overlooked. This is because the effort expended in setting your development environment up isn't particularly glorious and the results from doing so aren't immediately obvious and tangible. While you can just throw some pages up quickly and see immediate "results" or feedback, the feedback loop for correctly setting up your project is a lot longer. The value comes at a later date when you mitigate the problems that usually occur in the course of creating software. If you haven't encountered these problems first hand the temptation to just get started quickly is very strong.

Often it's not until much later that the problems associated with ad-hocking your setup become obvious by which point they can be really bad. In fact sometimes you don't even see the problems at all until later on by which time changes are significantly tougher (or even sometimes impossible) to make. Until you have actually run into those problems first hand and seen the amount of effort required to fix systems it can seem like it's a lot of work for not that much gain. Not having run into problems doesn't mean those problems don't exist, it just means you don't know about them.

Say you have to move your site to a different server or a different provider for whatever reason. Ask yourself how easy that would be. Do you need to minimize downtime? Do you have customers with data that can't be lost? How will you scale?

Questions like these start to hint at the importance of getting your setup correct.

While some issues are quite complicated and take a lot of effort to get right I'm firmly of the belief that setting up a deployment system is quite manageable and offers extremely good ROI on the time spent. Even if you are just running a personal website that doesn't have any real pressure to be reliable I'm still of the opinion that the ROI is good because it speeds up your development via a reduction in debugging effort. If people are relying on your site then the ROI is unquestionably good. If you have had to fix issues regarding deployments then you probably don't have to be sold on the importance of doing it right. If you haven't encountered these problems then I'm essentially asking you to take my word for it, even though I'd be the first to admit that dealing with the problems first hand is by far the best way for this lesson to be learned. [2]

[1]http://martinfowler.com/bliki/TechnicalDebt.html
[2]The biggest roadblock to people understanding why investing time in doing things right matters is a lack of understanding of what goes wrong when you don't. So naturally if you have had the experience of things going spectacularly wrong because they got out of control and unmanageable, then the benefits of properly engineering your system will no longer be this abstract intangible thing.

An approach for Python based web projects

While it's possible to just install your favourite web framework and start hacking away, this mindset will cause issues if you aren't planning on cleaning things up later. One of the main issues is that if you change or update libraries or Python itself then you might find things break due to external package changes.

First I always start by getting my version control set up. git init . or hg init . takes almost no time at all and starts paying dividends almost immediately. Backing up the code is now very easy, we just push somewhere. Recovering from mistakes is much easier too. It takes very little time dealing with backed up files in directories to see the overwhelming advantages of using version control software.

One big issue with Python is getting the dependencies on a new system correct and keeping them correct. If you can't replicate your environment you can run into some nasty problems deploying to a new setup. [3] By creating a sandbox for your application you can start to know with confidence what your code actually depends on. I find it very worthwhile when working on any significant Python project to start by creating a virtual environment. This separates the Python interpreter and environment from any changes to the operating system by creating an isolated standalone interpreter and separate install. Dependencies are then installed via pip into this isolated install only. Most of the problems that stem from your project's dependencies are either solved or made much more manageable by creating a virtual environment. [4] Using the virtualenv package directly can be a bit of a pain but luckily the virtualenvwrapper package makes the process much easier.

For this reason, any time I embark on anything that will be more than a couple of hours' worth of work I start by creating a virtual environment for the project using virtualenvwrapper. The setup time is small enough that the ROI is overwhelmingly positive.
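To make the idea of an isolated interpreter a bit more concrete, here is a minimal sketch using the standard library's venv module rather than virtualenvwrapper (which is a set of shell commands). The helper names and the demo-env directory are made up for illustration; the check relies on sys.prefix differing from sys.base_prefix inside an environment.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment sys.prefix points at the environment itself,
    # while sys.base_prefix still points at the interpreter it was built from.
    return sys.prefix != sys.base_prefix


def create_env(target):
    """Create an isolated environment in `target` (hypothetical helper)."""
    # with_pip=False keeps the sketch quick; pass True to get pip installed too.
    venv.create(target, with_pip=False)
    # Every environment created by venv gets a pyvenv.cfg marker file.
    return Path(target) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = create_env(Path(tmp) / "demo-env")
        print("environment created:", marker.exists())
    print("running inside a virtualenv:", in_virtualenv())
```

In day-to-day use you would still create and activate the environment from the shell; a check like in_virtualenv() is mostly handy as a guard at the top of a deployment script so that packages never get installed system-wide by accident.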

[3]One of the nastiest problems is when you depend on a very specific set of packages working together and your package manager updates something which breaks your project. Sometimes it's particularly hard to roll back to the earlier versions. I ran into this problem once when installing a version of gcc on Ubuntu: the new gcc version unexpectedly broke the build but the package manager wouldn't let me roll back to a known working version.
[4]There are some other options too, such as setting up VM images or Docker containers. However they all work on the same principle, which is that the Python packages are completely controlled by the project and are isolated to that same project. This protects your environment from being altered from outside and is what gives your deployments stability.

Separating live and development environments

Having all the changes made directly on the live site is a very unsatisfactory situation. Mistakes are public and can be catastrophic when made on live. Separating your development environment from the live environment is important because you get a private staging area. That way you can change things as much as you want and wait until your changes are sufficiently mature before you put them up on the live site.

It's great to be able to make changes in a private environment and see if they succeed before revealing them to the world. Being able to have these drafts allows you to see if your content works but further it allows you to have more detailed diagnostics and error messages that would otherwise reveal information about your site setup that could be a security liability.
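One common way to get detailed diagnostics in development without exposing them on live is to pick the settings from an environment variable. The following is only a sketch: the SITE_ENV variable name and the setting keys are assumptions made for this example, not something a particular framework requires.

```python
import os


def load_settings(environ=None):
    """Return settings for the current environment (names are illustrative)."""
    environ = os.environ if environ is None else environ
    # Default to the safe choice: anything not explicitly marked as
    # development is treated as the live site.
    is_dev = environ.get("SITE_ENV", "live") == "development"
    return {
        "debug": is_dev,  # detailed error pages only in development
        "log_level": "DEBUG" if is_dev else "WARNING",
    }


if __name__ == "__main__":
    print(load_settings({"SITE_ENV": "development"}))
    print(load_settings({}))
```

The design choice worth copying is the default: if the variable is missing or misspelled the site behaves as live, so a configuration mistake never leaks debug output to the public.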

From a security point of view it makes a lot of sense to have the development site completely separate from the live site and not accessible to anyone other than the developers. As much as possible you want the environment that you are using for testing to be the same as the live environment. Having separate installs really requires a virtual environment to work properly; without one the installed packages can get out of sync, which is a problem.

I personally like to use Git for this because of the cheap branches and ease of merging, but you can use whatever VCS you want. If you don't have a version control system, stop reading now and go get one. Many people have already written about the benefits of using version control so I won't go into details; see these links if you want more info [5].

With an interpreted language such as Python I find the best way to separate live and development is to keep a dedicated branch that holds the live site code. You work on your code changes in other branches, and when a change is mature enough you merge it back into the live site branch using your version control software. In any successful approach to branching the contract is that the live site branch is always in a deployable state.

When you are using a version control system and making extensive use of branches you really need to have some sort of approach to branching. While I find the git flow approach a bit too heavyweight for smaller projects, I'd highly recommend reading this fantastic article on git flow; the diagrams and explanations there really helped me to conceptually understand what was going on. When working with teams I personally like the GitHub flow branching strategy. For smaller projects I use a slightly simpler approach that just involves a live_site branch and a general development branch, with feature branches as deemed necessary. I personally deploy projects directly from git branches, more on this in the next section...
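If you deploy straight from a branch, a small guard can stop you from accidentally deploying whatever happens to be checked out. This is a rough sketch and not part of any tool mentioned here: ensure_on_branch is a hypothetical helper, and it assumes git is on the PATH and that the live branch is called live_site as in my smaller projects.

```python
import subprocess


def current_branch():
    """Ask git for the name of the branch that is checked out."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def ensure_on_branch(expected, get_branch=current_branch):
    """Raise RuntimeError unless `expected` is the checked-out branch."""
    actual = get_branch()
    if actual != expected:
        raise RuntimeError(
            "refusing to deploy from %r; expected %r" % (actual, expected)
        )
    return actual
```

Calling ensure_on_branch("live_site") at the start of a deployment script turns a silent mistake into an immediate, readable error. The get_branch parameter exists only so the check can be tried out without a real repository.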

Now you have the code in sync you can get environments in sync fairly easily if you have used virtualenvwrapper. You can store the state of the dependencies in the same repository as your code by doing this:

pip freeze > requirements.txt

This command will place all the packages (with their specific versions) that your project depends on in requirements.txt. This way you have the code and the dependencies of that code nicely stored in the same place. This becomes especially useful if your development environment and live environment are on different machines.
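Once requirements.txt is in the repository you can also check an environment against it. The sketch below is one possible way to do that with the standard library's importlib.metadata; the function names are made up for this example, and it only understands the plain name==version lines that pip freeze normally writes (editable installs and URL entries are skipped).

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: version} for every `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "==" not in line:
            continue  # this sketch ignores anything that is not pinned
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Look up the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for packages that differ from the pins."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

Running find_mismatches(parse_pins(open("requirements.txt").read())) inside the live environment should return an empty dict when it matches development. In practice pip install -r requirements.txt is what brings an environment back in line; the check just tells you whether that is needed.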

[5]http://programmers.stackexchange.com/questions/223027/at-what-point-is-version-control-needed http://stackoverflow.com/questions/1408450/why-should-i-use-version-control https://www.atlassian.com/git/tutorials/what-is-version-control

One step deployments

When I first started writing web code a few years ago it was a mess of PHP files hosted by an Apache server. To make a change I'd do something like this:

  1. Download the .php file to my computer
  2. Change the .php file locally
  3. Open up an FTP client and log in
    1. Upload the relevant files to the live server directly
    2. Reload the pages and hope everything worked
  4. If things didn't work, tinker around with settings and, if that didn't help, roll back to the previous version if I could.

The thing that you'll notice with an approach like this is that there's a bunch of separate steps that have to be followed in order to get it to work. Also all the changes were being made directly on the live site, so if anything went wrong then the website might go down.

For now let's say you already have separate development and live setups (if you don't, see the section above on it). How do you then move the changes you made from development over to live?

Generally speaking the steps go something like this:

  1. make some changes
  2. run your unit test suite
  3. commit your development changes to the version control software
  4. merge the changes back in to the deployment branch
  5. back up any important data that could be impacted from the changes
  6. extract the contents of the deployment branch into the folder that the webserver is serving from
  7. do any database migrations (if necessary)
  8. restart your webserver (if necessary)

Most of these steps, while needed, are completely brainless as they don't really require any decision making in order to be executed. However getting any of the steps wrong can lead to problems. As it turns out computers are fantastic at automating tasks such as this and that's the whole idea behind tools like pyInvoke, Fabric and many others. The idea is that if you automate the mundane tasks you get quicker deployments and more reliability from removing a substantial category of possible problems that arise from making small mistakes like typos. The idea goes that the fewer steps required the easier it will be to deploy without issues. Taking this idea through to its logical conclusions you'll see that a one step deployment is often ideal.
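Before reaching for a dedicated tool, the core idea can be sketched in a few lines of plain Python: put the steps in a list and stop at the first one that fails. This is only an illustration of the principle, not how pyInvoke or Fabric work internally; the deploy and run_shell names are made up, and the commands in STEPS are placeholders for whatever your project uses.

```python
import subprocess


def run_shell(command):
    """Run one shell command and raise CalledProcessError if it fails."""
    subprocess.run(command, shell=True, check=True)


def deploy(steps, execute=run_shell):
    """Run (description, command) pairs in order, stopping at the first failure."""
    completed = []
    for description, command in steps:
        print("->", description)
        execute(command)  # an exception here aborts the remaining steps
        completed.append(description)
    return completed


# Placeholder commands: substitute whatever your project actually uses.
STEPS = [
    ("run the tests", "python manage.py test"),
    ("merge into the deployment branch",
     "git checkout live_site && git merge development"),
    ("apply database migrations", "python manage.py migrate"),
]
```

Calling deploy(STEPS) then replaces a checklist you have to remember with one command. Because a failing step raises an exception, a broken test run never gets as far as the merge or the migration.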

Say you spot a bug and you know it will take 2 minutes to fix the code but 30 minutes to push the change through all the deployment steps (assuming nothing goes wrong). All of a sudden the minimum amount of time for a bugfix is over 30 minutes. This can have quite a discouraging effect. Getting the overall time it takes to deploy reduced to the minimum makes it much easier to embark on making those small-but-necessary fixes.

Even if you can't get your project deployed via one command it's still important to look to reduce the number of steps to deploy or build your project. Good software engineering is all about reducing the number of things that can go wrong. There's a whole host of problems with having too many steps between making changes and being able to apply those changes. If the process gets too difficult then it can start to discourage you from making changes, even necessary changes. All of a sudden you might find a bug that you want to fix and you really want it to be as easy as possible to push that fix up. There's always a number of small but necessary tasks so you want to make it take as small an amount of time as necessary to get those tasks done.

I don't think I fully appreciated the value in automating a deployment until after I'd got some experience with Fabric and seen how smoothly a routine update to my site could go. Because the smaller chunks of code were easily integrated I found myself completing more bugfixes. While it took a bit of time to get Fabric to deploy directly from a git repository I just don't think I could go back to the productivity hit of using zip files with the copy-and-paste approach.

Deploying with pyInvoke

To use pyInvoke to deploy your site you need to set up a tasks.py file (see the pyInvoke documentation). The beauty of it is that it's actually just Python, it just gives you really nice ways of accessing all the most commonly used things when deploying a web site. Because it's Python you can do anything you want for automating your deployment that you can code in Python.

A deployment tasks.py file might look something like the one from the footbag-db project:

"""Uses pyinvoke to do common tasks"""

from invoke import task, run

@task
def compile_scss():
    """Compile the SCSS and copy to relevant static directory"""
    run("mkdir -p ./static/basic_theme/css/")
    run("mkdir -p ./footbag_site/static/basic_theme/css/")
    run("pyscss ./scss/*.scss > ./static/basic_theme/css/style.css")
    run("cp ./static/basic_theme/css/style.css ./footbag_site/static/basic_theme/css/style.css")

@task
def run_tests():
    """Run the unit testing suite"""
    run("python manage.py test")

@task
def restart_server():
    """The command to restart the web server.
    PythonAnywhere is set up such that touching the WSGI file restarts the server.
    Change this command to whatever the web server requires."""
    run('touch /var/www/www_footbag_info_wsgi.py')  # restarts PythonAnywhere server


@task(run_tests)
def stage_changes(branch):
    """
    Prepare to stage changes on the staging server from a git branch if and only
    if the unit tests pass.
    The syntax to call this from the command line is:
    >>> invoke stage_changes --branch=foo_branch
    where foo_branch is the name of the branch being deployed
    """
    run('git checkout develop && git merge --no-ff ' + branch)
    run('python manage.py migrate footbagmoves')

@task(run_tests)
def prepare_deployment():
    """
    Prepare to deploy from develop to master if and only if the unit tests pass.
    The syntax to call this from the command line is:
    >>> invoke prepare_deployment
    """
    run('git checkout master && git merge develop')

@task(post=[restart_server])
def deploy_to_live():
    """
    Deploy from the dev folder to the live site using git, assumes changes have
    already been prepared with prepare_deployment.
    """
    import os
    pwd = os.getcwd()
    os.chdir('/home/janis/footbagsite/www_footbag_info/')
    compile_scss()
    run('git checkout master')
    run('git pull /home/janis/footbagsite/dev-site/ master')
    run('python manage.py migrate footbagmoves')
    os.chdir(pwd)

@task
def create_search_index():
    """Create the index files needed to run Haystack search"""
    run('python manage.py update_index')

Essentially this automates a lot of mundane things like building the CSS from SCSS files and running migrations. These are time consuming and error prone steps; by automating them you save enormous amounts of time and frustration.

2016 EDIT: Most of the things suggested in this article are simple techniques that increase productivity. I wrote a more general article about how techniques impact developer productivity that deals with the more general topic.
