Using LaTeX for fast document generation

Originally posted on JaggedVerge: http://www.jaggedverge.com/2015/12/using-latex-for-fast-document-generation/ (please ask questions or leave comments over there)

Many systems have some sort of report generation component. This is usually some variation on extracting data from a database (or other sources), analysing it, and outputting the results in a readable form. Sometimes the reports are required to be available in PDF format. I use a lot of Python for small tasks, and many in-house report generation tasks fall into the category where developer time is much more expensive than processor time. Being able to make these reports quickly AND have the final typesetting look good is a big win, even if the code is not the most performant in the world. This is especially true for one-off reports.

If the PDFs are being created because they will be printed, I find LaTeX especially useful because it handles many of the annoying details of typesetting printed material. LaTeX does a ton of little typesetting things. For example, it deals with excessive rivers in the text; I didn’t even realize it did this automatically, because I never noticed any in the documents it generated. Given that LaTeX does a good job of automated typesetting, it seemed like a natural candidate for making PDF files. The only tricky part is automating the generation and compilation of the LaTeX documents from within code, which is what the rest of this tutorial covers.

Example problem

For the sake of this tutorial we look at a fairly common situation: we have some data to graph, along with some descriptive text stating when the data was generated. For this example, the data we wish to plot is generated by the following:

def create_data():
    """Example data for JaggedVerge LaTeX tutorial"""
    x_vals = list(range(0,10))
    y_vals = [x**2 for x in x_vals]
    return x_vals, y_vals
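
To make the shape of this data concrete, here is a small sketch that pairs the two lists into (x, y) tuples, which is the form the plotting code later in this tutorial works with. The `pairs` name is just for illustration. Note that in Python 3 `zip` returns a single-pass iterator, so the sketch wraps it in `list` to get something that can be inspected and reused.

```python
def create_data():
    """Example data for JaggedVerge LaTeX tutorial"""
    x_vals = list(range(0, 10))
    y_vals = [x**2 for x in x_vals]
    return x_vals, y_vals


x_vals, y_vals = create_data()

# Pair each x with its y; list() turns the one-shot zip iterator
# into a reusable list of tuples.
pairs = list(zip(x_vals, y_vals))

print(pairs[:4])  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```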

Using PyLaTeX

There happens to be a library specifically designed to generate LaTeX from Python, called PyLaTeX. For going direct to PDF this library solves a lot of problems. Specifically, you don’t need to write or manage an intermediate LaTeX file yourself: you can go straight from Python code to PDF, which means fewer steps in building your PDF.

First we have to set up our Python virtual environment. Given that we are using Python 3.3+ for this tutorial, we can use the venv module that ships with the language. (If you are using an older version, you will have to use the virtualenv package instead.) Once the environment is active, install PyLaTeX:

pip install pylatex
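
One thing pip cannot install for you is LaTeX itself: PyLaTeX hands the generated source to an external compiler when it builds the PDF. As a rough sketch, the check below looks for two commonly used compilers on the PATH. The helper name `find_latex_compilers` and the list of candidates are my own assumptions for illustration, not part of PyLaTeX.

```python
import shutil


def find_latex_compilers(candidates=("latexmk", "pdflatex")):
    """Map each candidate compiler name to its path, or None if not on PATH."""
    return {name: shutil.which(name) for name in candidates}


found = find_latex_compilers()
missing = [name for name, path in found.items() if path is None]

if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("LaTeX compilers found:", found)
```

If neither compiler turns up, installing a LaTeX distribution is the likely fix before going any further.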

PyLaTeX makes extensive use of context managers to handle the various LaTeX commands. Let’s start with a really simple example of creating a PDF:

from pylatex import Document, Section
doc = Document()
with doc.create(Section("Our section title")):
    doc.append("Simple example")
doc.generate_pdf('pylatex_example_output')

That’s ALL the code we need to generate a PDF. With no more boilerplate to deal with, let’s get on with satisfying the rest of the requirements. Because PyLaTeX has support for TikZ, we can create some simple graphs without needing any extra Python dependencies:

# Axis, Plot and TikZ all come from the pylatex package
with doc.create(TikZ()):
    plot_options = 'height=10cm, width=10cm, grid=major'
    with doc.create(Axis(options=plot_options)) as plot:
        x_coords, y_coords = create_data()
        # Materialize the pairs so they can be iterated more than once
        coordinates = list(zip(x_coords, y_coords))
        plot.append(Plot(name="Our data", coordinates=coordinates))

That gets us our plot. Now all we need to do is handle the time stamping and a few miscellaneous document issues. First let’s add a title to the document:

doc.preamble.append(Command('title', 'PyLaTeX example'))
doc.preamble.append(Command('author', 'JaggedVerge'))
doc.append(NoEscape(r'\maketitle'))

Note that PyLaTeX is a wrapper around LaTeX code, so just like in LaTeX, if you omit the \maketitle command the title will not be generated. We can create the time stamp with regular Python code:

formatted_timestamp = time.strftime("%a, %d %b %Y %H:%M:%S +0000", data_creation_time)
doc.append("The data in this plot example was created on {}".format(formatted_timestamp))

Once again we can fairly easily get from Python to LaTeX whenever we are dealing with text.

At this point we have the following:

from doc_gen import create_data
from pylatex import (
    Axis,
    Command,
    Document,
    Plot,
    Section,
    TikZ,
    NoEscape,
)
import time
data_creation_time = time.gmtime()
doc = Document()
doc.preamble.append(Command('title', 'PyLaTeX example'))
doc.preamble.append(Command('author', 'JaggedVerge'))
doc.append(NoEscape(r'\maketitle'))
with doc.create(Section("Data report")):
    formatted_timestamp = time.strftime("%a, %d %b %Y %H:%M:%S +0000", data_creation_time)
    doc.append("The data in this plot example was created on {}".format(formatted_timestamp))
    with doc.create(TikZ()):
        plot_options = 'height=10cm, width=10cm, grid=major'
        with doc.create(Axis(options=plot_options)) as plot:
            x_coords, y_coords = create_data()
            # Materialize the pairs so they can be iterated more than once
            coordinates = list(zip(x_coords, y_coords))
            plot.append(Plot(name="Our data", coordinates=coordinates))
doc.generate_pdf('pylatex_example_output')

Which generates the following document:

[Figure: rendered document generated by PyLaTeX]

Just as with LaTeX, it’s probably a good idea to put this plot into a figure environment. If you know LaTeX already, it’s fairly straightforward to generate documents using PyLaTeX. In the future I’ll write about manipulating existing LaTeX documents if there is interest.
