Why I now hesitate to use TeX
Despite all of its typesetting benefits, now, in 2019, I find I don't use any of the TeX derivatives anywhere near as much as I used to, and I've developed an aversion to introducing LaTeX into projects. This wasn't always the case: I wrote a whole book and a variety of technical documentation in LaTeX, and once the initial learning curve ended I didn't hesitate to use it; for typesetting it was one of the first tools I reached for. But things have changed. And this isn't a case of my work shifting to areas with less document generation and typesetting: at the moment I'm heavily involved in consulting on the automation of document generation and reporting. I find that I'm still using LaTeX, just not as much as I used to and not in the same ways as before, and it seemed worthwhile trying to figure out why.
This post came about because I had an especially frustrating couple of weeks trying to get a few different projects with a significant LaTeX component up and running. I want to make it clear that I do like a variety of things about LaTeX, enough so that I've written articles about it and made contributions to projects such as KaTeX. I just needed to cathartically dump out some text because I was badly annoyed the other day. But instead of posting a pure rant I've let a few days pass to cool off, so I can post something more useful for readers and do some analysis that will be useful for myself.
The frustrations came from the fact that these days I'm especially busy: between running a high-end tech consultancy and a lot of on-site training workshops I'm exceedingly time-poor, so the time-consuming nature of debugging TeX issues was just not something I wanted to deal with. The cost of the extra time needed to deal with the various pain points of LaTeX is much heavier for me than it used to be. But it's not just the overall amount of time required to make LaTeX work, it's where that time comes up that's been the main factor. With other types of software you can front-load a lot of the effort so that things work smoothly later. A classic example of this is automation: in a good automation project the initial time outlay pays off substantially by reducing the time you need to spend on the system for each unit of work in the future. (There are situations where automation actually increases the total time spent on a system due to increased efficiencies; see the Jevons paradox for examples of this.) Containers are a great example too, especially with regard to build automation: you can spend a lot of time getting things set up so that later on you have a much easier time deploying. Interestingly, LaTeX very strongly doesn't have this feel to it, since many time-consuming issues only surface at the moment you try to build the documents, missing fonts perhaps being the key example. When and why to front-load effort in software is a very big topic, so I'll save that for a future post.
Here's a classic example from a recent project:
$ pdflatex document.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
Babel <3.18> and hyphenation patterns for 7 language(s) loaded.
Document Class: article 2014/09/29 v1.4h Standard LaTeX document class
! LaTeX Error: File `roboto.sty' not found.
Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)
Enter file name:
So there are a few things that are annoying here on a technical level, but the main impact is that it's hard to automate the processes that would allow you to share LaTeX projects between people on different machines. In this case a font was installed by default on one operating system but not the other, and remediating the issue was very time consuming. One thing you'll notice is that if the build can't find a font it opens a prompt asking for a file path. In a single-user environment this UI might be OK, but the design decision makes automation harder: a prompt that pauses the system waiting for user input is hard to script, and perhaps the bigger issue is that the design encourages you to mutate the state of the underlying system dynamically. The font pain never quite seems to go away, and much of the pain it causes in a document pipeline setting comes down to this environment being hard to automate. As far as I understand, unless you explicitly define the path to search for fonts AND distribute the font files with the documents you wish to build, you will always have the potential to run into these missing-font issues.
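One partial mitigation for the prompt problem, sketched below for a standard TeX Live install, is to run the engine non-interactively so a missing package fails the build immediately with a nonzero exit code instead of hanging waiting for input (the `build_doc` wrapper name is just for illustration):

```shell
#!/bin/sh
# Fail fast instead of dropping into the interactive prompt.
# -interaction=nonstopmode : never stop to ask the user anything
# -halt-on-error          : abort on the first error, nonzero exit code
build_doc() {
    pdflatex -interaction=nonstopmode -halt-on-error "$1"
}

# Guard so this sketch degrades gracefully on machines without TeX.
if command -v pdflatex >/dev/null 2>&1; then
    build_doc document.tex
else
    echo "pdflatex not found, skipping build"
fi
```

This doesn't fix the underlying missing dependency, but it turns a hung pipeline into a visible CI failure, which is at least automatable.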
If you want to take a crude approach, the standard advice is to install more font packs, for example:
apt-get install texlive-fonts-extra
This comes in at a rather heavy 400+ MB download with a fairly lengthy install. That may be fine in a lot of places in the world, but there are locations without particularly good internet, like much of Australia, where this step can be a substantial issue.
But even if the internet is faster than the typical Australian internet, like for example at the rural shack in Canada with a wood fire for heat where I lived for a while (which is still faster than my urban connection in Australia even 7 years later), you may still run into issues with the upstream packages changing what fonts they include. The reproducibility here depends on what happens in those upstream packages. If you need this to be rock solid you have to be able to answer a question along the lines of "how could I ensure I could build my documents even in aeroplane mode with no internet connection?". If this really matters you could try to pin the dependency or vendor it, but at that point you might as well just distribute the font files you need directly rather than depending on a whole font collection package.
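If you do go the vendoring route, one approach is to keep the fonts in a project-local TeX tree inside the repository and point `TEXMFHOME` at it for the build, so nothing system-wide needs to be installed or mutated. This is a sketch under assumptions: the `vendor/fonts` source directory and `example-project` name are illustrative, and it targets `lualatex` since the Unicode engines load OpenType fonts directly:

```shell
#!/bin/sh
# Vendor fonts in a project-local texmf tree (standard TDS layout).
mkdir -p texmf/fonts/opentype/example-project
cp vendor/fonts/*.otf texmf/fonts/opentype/example-project/ 2>/dev/null || true

# Point kpathsea at the project tree for this build only; the
# system-wide font configuration is left untouched.
if command -v lualatex >/dev/null 2>&1; then
    TEXMFHOME="$PWD/texmf" lualatex -interaction=nonstopmode document.tex
else
    echo "lualatex not found, skipping build"
fi
```

The nice property is that cloning the repository brings the fonts along, so the aeroplane-mode question has an answer.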
A lot of metadata is required to build a LaTeX document in a replicable way, and much of it is not explicitly recorded. In particular you have multiple dependencies on system state to get the output to be exactly the same across different builds. As time goes on I find this less and less appealing; a classic example came up when we were working on a document that built fine on a MacBook but failed on the Ubuntu machines.
There's no doubt that if typesetting and typography matter for your project, having this level of control over the fonts is very beneficial. For example, last week we were creating some booklets of Python training materials with Python Charmers, and getting perfect control over the typesetting was a large concern (incidentally we find printing out high-quality materials really helps the workshops deliver more learning value; the details do matter). Making all the code blocks and figures lay out perfectly is the sort of attention to detail that really matters in that situation. (And honestly, in a typography setting, "it's almost right, but just looks a little off" is a lousy option.)
For general use, however, I think these pain points around distribution and package management with LaTeX are now, relatively speaking, quite annoying. But relative to what? The rest of the document ecosystem is worth comparing against, since these are the alternatives you have for typesetting and publishing tasks.
Ecosystem changes in the last decade
If we go back 10-15 years it was a major pain to typeset a whole variety of things; that old equation editor in Word was memorable for how terrible it was, and not even Clippy could help you there (even Merlin and Rover couldn't help, it was that bad). Great strides have been made in document generation in the last decade; the pain level used to be far higher across the board (I remember huge amounts of pain setting up printers with CUPS on one of the first projects I used LaTeX on, but the most recent printer I dealt with worked without any issues at all).
In the last decade or so LaTeX and the various other distributions of TeX have certainly got better. People have been tackling the deficiencies that traditionally caused Unicode and UTF-8 to not work well at all. CTAN, the central package archive for TeX, has been maintained and updated during this time as well. In some senses it's just a case of it being tough being first: TeX has a goal of maintaining support for rendering existing documents. There's a lot of good in this stability, but it comes with its own set of tradeoffs when dealing with newer developments. LaTeX was released in 1983 and Unicode 1.0 was released in 1991, so unsurprisingly character encodings such as UTF-8 were not implemented in the original versions of LaTeX. UTF-8 really became popular as the encoding of the web, but keep in mind that TeX and LaTeX both predate the web. Maintaining backwards compatibility with older documents while trying to accommodate technologies that came out over a decade after the release of LaTeX isn't easy, and I think overall they have done a good job considering those constraints.
Thinking about it more, it seems a large part of the annoyance I currently have with LaTeX (relative to ten years ago) is due to other things improving more over the same period. I find the comparison with other build systems and package management solutions really interesting: now in 2019 there are far more good options for package management. Python, for example, was hideous for package management 10-15 years ago (pip didn't hit 1.0 until 2011 and took a few years after that to really get good), other package management systems like Homebrew didn't even exist (Homebrew was released a little over 10 years ago, in May 2009), and npm likewise didn't exist (npm was released in January 2010). There's probably a whole post there about how package management has improved over time, but I won't digress.
Perhaps one of the biggest lessons from package management is that replicable builds are extremely valuable. They can only be achieved by storing appropriate metadata about the state of the system to be built, along with the machinery to pull in the dependencies that metadata specifies. (There is of course a lot more that goes into package management; it's a hard problem space.)
Perhaps the best indication of the progress that's been made in package management is that the various issues with the LaTeX installation were just the same types of issues you saw everywhere else 15 years ago.
Dealing with dependencies in a LaTeX project
I think perhaps the biggest issue that people have when collaborating on LaTeX documents is that you have to manually declare the dependencies of your work.
This can be difficult since the way things are structured with LaTeX tends to assume that systems get modified and changed over time, and not much metadata is stored in the .tex files themselves. If you have to make sure your documents build the same now as they did before, you often want some form of unchanging snapshot of the system state that renders them, so that it can be shared. Because LaTeX and similar systems place a very high importance on stability, the underlying TeX engine doesn't tend to introduce backwards-incompatible changes in how documents are parsed, but you can still run into issues with packages and fonts.
If you are working on a team I'd highly recommend creating some form of container that sets up and shares the system that will build your documents.
Often this will be some sort of script that installs a bunch of system packages into some sort of virtual machine or container environment. Even if you don't go the container route a bootstrapping script that installs the relevant packages and fonts is a good idea. Here's an example of one such script that I have used on a recent project:
#!/bin/bash
# This script will install LaTeX on Ubuntu 18.04 LTS
sudo apt install -y texlive-latex-extra pandoc librsvg2-bin texlive-luatex texlive-science fonts-powerline

mkdir -p ~/.fonts/example-project/
wget -P ~/.fonts/example-project/ http://mirrors.ctan.org/fonts/xcharter/opentype/XCharter-Bold.otf http://mirrors.ctan.org/fonts/xcharter/opentype/XCharter-BoldItalic.otf http://mirrors.ctan.org/fonts/xcharter/opentype/XCharter-BoldSlanted.otf http://mirrors.ctan.org/fonts/xcharter/opentype/XCharter-Italic.otf http://mirrors.ctan.org/fonts/xcharter/opentype/XCharter-Roman.otf http://mirrors.ctan.org/fonts/xcharter/opentype/XCharter-Slanted.otf

if [ ! -d ~/.fonts/example-project/Powerline ]; then
    # Only clone the repo into fonts if the directory didn't previously exist
    git clone git@github.com:powerline/fonts.git ~/.fonts/example-project/Powerline
fi
sudo fc-cache -f -v
Effectively this is dealing with the lack of dependency management by writing your own. It's not perfect, since it won't necessarily result in the same build on multiple systems (the apt packages may change upstream, the git repo may change, etc.). Creating containers improves on this but still won't quite get you a 100% replicable build. That said, it's far better than having nothing: by documenting some dependencies via an installation script that specifies them, you can save your team and your future self a lot of hassle.
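For the container route, a minimal sketch might look like the following; the base image tag, package list, and `latex-build` name are illustrative assumptions rather than a definitive setup:

```shell
#!/bin/sh
# Pin the build environment in a container image so every machine
# builds the documents with the same TeX packages and fonts.
cat > Dockerfile <<'EOF'
FROM ubuntu:18.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        texlive-latex-extra texlive-luatex && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /doc
CMD ["pdflatex", "-interaction=nonstopmode", "document.tex"]
EOF
echo "Dockerfile written"
```

You'd then build the image once with `docker build -t latex-build .` and render documents with `docker run --rm -v "$PWD":/doc latex-build`. Note that `ubuntu:18.04` is itself a moving tag; pinning the image digest gets you a step closer to a fully replicable build.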
Why I'm using TeX less for documents now
Ten years ago LaTeX was an environment I'd quickly reach for; for typesetting mathematics or engineering documents it was an amazing win for me at that time. While it's still useful now, over a decade later, there are two main things that have really pushed me away from TeX:
- The rise of Unicode. From emojis to more internationalization, there's just a lot more text out there that's not ASCII encoded, and honestly many TeX distributions deal poorly with non-ASCII encodings.
- I'm working on more collaborative projects, where the difficulty of packaging the TeX-related components is exposed more.
To be fair, as I mentioned earlier, TeX predates Unicode, and many systems that predate Unicode have had issues with Unicode adoption due to design choices they made. When I first started in development the state of Unicode was a complete and utter mess across most technologies. The pain of doing Unicode properly was enough that you'd be strongly tempted to just be sloppy and not deal with it. But a huge shift has happened: Unicode support is now much improved in many environments, so dealing with it properly has become worthwhile for people who regularly do such work. A big part of the Python 2 to 3 backwards-compatibility break was around this Unicode issue, and now that it's done life is much easier. TeX is in a difficult spot here because it has explicitly gone to great lengths to maintain backwards compatibility with documents; that compatibility is great for older documents but constrains the TeX projects' options for new features that would make Unicode easier. Increasingly, when I do document-generation automation, people want Unicode support. I've heard comments like "emojis are just the same as any other text, right?"; non-technical people don't want to care about character encoding issues, and I don't want to have to make them care either. Telling people they can't use foreign characters was sometimes defensible in the past, but such a limitation isn't defensible anymore for most situations in an increasingly multilingual world, nor should it be in many cases.
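For what it's worth, the modern engines do handle this much better. A hedged sketch, assuming `lualatex` and the `fontspec` package are installed and the default font covers the characters used:

```shell
#!/bin/sh
# Write a small UTF-8 LaTeX document. LuaTeX (and XeTeX) read UTF-8
# natively, unlike classic pdfTeX which needs inputenc workarounds.
cat > unicode-demo.tex <<'EOF'
\documentclass{article}
\usepackage{fontspec}  % font loading for the Unicode-aware engines
\begin{document}
Grüße from a naïve café.
\end{document}
EOF

if command -v lualatex >/dev/null 2>&1; then
    lualatex -interaction=nonstopmode unicode-demo.tex
else
    echo "lualatex not found, skipping build"
fi
```

With pdfLaTeX the same source would need `\usepackage[utf8]{inputenc}` and would still only cover the characters the loaded encoding knows about, which is exactly the kind of per-document fiddling people no longer expect to do.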
Both these pain points are exacerbated greatly by the fact that I have far more demands on my time now. It seems that even frequent users of TeX still hit them; I think it's just that recently the use cases I encounter have changed: automation is more prevalent and internationalization is more important. The other major factor is that people just aren't printing things as often anymore, and there's been a definite shift towards web delivery for many things, including reports. All the talk about the paperless office has only really started to become a reality for many people with the rise of the web. This web hegemony is a huge force with impacts beyond the web itself: even where webpages haven't displaced other document delivery, CSS and related technologies are taking over a large amount of the mindshare of document formatting. As a result, an exceptional amount of good tooling has arisen around the web stack. Using these "web" technologies for everything still misses some key points, though; take the EPUB format, for example: it is very much a web-based approach to ebooks, and traditionally its support for MathML has been poor.
I have definitely found the pull of the web strong, but I think some non-web layout formats are still very important. As time goes on, maybe the web and its associated technologies will continue to steamroll everything else. There is definite convenience in using the same technologies everywhere, amplified by all the network effects of people working on and learning them, but it comes at the cost of one stack dominating while not being ideal for specific cases. If you really need the power of something like LaTeX, it is still definitely worth knowing. Perhaps this is all just yet another manifestation of worse is better?
One thing I have found is that when I am using LaTeX now it is often embedded in other formats. This is something you can see in action right here on this blog. See this post where I explain how I enabled LaTeX in blog posts on this site.