Using the Python markdown library
Lately I've been working on a static site generator called Gilbert, and one of the nice things it does is handle loaders for files via plugins. A lot of posts (including this one) are written in Markdown with a metadata section at the top. Recently I've been writing a Gilbert plugin that supports Markdown with metadata, to make such documents easy to use with that framework.
For example, this post looks a little like this:
---
Title: Finding out what a python script imports
Date: 2019-07-18 22:30
Category: Software-engineering
Tags: python, software-engineering, TIL, packaging, modules
Slug: finding_what_a_python_module_imports
Authors: Janis Lesinskis
Summary: Finding out what a Python module imports
---
Lately I've been working on a static site generator called [Gilbert](https://github.com/funkybob/gilbert)
We want to load these Markdown documents and extract the metadata from the frontmatter section.
Working with markdown documents from Python
There's a library called Markdown (the Python-Markdown project) that will let you load Markdown documents and convert them to HTML.
Let's look at an example, saved as ipsum.md, using buzzword ipsum and veggie ipsum:
# Main heading
Vegetables are more healthy for you than buzzwords, see for yourself:
## Buzzwords
In the future, will you be able to conservatively synergise ballpark figures in your business? Our Long-Term Enterprise solution offers capabilities a suite of competitive offerings. Knowledge transfer propositions are becoming innovative organic growth experts. Is your cloud prepared for customer-focused step-change growth?
## Vegetables
Veggies es bonus vobis, proinde vos postulo essum magis kohlrabi welsh onion daikon amaranth tatsoi tomatillo melon azuki bean garlic.
Gumbo beet greens corn soko endive gumbo gourd. Parsley shallot courgette tatsoi pea sprouts fava bean collard greens dandelion okra wakame tomato. Dandelion cucumber earthnut pea peanut soko zucchini.
First we have to install the library:
pip install markdown
Then to use it we can do something like this:
import markdown
from pathlib import Path
md = markdown.Markdown()
ipsum_path = Path('ipsum.md')
data = ipsum_path.read_text(encoding='utf-8')
html = md.convert(data)
print(html)
This will give us some output like this:
<h1>Main heading</h1>
<p>Vegetables are more healthy for you than buzzwords, see for yourself:</p>
<h2>Buzzwords</h2>
<p>In the future, will you be able to conservatively synergise ballpark figures in your business? Our Long-Term Enterprise solution offers capabilities a suite of competitive offerings. Knowledge transfer propositions are becoming innovative organic growth experts. Is your cloud prepared for customer-focused step-change growth?</p>
<h2>Vegetables</h2>
<p>Veggies es bonus vobis, proinde vos postulo essum magis kohlrabi welsh onion daikon amaranth tatsoi tomatillo melon azuki bean garlic.</p>
<p>Gumbo beet greens corn soko endive gumbo gourd. Parsley shallot courgette tatsoi pea sprouts fava bean collard greens dandelion okra wakame tomato. Dandelion cucumber earthnut pea peanut soko zucchini. </p>
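If the text is already in memory, the file read can be skipped. Here is a minimal sketch using the library's module-level `markdown.markdown()` shortcut on a string. The sample text is cut down from the ipsum file above, and the variable names are only illustrative:

```python
import markdown

# A cut-down version of the ipsum document, held in a string instead of a file.
text = """# Main heading

Vegetables are more healthy for you than buzzwords, see for yourself:

## Vegetables
"""

# markdown.markdown() creates a Markdown instance and converts in a single call.
html = markdown.markdown(text)
print(html)
```

Creating a `markdown.Markdown()` instance yourself, as in the example above, is more useful when you want to configure extensions once and then read attributes off the instance afterwards.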
Frontmatter
As per the original example we have some frontmatter we also want to use.
There's an extension for the markdown library that will allow us to access the frontmatter: https://python-markdown.github.io/extensions/meta_data/. We can use it like so:
import markdown
from pathlib import Path
ipsum_path = Path('frontmatter_ipsum.md')
data = ipsum_path.read_text(encoding='utf-8')
md = markdown.Markdown(extensions=['meta'], output_format='html5')
html = md.convert(data)
print(md.Meta)
Running this gives the following:
{'title': ['"Ipsum example"'], 'date': ['2019-07-18 22:30'], 'category': ['Examples']}
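The contents of frontmatter_ipsum.md aren't shown here. As a sketch, a document along these lines produces the same dictionary. It is reconstructed from the printed metadata, so the exact file contents are an assumption:

```python
import markdown

# Assumed contents of frontmatter_ipsum.md, reconstructed from the output above.
data = """---
Title: "Ipsum example"
Date: 2019-07-18 22:30
Category: Examples
---

# Main heading
"""

md = markdown.Markdown(extensions=['meta'])
html = md.convert(data)

print(md.Meta)  # metadata collected from the frontmatter block
print(html)     # the frontmatter itself is not part of the rendered HTML
```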
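You'll notice that the keys are lowercased strings and every value is a list of strings, one entry per line of the value. Nothing is parsed into dates or other types, and even the quotes around the title are kept as-is. If you want typed values you'll have to convert them yourself using some sort of schema.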
The way Gilbert does this is quite interesting, but is by no means the only way.
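As one simple alternative (this is not how Gilbert does it), a schema can be a plain dictionary that maps each metadata key to a converter function. The sketch below only uses the standard library. `SCHEMA`, `apply_schema`, and the choice of converters are hypothetical names made up for illustration, and the input is the `md.Meta` dictionary printed above:

```python
from datetime import datetime

# The raw metadata as printed above: lowercase keys, lists of strings.
raw_meta = {
    'title': ['"Ipsum example"'],
    'date': ['2019-07-18 22:30'],
    'category': ['Examples'],
}

# Hypothetical schema: key -> function turning the list of lines into a value.
SCHEMA = {
    'title': lambda lines: lines[0].strip('"'),
    'date': lambda lines: datetime.strptime(lines[0], '%Y-%m-%d %H:%M'),
    'category': lambda lines: lines[0],
}


def apply_schema(meta, schema):
    """Convert keys found in the schema; leave unknown keys untouched."""
    return {
        key: schema[key](value) if key in schema else value
        for key, value in meta.items()
    }


post_meta = apply_schema(raw_meta, SCHEMA)
print(post_meta['title'])  # Ipsum example
print(post_meta['date'])   # 2019-07-18 22:30:00
```

Keeping the converters in one dictionary means a new field, such as a comma-separated tags list, only needs one extra entry rather than changes to the loading code.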