Finding out what a Python script imports
A while back I did some work on a static site generator called Gilbert, and one of the nice things it does is handle loaders for files via plugins.
Essentially it uses the new Python namespace packages (introduced in PEP 420) to allow you to make your own installable plugins. I was making a repository to support Markdown with metadata for Gilbert. One of the juniors I was working with wanted to know more about how namespace packages like this worked, so I started making a tutorial repository to explain them.
Since plugin systems are a prime use case for namespace packages, I wanted to make an example that shows how you can create a plugin system with them.
I won't go into too much detail here since this is worth a whole post on its own.
The idea is that you have a main package called greeter_example that prints out all the greetings registered via the greeting plugins provided in the namespace package greeter_example.greetings.
One of the thoughts that crossed my mind in making this example was that I wanted to be able to scan through the imports and see what would be imported if you did something like from greeter_example.greetings import *.
Registering the plugins tends to work best by having some sort of import from greeter_example that provides a registration point.
This works because importing a module in Python executes the code in that module.
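To make the registration idea concrete, here is a minimal sketch that builds a throwaway namespace package in a temporary directory and loads one plugin from it. The registry module, the register decorator, the GREETINGS dict and the hello plugin are all hypothetical names invented for this illustration; they are not taken from the tutorial repository or from Gilbert.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Assumed layout: a "core" distribution and a "plugin" distribution each ship
# a slice of greeter_example. No directory contains an __init__.py, so both
# greeter_example and greeter_example.greetings are PEP 420 namespace packages.
CORE = '''
GREETINGS = {}

def register(func):
    """Record a greeting function under its own name."""
    GREETINGS[func.__name__] = func
    return func
'''

PLUGIN = '''
from greeter_example.registry import register

@register
def hello():
    return "Hello!"
'''


def build_demo(root):
    """Write both distributions under root and put them on sys.path."""
    core = root / "core" / "greeter_example"
    plugin = root / "plugin" / "greeter_example" / "greetings"
    core.mkdir(parents=True)
    plugin.mkdir(parents=True)
    (core / "registry.py").write_text(CORE)
    (plugin / "hello.py").write_text(PLUGIN)
    sys.path[:0] = [str(root / "core"), str(root / "plugin")]
    importlib.invalidate_caches()  # the files were created at runtime


def load_greetings():
    """Import every module in greeter_example.greetings, return the registry."""
    import greeter_example.greetings as greetings
    prefix = greetings.__name__ + "."
    for info in pkgutil.iter_modules(greetings.__path__, prefix):
        # Importing the module runs its body, which runs @register.
        importlib.import_module(info.name)
    from greeter_example.registry import GREETINGS
    return GREETINGS


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo(Path(tmp))
        for name, greet in load_greetings().items():
            print(name, "->", greet())
```

Nothing in load_greetings calls hello's registration explicitly; the decorator fires as a side effect of the import, which is the whole trick.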
Still, I couldn't help but wonder whether it would be possible to figure out what was available in the namespace package without executing it first.
My reasoning was that it would be nice to generate a report of which files and modules got imported when you imported from the namespace package, and which ones didn't.
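As a partial answer, the standard library can at least list which modules sit under a package without importing them. This is a rough sketch, and list_namespace_modules is a hypothetical helper name. It only finds modules on disk; it says nothing about what each one would register or import once loaded.

```python
import importlib.util
import pkgutil


def list_namespace_modules(package_name):
    """Return the names of modules found under a package without importing them.

    Caveat: for a dotted name, find_spec() imports the *parent* packages,
    but the modules listed here are only located, never executed.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.submodule_search_locations is None:
        return []  # not found, or not a package
    prefix = package_name + "."
    return sorted(
        info.name
        for info in pkgutil.iter_modules(spec.submodule_search_locations, prefix)
    )


# Demonstrated on a standard library package so the sketch runs anywhere.
print(list_namespace_modules("json"))
```

With the plugins installed, the same call with "greeter_example.greetings" should list the plugin modules.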
Finding what modules get imported by a Python script
Say you have a script like data_cleaner.py that imports various things, and you are wondering what it imports when you run it.
Static analysis of the code may not be able to reveal what's imported due to the dynamic nature of Python.
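A small sketch of that limitation: a scanner built on the ast module finds the literal import statements but misses a module whose name is assembled at runtime. SOURCE and static_imports are made-up names for this illustration.

```python
import ast

SOURCE = '''
import re
from os import path
import importlib

# The module name is only known once this line actually runs.
plugin = importlib.import_module("js" + "on")
'''


def static_imports(source):
    """Collect module names from import statements in the source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


print(sorted(static_imports(SOURCE)))  # ['importlib', 'os', 're']
```

The json module never shows up in the result, even though running the script would import it.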
Running the code to find out what it imports may be a time-consuming process, or worse, have side effects you don't want.
If you want a report of what a script imports, you don't have to write the code yourself: there is a handy module in the standard library called modulefinder. Despite the name of its run_script method, ModuleFinder doesn't actually execute the script; it compiles it and scans the bytecode for import statements, which also means it can miss imports whose names are computed at runtime.
# data_cleaner.py
import re

try:
    import this_library_does_not_exist
except ImportError:
    print("Couldn't import this_library_does_not_exist")

# Cleans up some data here
Then from another script or interpreter session we can do this:
from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('data_cleaner.py')

# Modules that were found, with up to three of their global names
for name, mod in finder.modules.items():
    print(f"{name}: {','.join(list(mod.globalnames.keys())[:3])}")

# Modules that were referenced but could not be found
print('Modules not imported:')
print('\n'.join(finder.badmodules.keys()))
This is quite handy.
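If you want this as something reusable, the same ModuleFinder calls can be wrapped in a function. This is a minimal sketch; import_report is a hypothetical name, and it writes a copy of the example script to a temporary directory so it can run on its own.

```python
import os
import tempfile
from modulefinder import ModuleFinder

SCRIPT = '''
import re
try:
    import this_library_does_not_exist
except ImportError:
    print("Couldn't import this_library_does_not_exist")
'''


def import_report(script_path):
    """Return (found, missing) lists of module names for a script."""
    finder = ModuleFinder()
    finder.run_script(script_path)
    found = sorted(finder.modules)       # includes '__main__' for the script
    missing = sorted(finder.badmodules)  # imports that could not be located
    return found, missing


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "data_cleaner.py")
    with open(script, "w") as fh:
        fh.write(SCRIPT)
    found, missing = import_report(script)

print(f"{len(found)} modules found")
print("this_library_does_not_exist" in missing)
```

The missing list can hold more than the one deliberately broken import, since ModuleFinder also follows the imports inside re and its dependencies, some of which are platform-specific.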
One thing I'd like to do if this comes up again is write a function that produces this kind of report for every module in a namespace package without importing any of them. I might try this by hacking around with the importlib machinery, though I don't expect it to be straightforward.