TIL: Python import machinery details

Today I was looking at Python's import machinery because of a question about import styles in Persephone, an open-source natural language processing library I've been contributing to.

Digging into this further, I found a couple of things that I hadn't realized before:

Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def foo():
...     from math import *
...     return sqrt(4)
...
<stdin>:1: SyntaxWarning: import * only allowed at module level
>>> foo()
2.0

This warning was introduced in Python 2.1 via PEP 227; 2.1 was also the release that introduced the warnings framework itself. Nested scopes can break import *, because the compiler cannot know which names the star import will bind, so the language specification explicitly prohibits it inside functions. As you can see, though, the CPython 2.x implementation doesn't enforce this: the code produces a warning but still works when you call it.

Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def foo():
...     from math import *
...     return sqrt(4)
...
  File "<stdin>", line 1
SyntaxError: import * only allowed at module level

This produces a syntax error. The Python 3 approach here is much nicer, considering that the code is prohibited by the spec.
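The Python 3 behaviour can also be checked without a REPL by handing the source to compile(). This is only a small sketch: the helper name compiles and the "<example>" filename are made up for illustration, and it assumes a Python 3 interpreter.

```python
# Source for a function that tries a star import in function scope.
FUNCTION_LEVEL = """
def foo():
    from math import *
    return sqrt(4)
"""

# The same import at module level, which is allowed.
MODULE_LEVEL = "from math import *\n"


def compiles(source):
    """Hypothetical helper: return (ok, message) for a chunk of source."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return False, exc.msg
    return True, None


function_ok, function_message = compiles(FUNCTION_LEVEL)
module_ok, _ = compiles(MODULE_LEVEL)

print(function_ok, function_message)
print(module_ok)
```

The point to notice is that the error comes from compile() alone: foo is never called, and is never even defined.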

Why this happens is a lot more interesting though, and ultimately comes down to efficiency.

Basically, a lot of efficiency gains can be had if the full set of local variables is known when the function is compiled.
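You can see what the compiler knows about a function's locals by inspecting its code object. The sketch below uses a made-up function called area; co_varnames, co_nlocals and the dis module are standard, but the exact opcode names vary between CPython versions, so the code only looks for names that start with LOAD_FAST or STORE_FAST.

```python
import dis


def area(width, height):
    total = width * height
    return total


code = area.__code__

# Names of the arguments and local variables, fixed when area was compiled.
local_names = code.co_varnames
# Number of local variable slots a frame for this function needs.
local_count = code.co_nlocals

# Locals are accessed by slot index through the *_FAST family of opcodes,
# not by looking a name up in a dictionary.
fast_ops = [
    ins.opname
    for ins in dis.get_instructions(area)
    if ins.opname.startswith(("LOAD_FAST", "STORE_FAST"))
]

print(local_names, local_count)
print(fast_ops)
```

None of this requires calling area: the information is all there as soon as the def statement has been compiled.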

So let's look at what a recent CPython implementation does. First, let's look at PyFrameObject in Include/frameobject.h:

typedef struct _frame {
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */
    char f_trace_lines;         /* Emit per-line trace events? */
    char f_trace_opcodes;       /* Emit per-opcode trace events? */

    /* Borrowed reference to a generator, or NULL */
    PyObject *f_gen;

    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    char f_executing;           /* whether the frame is still executing */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;

This represents a Python stack frame.

Note the last member, f_localsplus: this array is where the local variables are stored, followed by the evaluation stack.
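Several of these struct members are also visible from Python as attributes on frame objects, which makes the structure easier to poke at. A rough sketch follows; inner and outer are made-up function names, and inspect.currentframe() is CPython-specific in practice.

```python
import inspect


def inner(value):
    frame = inspect.currentframe()  # the PyFrameObject for this call
    try:
        return {
            "name": frame.f_code.co_name,            # f_code: the code object
            "caller": frame.f_back.f_code.co_name,   # f_back: previous frame
            "value": frame.f_locals["value"],        # f_locals: mapping view
        }
    finally:
        del frame  # avoid keeping the frame alive through a reference cycle


def outer():
    return inner(42)


info = outer()
print(info)
```

f_localsplus itself is not exposed; f_locals is a mapping built from it on request, which is a hint that the array of slots, not a dictionary, is the real storage.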

How these frames are managed is interesting. Because stack frames are created and destroyed so frequently, CPython uses an optimization to make function calls faster. Each code object can have a "zombie frame" associated with it, which allows a faster call because the frame doesn't need to have memory allocated for it during the call. The zombie is the frame object left over from a previous invocation of the code object: instead of being freed when the call finishes, it stays attached to the code object with its memory still allocated, so the next call just reuses it and fills the required values into the existing frame directly.

There's a particularly informative comment about this in Objects/frameobject.c:

/* Stack frames are allocated and deallocated at a considerable rate.
   In an attempt to improve the speed of function calls, we:
   1. Hold a single "zombie" frame on each code object. This retains
   the allocated and initialised frame object from an invocation of
   the code object. The zombie is reanimated the next time we need a
   frame object for that code object. Doing this saves the malloc/
   realloc required when using a free_list frame that isn't the
   correct size. It also saves some field initialisation.
   In zombie mode, no field of PyFrameObject holds a reference, but
   the following fields are still valid:
     * ob_type, ob_size, f_code, f_valuestack;
     * f_locals, f_trace are NULL;
     * f_localsplus does not require re-allocation and
       the local variables in f_localsplus are NULL.
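To make the idea in that comment concrete, here is a toy model of the zombie-frame trick in Python. It is not how CPython is written: ToyCode, ToyFrame, acquire_frame and release_frame are invented names, and the model ignores the free list and everything else a real frame carries.

```python
class ToyCode:
    """Invented stand-in for a code object: it knows its local count."""

    def __init__(self, name, nlocals):
        self.name = name
        self.nlocals = nlocals
        self.zombie = None  # at most one parked frame per code object


class ToyFrame:
    """Invented stand-in for a frame: a fixed-size array of local slots."""

    def __init__(self, code):
        self.code = code
        self.slots = [None] * code.nlocals


def acquire_frame(code):
    """Reanimate the zombie if there is one, otherwise allocate a frame."""
    frame = code.zombie
    if frame is not None:
        code.zombie = None
        return frame
    return ToyFrame(code)


def release_frame(frame):
    """Clear the locals and park the frame on its code object."""
    for index in range(len(frame.slots)):
        frame.slots[index] = None
    if frame.code.zombie is None:
        frame.code.zombie = frame


code = ToyCode("foo", nlocals=2)
first = acquire_frame(code)      # no zombie yet, so this allocates
first.slots[0] = "some value"
release_frame(first)             # the frame becomes the zombie
second = acquire_frame(code)     # reuses the same object
reused = second is first
print(reused, second.slots)
```

The reuse is only safe because every frame for a given ToyCode has exactly nlocals slots, which mirrors the constraint on f_localsplus.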

What this design means is that the size of f_localsplus is determined by the code object, not by anything that happens during a call. (The code object is created when the function definition is compiled.) So all local variable names must be known at the point the function is compiled, so that a slot can be reserved for each of them in this part of the frame object structure. Not needing to work out at call time how many local variables there are, and not needing to allocate and deallocate memory for them on each call, is a significant win. When the function is called it doesn't need to find its local variables; their slots are already there and just need to be populated with values.

A from mod import * inside a function would break this, because the names it binds are only known at run time. As a result of this optimization it must be made illegal within function scope, and this is why the syntax error is raised.

For an example of implementation code that relies on this, see frame_dealloc in Objects/frameobject.c:
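A related way to see that a function's set of locals is frozen at compile time is to try to create one at run time with exec(). This is a sketch assuming Python 3; the function and variable names are made up.

```python
def try_to_add_local():
    # exec() runs this assignment against a dictionary of the locals,
    # not against the fixed array of local slots in the frame.
    exec("surprise = 1")
    try:
        # The compiler never saw an assignment to surprise in this
        # function, so it treats the name as a global here.
        return surprise
    except NameError:
        return "surprise is not a local"


result = try_to_add_local()
slot_count = try_to_add_local.__code__.co_nlocals

print(result)
print(slot_count)
```

The function has no local slots at all, so there is nowhere for the new name to live. A star import inside a function would run into the same problem.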

static void _Py_HOT_FUNCTION
frame_dealloc(PyFrameObject *f)
{
    PyObject **p, **valuestack;
    PyCodeObject *co;

    if (_PyObject_GC_IS_TRACKED(f))
        _PyObject_GC_UNTRACK(f);

    /* Kill all local variables */
    valuestack = f->f_valuestack;
    for (p = f->f_localsplus; p < valuestack; p++)
        Py_CLEAR(*p);

    /* ... rest of the function omitted ... */
}

This fast clearing of locals to enable stack frame reuse can only work if f_localsplus is always the same size for a given code object. If the size could differ between frames of the same code object, a lot more checks would have to be inserted, and as discussed above that would be a performance hit in a very critical part of the implementation.