Broken beyond repair – Python’s import system

It often happens that language features we once disliked become comfortable after a while. We learn to live with them, take their advantages and forget about their annoyances. For many Python developers, the explicit self in the argument list of a method is among them, and so is len as a function. Experienced Python programmers are usually only mildly excited about alternative proposals, and the longer such a quirk is discussed, the more their mood drifts towards -0, which translates to “why bother?”.

Other features cause continuous struggle. We never get used to them, but we don’t talk much about them because they aren’t low-hanging fruit. They might even turn from bad to worse while being improved in some respects. Those features feel broken beyond repair, which means that they can’t easily be patched and need a fundamental redesign instead. In my opinion Python’s import machinery has this quality, and it was regrettably forgotten in the many “cleanups” of Python 3.

Python’s import machinery is complicated to understand, but this alone isn’t an argument against it. Many systems are far more complex but well thought out. The import machinery seems to have been designed by implementation and cobbled together. This has finally led to accidental design artifacts like the module/script distinction, which makes sense only in the light of Python’s import implementation.

Python has always suffered from the problem that two modules M1.py and M2.py which import a third module M3.py via different import paths also receive different module objects <M3>. This is as if we gave up coordinate independence and believed that, as we move through space, all objects change continuously: the objects themselves, not our perspective on them. We really get different objects and we can’t compare our data with each other.
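The effect can be reproduced in a few lines. The following is a minimal sketch, assuming a throwaway package P in a temporary directory and a module attribute named registry that is purely illustrative; both the package directory and its parent are put on sys.path so that M3.py is reachable under two names.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package P containing M3.py (names follow the article;
# the attribute "registry" is a made-up example of module state).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "P")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "M3.py"), "w") as f:
    f.write("registry = []\n")

# Make M3.py reachable via two import paths: as P.M3 and as top-level M3.
sys.path.insert(0, root)
sys.path.insert(0, pkg_dir)
importlib.invalidate_caches()

via_package = importlib.import_module("P.M3")
via_toplevel = importlib.import_module("M3")

# Same file on disk, yet two module objects with separate state.
same_file = os.path.samefile(via_package.__file__, via_toplevel.__file__)
same_object = via_package is via_toplevel
shared_state = via_package.registry is via_toplevel.registry
```

Here same_file is true while same_object and shared_state are both false: anything M1 appends to its registry is invisible to M2.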

Not only are module objects path dependent, but the success of a module import may depend on whether other modules are alive. Suppose M1.py defines the following relative import: from . import M2. This import fails if the package P containing both M1.py and M2.py has not been imported already. When running M1.py from the command line, it’s not relevant that we declared P a package by placing an __init__.py file into it; we still get the increasingly famous error message:

ValueError: Attempted relative import in non-package

This is not a Monty Python skit. The package exists and exists not. The interpreter doesn’t make any use of the availability of package P as long as it hasn’t been imported explicitly. But why does it have to be imported only to find M2.py? There is no obvious answer. It’s just the way it is implemented.
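The script/module split described above can be observed directly. This is a small sketch, assuming a temporary package P and a made-up constant VALUE in M2.py; note that current Python 3 interpreters report the failure as an ImportError with different wording than the ValueError quoted above, so the sketch only checks for the phrase “relative import”.

```python
import os
import subprocess
import sys
import tempfile

# Throwaway package P with M1.py and M2.py (VALUE is an illustrative name).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "P")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "M2.py"), "w") as f:
    f.write("VALUE = 42\n")
with open(os.path.join(pkg_dir, "M1.py"), "w") as f:
    f.write("from . import M2\nprint(M2.VALUE)\n")

# Run M1.py as a script: the relative import fails although P is a package.
as_script = subprocess.run(
    [sys.executable, os.path.join(pkg_dir, "M1.py")],
    capture_output=True, text=True,
)

# Run the same file as module P.M1 from the directory that contains P.
as_module = subprocess.run(
    [sys.executable, "-m", "P.M1"],
    cwd=root, capture_output=True, text=True,
)

script_failed = as_script.returncode != 0
module_worked = as_module.returncode == 0
```

The same file fails in the first run and succeeds in the second; the only difference is whether the interpreter was told about P before executing M1.py.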

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” (Donald E. Knuth)

Back to the first problem. Once M1.py has imported M3.py, the <M3> object is accessible under M1’s import path in the module cache sys.modules, which is a flat dictionary of {module-path: <module>} pairs. The module M2.py uses a different import path, so a new and different module <M3> is created and cached under another module-path.
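The flat layout of the cache is easy to inspect. A short sketch, using the standard library’s json package only as a convenient example of a package with a submodule:

```python
import sys
import json.decoder

# sys.modules is an ordinary dict; keys are dotted name strings, not nested.
is_plain_dict = isinstance(sys.modules, dict)
flat_key_present = "json.decoder" in sys.modules

# The cached object is the one bound as an attribute on the package.
cached = sys.modules["json.decoder"]
is_same = cached is json.decoder

# The package itself is a sibling entry in the same dictionary.
package_entry = sys.modules["json"]
package_is_same = package_entry is json
```

Because the key is the dotted name and nothing else, the same file imported under a second name simply becomes a second entry.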

This lookup with a module-path string is fast, and it is usually faster than ordinary attribute lookup in objects, which has to jump through nested dictionaries. I do think that’s basically it. If we implement a proper module cache providing the moderate performance of attribute lookups in objects, we can get rid of the tedious import machinery that Python has right now. But I’d like to go even a little further and claim that a whole import system can be designed directly from a tiny set of objects and their properties. No Finders, Loaders, Importers and all the clunky stuff which pretends to be objects but is actually a service which serves the void.

I’ll flesh this idea out in a series of articles on this blog soon. This can be understood then as the constructive part of my criticism.

Newsflash: chances are that higher Python officers will succeed in confusing the situation even more. PEP 382 is on the way. Good luck explaining why Python needs two notions of packages when it is difficult enough to justify even one. It’s no coincidence, though, that this is placed in the context of setuptools. If Python’s import system is a tragedy, then distutils and the layers above are the accompanying farce.

  1. Alec Munro says:

    Are you aware of all the work Brett Cannon has been doing on importlib? http://docs.python.org/dev/py3k/library/importlib.html

    I’m not nearly conversant enough in the import mechanism to know if it would address your concerns (and in fact, I’ve never had a significant problem with importing, so I’m clearly not your target audience), but I’ve read much of Brett’s posting on importlib, and it seems like he spent a lot of time trying to build it properly.

    Let me know your thoughts.

    Thanks

  2. yeah, do it. a redesign of the import system is sorely needed—one that is sane, and simple. i could imagine it would be worth while to steer towards statements à la batz = fetch( ‘foo/bar/batz.py’ ), blah = fetch( ‘foo/bar/batz.py/blah’ ), log = fetch( ‘~/math.py/log’ ), where fetch accepts relative, absolute, and symbolic paths (~ here being a symbol for the standard library).

    not so sure about making the .py extension mandatory; also, it is symbolic itself (as it could be supplanted by .pyc, .pyo where suitable); but if extensions are omitted, how do you distinguish between a file x.py and a package x? or maybe this is not necessary?

    also, some versioning system should be readily available, so you can e.g. state fetch( ‘foobar[>0.3]’ ), and, importantly, to use URL-like identifiers to avoid name clashes (but please not in the way that java does it), maybe à la mojo = fetch( ‘example.com/projects/fireball//magic/mojo.py’ ).

    just some arbitrary ideas. probably yours will be very different.

  3. kay says:

    Alec, I know about Brett Cannon’s importlib and I deeply respect his work. However, importlib is mostly an attempt to reconstruct the structure and semantics of the current import machinery in accessible Python code, which is much harder than anything I intend to do. I hope to show at the end of the promised article series that designing an import system doesn’t have anything to do with reconstructing how black magic works. It won’t be complete, though, and I’ll neglect some of the features I do not understand and have never used or tested, like importing from “frozen packages”.

    Notice also one thing. Beyond an initialization phase my implementation will completely bypass Python’s current import system, which importlib does not. Maybe some remnants will stay; I’m not finished with it. This can be done in pure Python only by means of language transformations: import statements have to be transformed into function calls in Python code. This also means that it will always have the status of a prototype.

  4. kay says:

    @love’n flow, I do not intend to add new features in the first place. The current import system is already quite powerful due to import-hooks. I made extensive and also successful use of them in EasyExtend. I want something that is conceptually simple but as powerful as the current state-of-the-art solution.

    BTW I’m not quite sure whether I like the idea of versioned module imports. To me this has the bad taste of compiler flags. But right now that’s really off topic, and it may well fit into the new structure quite naturally. Importing from URLs will be provided for naturally, although I won’t give a reference implementation, and there also won’t be changes in the Python grammar to support URLs in import statements. Once I’ve got a working prototype, you might check out implementing a fetch command yourself.

  5. The import machinery is big and complicated because it can do lots of things. Individuals might only use one or two of those features but they aren’t the same one or two features. If imports were simplified users would just re-add a custom layer on top of imports to get their feature back.

    Please consider making your writing style less combative. It is heavy on assertions and emotional adjectives (“Python is Doomed” because it uses whitespace indentation). You seem to pick one Python feature a week and denounce it as an unforgivable evil. There might be two people on the planet that can be bullied into agreeing with you, but for the rest of us you will have to make an argument, not propaganda.

  6. PJ Eby says:

    Also, some of us like explicit self and len as a function. So you’re already turning off that segment of your audience from the first paragraph.

    But as long as we’re being inflammatory and trolling, I personally think that people who want to replace imports with file paths are stuck in PHP-land. Even Perl and Java have native support for namespace packages. It’s about time Python included something better than pkgutil.extend_path.

  7. kay says:

    Jack, the “Python is Doomed” article was entirely ironic about the mentioned aspect. There are many superstitious beliefs about significant whitespace and occasional panic attacks even among long-term Python programmers. I found the idea of the single peak fitness plateau where Python apparently lives ( and dies ) so heartbreaking that I couldn’t resist being satirical ( or satyrical? ) about it.

    The article about the import machinery is meant deadly seriously, though, for the reasons given there. It has been a constant pain for as long as I can remember, back to Python 1.5.2, and it has become even worse over time. I actually want people to acknowledge this even if it produces bad PR temporarily ( people are forgetful and move on to things they find more important and entertaining than a rather technical and esoteric topic to which they can contribute virtually nothing ).

  8. Agree with the others that opening with your comments on len and self kind of mark you out as a loony from the start… Anyway, whilst the issue you discuss has occasionally been an issue for me (one usually remedied) it is certainly highly bombastic to declare the Python import machinery broken beyond repair. Very odd. 🙂

  9. kay says:

    @Philip, Micheal. I mentioned len and self only because I know precisely that those topics are controversial. They have been so for the last 10 years and they will be so for the next 10 years as well. They are unusual and idiosyncratic, and although some programmers might have fallen in love with them at first sight, I do not think that’s very likely. People also love Lispy parentheses because they became used to them and learned to appreciate them. That’s virtually the same situation as I described in the opening paragraph.

    Although the tone of the article is clearly emotional and intended to be so there are also arguments. If you find something reasonable about how relative imports work or the sys.modules cache you are free to provide a counter-argument, either here in the comment section or on your own blog.
