{"id":256,"date":"2009-04-07T14:56:56","date_gmt":"2009-04-07T13:56:56","guid":{"rendered":"http:\/\/fiber-space.de\/wordpress\/?p=256"},"modified":"2009-04-07T20:37:45","modified_gmt":"2009-04-07T19:37:45","slug":"pyload-part-i-path-objects","status":"publish","type":"post","link":"http:\/\/fiber-space.de\/wordpress\/2009\/04\/07\/pyload-part-i-path-objects\/","title":{"rendered":"Pyload part I ( Path objects )"},"content":{"rendered":"<p>This article is the first in a series that studies the design of a <em>module import system<\/em>. Although this work is situated in the Python context, it can stand on its own to a large extent, and its ideas may be transferred to other systems and languages. We will provide as much integration with Python as necessary but keep the structure as general as possible. The article series will roughly cover the following topics:<\/p>\n<ol>\n<li>Path objects &#8211; module paths and their representation<\/li>\n<li>ModuleNode objects &#8211; loading and caching modules using Path objects<\/li>\n<li>Import system &#8211; binding ModuleNode objects to import statements<\/li>\n<li>Implementation &#8211; putting it all together in an EasyExtend langlet as a reference implementation<\/li>\n<\/ol>\n<p>In this article we will discuss `Path` objects. This topic is foundational and a bit dry, but I hope the reader will be compensated by the elegance of some of its constructions. `Path` objects establish a relationship between internally used namespaces and external path structures in physical or logical &#8220;media&#8221; like file-systems, zip-files or the web. Those path structures can also be fully abstract and represented by special data-structures only. We&#8217;ll provide several examples.<\/p>\n<p>First we introduce some terminology. Many of the notions given here are rather abstract, but they mostly capture what people already know about Python modules, packages and file system paths. 
They provide a conceptual background for the specifications given later.<\/p>\n<h3>Terminology<\/h3>\n<p>A <span style=\"font-style: italic;\">module name<\/span> is an ordinary Python name: a finite sequence of alphanumeric characters and underscores starting with an alphabetic character or underscore. A <span style=\"font-style: italic;\">module path<\/span> is a dot-separated sequence of module names. A module path may also be preceded by dots; in that case it is called a <span style=\"font-style: italic;\">relative path<\/span>. `A.B.C` and `..A.B.C` are both module paths, but only the latter is relative. The names of a module path are also called its <span style=\"font-style: italic;\">components<\/span>. So `A`, `B`, `C` are the components of the module path `A.B.C`.<\/p>\n<p>Besides module paths we consider <span style=\"font-style: italic;\">external paths<\/span>. Intuitively, an external path is a pointer to the location of a module in some medium. Most commonly file system paths are used as external paths: modules are represented as files and the dot separators are mapped onto file-system separators. Throughout this article we use a slash &#8220;\/&#8221; as the external path separator. So `A.B.C` is a module path and `A\/B\/C` is an external path. A proper definition of external paths is given below in terms of the `Path` abstract base class.<\/p>\n<p>Loading a module from an external path yields an interpreter-level &lt;`module`&gt; object. Each &lt;`module`&gt; object shall have a unique module name. For a known module name `M` we write &lt;`M`&gt; for the corresponding object. It is also possible to load &lt;`module`&gt; objects from builtins or even to create fresh &lt;`module`&gt; objects on the fly. In any case we still consider a &lt;`module`&gt; to be loaded from a path. 
If no such path is available we associate the &lt;`module`&gt; with the empty path.<\/p>\n<p>A module path `A.B.C&#8230;` is <span style=\"font-style: italic;\">valid<\/span> if an external path `&#8230;\/A\/B\/C\/&#8230;` exists such that &lt;`A`&gt; can be loaded from `&#8230;\/A`, &lt;`B`&gt; can be loaded from `&#8230;\/A\/B`, and so on.<\/p>\n<p>We call a module `P` a <span style=\"font-style: italic;\">package<\/span> if there is a module `M` such that `P.M` is a valid module path and, whenever `P1.M` and `P2.M` are valid module paths, `P1 = P2 = P`. So each module has at most one package. If a module has no package we call it an <span style=\"font-style: italic;\">unpackaged<\/span> or a <span style=\"font-style: italic;\">free<\/span> module. For any module `M` the chain of packages `P1.P2&#8230;.M` containing `M` shall be finite. Hence each such chain has a maximal length. If `P0.P1.P2&#8230;M` is a module path with a maximal chain of packages we call `P0` a <span style=\"font-style: italic;\">top-level module<\/span> and the module path a <span style=\"font-style: italic;\">full path<\/span>. Each unpackaged module is by definition also a top-level module.<\/p>\n<p>Notice that the concept of a <span style=\"font-style: italic;\">namespace<\/span> is a little more general than the ones we have defined. Each &lt;`module`&gt; has an associated namespace, which is usually the `__dict__` of a user-defined module. This namespace can contain other modules as well and might be changed dynamically. Our module path concept, in contrast, is meant to correspond closely to the way module paths are used in an <em>import statement<\/em>.<\/p>\n<p>With `PYTHONPATH` we denote a set of external paths from which modules can be loaded. Those paths can represent file-system paths, zip-files or other entities. The `PYTHONPATH` may contain external paths that point into some package. In this case the modules we can reach from the `PYTHONPATH` are <span style=\"font-style: italic;\">not<\/span> all top-level modules. 
In this situation the top-level module `P0` of a full path `P0.P1&#8230;M` may <em>not<\/em> be reachable from `PYTHONPATH`. We call such a full path a <span style=\"font-style: italic;\">finger<\/span>.<\/p>\n<p>The intuitive idea of an external path as a file path is actually a bit too simplistic. In practice several files may correspond to the same module. For example `A.py`, `A.pyc` and `A.pyd` are all files that correspond to a Python module `A`. An external path is therefore a class of file paths that are equivalent in the sense that they all describe a single module entity. In this sense a set of file suffixes determines an equivalence class of file paths.<\/p>\n<h3>Path objects<\/h3>\n<p>Path objects are defined by the abstract base class `Path`. Concrete subclasses of `Path` are `FSPath`, which uses file system operations to implement the `Path` methods, and `ZipPath`, which combines file system path operations with those provided by the &lt;`zipfile`&gt; module. We have also defined a silly `Path` subclass called `TuplePath`. 
All of those `Path` objects are used to represent external paths or path-like objects.<\/p>\n<pre lang=\"python\">from abc import ABCMeta, abstractmethod\r\n\r\nclass Path:\r\n    # Python 2 syntax; in Python 3 write class Path(metaclass=ABCMeta) instead\r\n    __metaclass__ = ABCMeta\r\n    def __init__(self, pathobj):\r\n        self.path = pathobj\r\n\r\n    @abstractmethod\r\n    def fullpath(self):\r\n        '''\r\n        Derives full module path from given path.\r\n        '''\r\n\r\n    def find(self, modulepath):\r\n        '''\r\n        Interprets the modulepath as a sequence of parent \/ joinpath operations.\r\n        '''\r\n        P = self\r\n        components = modulepath.split(\".\")\r\n        if components[-1] == '':\r\n            components.pop()\r\n        for component in components:\r\n            if component == \"\":\r\n                # an empty component stems from a leading dot: step up one level\r\n                P = P.parent()\r\n            else:\r\n                P = P.joinpath(component)\r\n            if not P.exists():\r\n                raise ValueError(\"invalid path object: '%s'\"%P.path)\r\n        return P\r\n\r\n    @abstractmethod\r\n    def ispackage(self):\r\n        '''\r\n        Returns True if path refers to a package and False otherwise.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def ismodule(self):\r\n        '''\r\n        Returns True if path corresponds to a Python module and False otherwise.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def isempty(self):\r\n        '''\r\n        Returns True if this path object is empty and False otherwise.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def exists(self):\r\n        '''\r\n        Returns True if the path object is valid and False otherwise.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def parent(self):\r\n        '''\r\n        If path is ...\/A\/B\/C a new path object initialized by ...\/A\/B will be\r\n        created and returned.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def children(self):\r\n        '''\r\n        All valid one element extensions of path. 
If there are no such children return\r\n        None.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def joinpath(self, *args):\r\n        '''\r\n        Joins the current path object with those provided by args in the sequence of\r\n        occurrence in args. So if this path is ..\/A\/B then self.joinpath('C\/D', 'E') will\r\n        return ..\/A\/B\/C\/D\/E.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def base(self):\r\n        '''\r\n        Returns the rightmost element of a path. If ..\/A\/B\/C is a path then C will be\r\n        returned.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def split(self):\r\n        '''\r\n        Returns a sequence containing the path components.\r\n        '''\r\n\r\n    @abstractmethod\r\n    def splitbase(self):\r\n        '''\r\n        Returns the tuple (parent, base) for a given path.\r\n        '''\r\n\r\n    def __repr__(self):\r\n        return \"<%s : %s>\"%(self.__class__.__name__, self.path)<\/pre>\n<p>The methods `find` and `fullpath` establish the relationship between `Path` objects and module paths. The `fullpath` method is intended to return the module path that corresponds to the `Path` object. The concrete `find` method interprets a module path in terms of the abstract `parent` and `joinpath` methods: each leading dot becomes a `parent()` call and each name a `joinpath()` call. So each module path has an unambiguous interpretation as a sequence of operations on a `Path` object.<\/p>\n<h3>FSPath objects<\/h3>\n<p>Now we look at concrete examples. The first and most important one is the `FSPath` class, which relies on the file system operations defined in `os.path`. `FSPath` supports configurable file suffixes, whereas in Python itself the file-suffix data are hardcoded. They can be accessed using the &lt;`imp`&gt; module function `get_suffixes()`. 
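<\/p>\n<p>The &lt;`imp`&gt; module has since been deprecated and was removed in Python 3.12. The following rough sketch, assuming Python 3, rebuilds comparable 3-tuples from the suffix lists in `importlib.machinery`, loosely modelled on what `imp.get_suffixes()` returned in Python 3. It also illustrates the equivalence of file paths described earlier: `A.py` and `A.pyc` are mapped onto the same module name `A`. The helpers `suffix_triples` and `module_name` and the local type constants are hypothetical and not part of the design presented in this article.<\/p>\n<pre lang=\"python\">import importlib.machinery as machinery

# Local stand-ins mirroring the former imp constants PY_SOURCE, PY_COMPILED
# and C_EXTENSION (hypothetical here; imp itself was removed in Python 3.12).
PY_SOURCE, PY_COMPILED, C_EXTENSION = 1, 2, 3

def suffix_triples():
    '''
    Builds (suffix, mode, type) triples from the suffix lists of
    importlib.machinery, in the spirit of imp.get_suffixes().
    '''
    triples = [(s, 'rb', C_EXTENSION) for s in machinery.EXTENSION_SUFFIXES]
    triples += [(s, 'r', PY_SOURCE) for s in machinery.SOURCE_SUFFIXES]
    triples += [(s, 'rb', PY_COMPILED) for s in machinery.BYTECODE_SUFFIXES]
    return triples

def module_name(filename, triples=None):
    '''
    Maps a file name onto its module name by stripping a known suffix,
    so A.py, A.pyc and (on Windows) A.pyd all map onto A.
    Returns None if no known suffix matches.
    '''
    if triples is None:
        triples = suffix_triples()
    # Try longer suffixes first, e.g. '.abi3.so' before '.so'.
    for suffix in sorted((t[0] for t in triples), key=len, reverse=True):
        if filename.endswith(suffix) and len(filename) > len(suffix):
            return filename[:-len(suffix)]
    return None

if __name__ == '__main__':
    for triple in suffix_triples():
        print(triple)
    print(module_name('A.py'), module_name('A.pyc'), module_name('README.txt'))
<\/pre>\n<p>Trying longer suffixes first lets an extension suffix such as `.abi3.so` take precedence over the shorter `.so`.<\/p>\n<p>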
In our implementation we provide a class wrapper called `SuffixInfo` for the 3-tuples that describe a suffix.<\/p>\n<pre lang=\"python\">import os\r\nimport imp\r\n\r\nclass SuffixInfo(object):\r\n    def __init__(self, suffix_data):\r\n        # suffix_data is a (suffix, mode, type) triple as returned by imp.get_suffixes()\r\n        self.suffix    = suffix_data[0]\r\n        self.read_mode = suffix_data[1]\r\n        self.priority  = suffix_data[2]\r\n\r\nclass FSPath(Path):\r\n    # registered SuffixInfo objects, e.g. filled by\r\n    # for data in imp.get_suffixes(): FSPath.add_suffix(SuffixInfo(data))\r\n    modulesuffixes = set()\r\n    def __init__(self, pathobj):\r\n        self.path = pathobj\r\n\r\n    @classmethod\r\n    def add_suffix(cls, suffixinfo):\r\n        cls.modulesuffixes.add(suffixinfo)\r\n\r\n    def fullpath(self):\r\n        if self.ismodule():\r\n            module = self.base()\r\n            name, ext = os.path.splitext(module)\r\n            P = self.parent().fullpath()\r\n            return P+\".\"+name if P else name\r\n        elif self.ispackage():\r\n            directory, name = self.splitbase()\r\n            P = directory.fullpath()\r\n            return P+\".\"+name if P else name\r\n        else:\r\n            return \"\"\r\n\r\n    def ispackage(self):\r\n        if os.path.isdir(self.path):\r\n            if os.path.isfile(os.path.join(self.path, \"__init__.py\")):\r\n                return True\r\n        return False\r\n\r\n    def isempty(self):\r\n        return self.path == \"\"\r\n\r\n    def exists(self):\r\n        return self.ispackage() or self.ismodule()\r\n\r\n    def ismodule(self):\r\n        # the bare path itself or the path with one of the registered suffixes\r\n        suffixes = [\"\"]+[suffix.suffix for suffix in self.modulesuffixes]\r\n        for suffix in suffixes:\r\n            if os.path.isfile(self.path+suffix):\r\n                return True\r\n        return False\r\n\r\n    def parent(self):\r\n        return self.__class__(os.path.dirname(self.path))\r\n\r\n    def children(self):\r\n        if os.path.isdir(self.path):\r\n            return [self.__class__(os.path.join(self.path, f)) for f in os.listdir(self.path)]\r\n\r\n    def joinpath(self, *args):\r\n        # os.path.join avoids a spurious leading separator for the empty path\r\n        return self.__class__(os.path.join(self.path, *args))\r\n\r\n    def base(self):\r\n        return os.path.basename(self.path)\r\n\r\n    def split(self):\r\n        return self.path.split(os.sep)\r\n\r\n    def splitbase(self):\r\n        return self.parent(), self.base()<\/pre>\n<p>The remaining two `Path` subclasses are given in separate listings.<\/p>\n<h3>ZipPath objects<\/h3>\n<p>The <a href=\"http:\/\/www.fiber-space.de\/misc\/ZipPath.html\">ZipPath<\/a> object is similar to the `FSPath` object. The main difference is that there is no natural way to walk within a zip-file, so the `ZipPath` class reads out the directory structure packed in the zip-file and makes it explicit. This step has to be performed only once per zip-file: the `parent()` and `joinpath()` operations that create new `ZipPath` objects from an existing `ZipPath` perform no further unzip operations or inspections. Unzipping becomes relevant only at the places where module content has to be loaded &#8211; and load operations don&#8217;t affect `Path` objects.<\/p>\n<h3>TuplePath objects<\/h3>\n<p>The <a href=\"http:\/\/www.fiber-space.de\/misc\/ZipPath.html\">TuplePath<\/a> object may be of little practical use, but since it operates on plain data structures only, it might inspire more exotic `Path` objects that are useful elsewhere.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is the first in a series that studies the design of a module import system. 
Although this work is situated in the Python context it can stay on its own to a large extent and ideas may be &hellip; <a href=\"http:\/\/fiber-space.de\/wordpress\/2009\/04\/07\/pyload-part-i-path-objects\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"_links":{"self":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/256"}],"collection":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/comments?post=256"}],"version-history":[{"count":22,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/256\/revisions"}],"predecessor-version":[{"id":278,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/256\/revisions\/278"}],"wp:attachment":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/media?parent=256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/categories?post=256"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/tags?post=256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}