Posts Tagged ‘Add new tag’

Universal Set in Python

Posted in Mathematics, Python on April 16th, 2009 by kay – 5 Comments

Building sets from set() and ANY

When defining a regular expression there is a well known duality between a set of characters specified by enclosing characters or character ranges within square brackets […] and defining the pattern of the complement set [^…] by preceding the caret symbol ^ .

  • [0-9] — matches all digits
  • [^0-9] — matches all non-digits

The nature of the complement set mentioned here is less precisely defined and is implementation dependent in practice. If a regexp matcher knows about ASCII only then [^0-9] is the set of all ASCII characters which are not digits.

Suppose now we want to build a character matching set from sets in the sense of a set data-type as inhabited in Python as a builtin. This is easy because we start from the empty set() and add just more elements. Each finite set can be constructed in this way and we can establish a set ~ pattern correspondence:

set(range(9)) ~ [0-9]

But how do we find a corresponding set for [^0-9]?

This cannot be finitely constructed from the bottom up using set() as a start value but it has to be finitely de-constructed from the set of all elements. This looks like we are running into unlimited comprehension and the Russel paradox but actually the set theory we are using is quite weak and element membership can be decided just by doing hashtable lookups.

Instead of constructing a set from set() we want to (de-)construct it from ANY. The name ANY is not chosen by chance because we want to express the correspondence with . pattern:

ANY ~ .

( Actually ANY is a little stronger because . doesn’t match newline characters unless explicitly specified by the DOTALL flag in the re.compile function. )

Using ANY we can easily state the above correspondence:

ANY – set(range(9)) ~ [^0-9]

Implementation of ANY

We define ANY as an instance of univset. The univset has a set-like interface and implements set operations consistently on sets of the type ANY-A, ANY-B,… where A, B… are finite sets constructed from set(). For example

ANY-A ∩ ANY-B = ANY – ( A ∪ B )
ANY-A ∪ B = ANY – ( A – B )

class univset(object):
    def __init__(self):
        self._diff = set()
 
    def __sub__(self, other):
        S = univset()
        if type(other) == set:
            S._diff = self._diff | other
            return S
        else:
            S._diff = self._diff | other._diff
            return S
 
    def __rsub__(self, other):
        return other & self._diff
 
    def __contains__(self, obj):
        return not obj in self._diff
 
    def __and__(self, other):
        return other - self._diff
 
    def __rand__(self, other):
        return other - self._diff
 
    def __repr__(self):
        if self._diff == set():
            return "ANY"
        else:
            return "ANY - %s"%self._diff
 
    def __or__(self, other):
        S = univset()
        S._diff = self._diff - other
        return S
 
    def __xor__(self, other):
        return (self - other) | (other - self)
 
    def add(self, elem):
        if elem in self._diff:
            self._diff.remove(elem)
 
    def update(self, elem):
        self._diff = self._diff - other
 
    def __ror__(self, other):
        return self.__or__(other)
 
    def union(self, other):
        return self.__or__(other)
 
    def difference(self, other):
        return self.__sub__(other)
 
    def intersection(self, other):
        return self.__and__(other)
 
    def symmetric_difference(self, other):
        return self.__xor__(other)
 
    def issubset(self, other):
        if type(other) == set:
            return False
        if issubset(other._diff, self._diff):
            return True
        return False
 
    def issuperset(self, other):
        if self._diff & other:
            return False
        return True
 
    def __lt__(self, other):
        return self.issubset(other)
 
    def __eq__(self, other):
        if type(other) == set:
            return False
        try:
            return self._diff == other._diff
        except AttributeError:
            return False
 
    def __ne__(self, other):
        return not self.__eq__(other)
 
    def __le__(self, other):
        return self.__lt__(other) or self.__eq__(other)
 
    def __gt__(self, other):
        return self.issuperset(other)
 
    def __gt__(self, other):
        return self.issuperset(other) or self == other

Examples

ANY = univset()
NON_DIGITS = ANY - set(range(9))
assert 8 not in NON_DIGITS
assert 'a' in NON_DIGITS
assert 0 in NON_DIGITS | set([0])

The statement that never was

Posted in General, Python on April 15th, 2009 by kay – 1 Comment

The Python release archive is a nice repository for archaeological findings. My biggest surprise was the existence of an access statement and access keyword in the pre-history of Python 1.5.0. The syntax was already specified in release 1.0 with an accompanying accessobject.c and it began to fade in Python 1.4.

The access statement

That is how the access statement was defined in 1.0

access_stmt: 'access' ('*' | NAME (',' NAME)*) ':' accesstype  (',' accesstype)*
accesstype: NAME+
# accesstype should be ('public' | 'protected' | 'private') ['read'] ['write']
# but can't be because that would create undesirable reserved words!

The access_stmt was a small statement i.e. one that doesn’t contain a block and could be used like print, del or exec.

So the proper use of the access statement would be like

class Point:
    access x, y, z: private write, public read

or more compact:

class Point:
    access *: private write, public read

The word public wasn’t a language keyword which made it possible to use it in this way:

class AccessModifier:
    access public: public
    access private: public
    access protected: public

Note that the access statement was purely declarative even more so than Java’s where you can write

public int x = 10;
and define x while specifying its type and access modifier. It was also more elaborated in terms of intended protections by means of differentiating between read and write.

How it never took off and gone away

In the HISTORY file of Python 1.0 we find the notice

There’s a new reserved word: “access”. The syntax and semantics are still subject of of research and debate (as well as undocumented), but the parser knows about the keyword so you must not use it as a variable, function, or attribute name.

In Python 1.4 the access statement was already uncommented from the Grammar.

The story ends with the following remark in the HISTORY of Python 1.5:

‘access’ is no longer a reserved word, and all code related to its implementation is gone (or at least #ifdef’ed out). This should make Python a little speedier too!