Pattern matching with TupleTrees

As it seems advanced dispatch schemes are discussed right now under the pattern matching label. In this article I want to discuss another solution using a data-structure called the `TupleTree` ( also known as prefix tree or trie ). The tuple tree solution comes closer to PEAK rules than to Marius Eriksens pattern matching engine. I find it more elegant than the PEAK rules solution and there is less boilerplate. I can’t say much about the generality of PEAK rules though and they might cover a lot more.

A `TupleTree` is an efficient way to store tuples by factoring out common prefixes. Suppose you have a set of tuples:

`{(a, b, c), (a, b, d), (a, c, d), (b, d, d)}` then you can store the same information using a tree structure

`{(a, (b, ((c,) (d,))), (c, d)), (b, d, d)}`

Searching in the tuple tree is of complexity `O(log(n))` and can degenerate to `O(n)` if there is isn’t much to factorize.

This isn’t too interesting in the discussion of pattern matching schemes if we wouldn’t introduce two different kinds of wildcards or symbolic pattern called `ANY` and `ALL`.

ANY – a pattern that matches any symbol but with the lowest priority: if there is a tuple `(ANY, Y, …)` in the tuple tree then `(X, Y, …)` is matched by `(ANY, Y, …)` iff there isn’t a more specific matching tuple `(X, Y, … )` in the tree.

ALL – a pattern that matches any symbol but with the same priority as a more specific symbol. If there is a tuple `(ALL, Y, …)` in the tuple tree then `(X, Y, …)` is matched by `(ALL, Y, …)` and by a more specific tuple `(X, Y, … )` if present. This means `ALL` creates an ambiguity.

We can consider `ALL` as a variable and we eliminate the ambiguity using value substitution. Let’s say we have a set of tuples `{(ANY, Y), (X, Y), (ALL, Z)}` then elimination of `ALL` leads to `{(ANY, Y), (ANY, Z), (X,Y), (X, Z)}` and the tuple tree `{(ANY, (Y,), (Z,)), (X, (Y,), (Z,))}`.

TupleTree implementation

First we define the mentioned pattern `ANY` and `ALL`

class Pattern:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "<P:%s>"%self.name
 
ANY = Pattern("ANY")
ALL = Pattern("ALL")

Now we create the `TupleTree` object. The `TupleTree` implements two methods `insert` and `find`. The `insert` method takes a tuple and a key as parameters. It inserts the tuple and stores a `key` at the location of the tuple. The `find` method takes a tuple and returns a key if it was inserted at the location of the tuple.

class TupleTree(object):
    def __init__(self):
        self.branches = {}
        self.key = None
        self.all = None
 
    def insert(self, args, value):
        if len(args) == 0:
            self.key = value
            return
        first = args[0]
        if first == ALL:
            for node in self.branches.values():
                node.insert(args[1:], value)
            self.all = (args[1:], value)
        elif first in self.branches:
            node = self.branches[first]
            node.insert(args[1:], value)
            if self.all:
                node.insert(*self.all)
        else:
            tree  = TupleTree()
            self.branches[first] = tree
            tree.insert(args[1:], value)
            if self.all:
                node.insert(*self.all)
 
    def find(self, args):
        first = args[0]
        if first in self.branches:
            node = self.branches[first]
        elif ANY in self.branches:
            node = self.branches[ANY]
        if len(args) == 1:
            return node.key
        else:
            return node.find(args[1:])

The Dispatcher

It is easy to define a dispatcher that matches argument tuples against a tuple in a `TupleTree`. Handler functions which are decorated by a `Dispatcher` object are stored as tuple tree keys. Those handler functions are retrieved from the `TupleTree` when the `apply` method is called with concrete arguments.

class Dispatcher(object):
    def __init__(self):
        self.ttree = TupleTree()
 
    def __call__(self, *args):
        def handler(f):
            self.ttree.insert(args, f)
            return f
        return handler
 
    def apply(self, *args):
        handler = self.ttree.find(args)
        if not handler:
            raise ValueError("Failed to find handler that matches arguments")
        else:
            return handler(*args)

Example

As an example we create a new `Dispatcher` object and decorate handler functions using it.

alt = Dispatcher()
 
@alt("/", ANY)
def not_a_resource(path, method):
    print "not a resource"
 
@alt(ANY, "GET")
def retrieve_resource(path, method):
    print "retrieve resource"
 
@alt(ANY, "POST")
def update_resource(path, method):
    print "update resource", path
 
@alt(ALL, "PUT")
def create_new_resource(path, method):
    print "create new resource", path
 
@alt(ANY, ANY)
def invalid_request(path, method):
    print "invalid request", path

Notice that the `create_new_resource` handler is called when the HTTP command is `PUT` is passed even when the path is the root path “/”. This is caused by the `ALL` pattern in the first argument. For all other commands a “not a resource” message is printed.

>>> alt.apply("/home", "PUT")
create new resource /home
>>> alt.apply("/", "PUT")
create new resource /
>>> alt.apply("/", "GET")
not a resource
>>> alt.apply("/home", "GET")
retrieve resource /home
>>> alt.apply("/home", "PAUSE")
invalid request PAUSE
This entry was posted in Python. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*