Pattern matching with TupleTrees

As it seems advanced dispatch schemes are discussed right now under the pattern matching label. In this article I want to discuss another solution using a data-structure called the TupleTree ( also known as prefix tree or trie ). The tuple tree solution comes closer to PEAK rules than to Marius Eriksens pattern matching engine. I find it more elegant than the PEAK rules solution and there is less boilerplate. I can’t say much about the generality of PEAK rules though and they might cover a lot more.

A TupleTree is an efficient way to store tuples by factoring out common prefixes. Suppose you have a set of tuples:

{(a, b, c), (a, b, d), (a, c, d), (b, d, d)} then you can store the same information using a tree structure

{(a, (b, ((c,) (d,))), (c, d)), (b, d, d)}

Searching in the tuple tree is of complexity O(log(n)) and can degenerate to O(n) if there is isn’t much to factorize.

This isn’t too interesting in the discussion of pattern matching schemes if we wouldn’t introduce two different kinds of wildcards or symbolic pattern called ANY and ALL.

ANY – a pattern that matches any symbol but with the lowest priority: if there is a tuple (ANY, Y, …) in the tuple tree then (X, Y, …) is matched by (ANY, Y, …) iff there isn’t a more specific matching tuple (X, Y, … ) in the tree.

ALL – a pattern that matches any symbol but with the same priority as a more specific symbol. If there is a tuple (ALL, Y, …) in the tuple tree then (X, Y, …) is matched by (ALL, Y, …) and by a more specific tuple (X, Y, … ) if present. This means ALL creates an ambiguity.

We can consider ALL as a variable and we eliminate the ambiguity using value substitution. Let’s say we have a set of tuples {(ANY, Y), (X, Y), (ALL, Z)} then elimination of ALL leads to {(ANY, Y), (ANY, Z), (X,Y), (X, Z)} and the tuple tree {(ANY, (Y,), (Z,)), (X, (Y,), (Z,))}.

TupleTree implementation

First we define the mentioned pattern ANY and ALL

class Pattern:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "<P:%s>"%self.name
 
ANY = Pattern("ANY")
ALL = Pattern("ALL")

Now we create the TupleTree object. The TupleTree implements two methods insert and find. The insert method takes a tuple and a key as parameters. It inserts the tuple and stores a key at the location of the tuple. The find method takes a tuple and returns a key if it was inserted at the location of the tuple.

class TupleTree(object):
    def __init__(self):
        self.branches = {}
        self.key = None
        self.all = None
 
    def insert(self, args, value):
        if len(args) == 0:
            self.key = value
            return
        first = args[0]
        if first == ALL:
            for node in self.branches.values():
                node.insert(args[1:], value)
            self.all = (args[1:], value)
        elif first in self.branches:
            node = self.branches[first]
            node.insert(args[1:], value)
            if self.all:
                node.insert(*self.all)
        else:
            tree  = TupleTree()
            self.branches[first] = tree
            tree.insert(args[1:], value)
            if self.all:
                node.insert(*self.all)
 
    def find(self, args):
        first = args[0]
        if first in self.branches:
            node = self.branches[first]
        elif ANY in self.branches:
            node = self.branches[ANY]
        if len(args) == 1:
            return node.key
        else:
            return node.find(args[1:])

The Dispatcher

It is easy to define a dispatcher that matches argument tuples against a tuple in a TupleTree. Handler functions which are decorated by a Dispatcher object are stored as tuple tree keys. Those handler functions are retrieved from the TupleTree when the apply method is called with concrete arguments.

class Dispatcher(object):
    def __init__(self):
        self.ttree = TupleTree()
 
    def __call__(self, *args):
        def handler(f):
            self.ttree.insert(args, f)
            return f
        return handler
 
    def apply(self, *args):
        handler = self.ttree.find(args)
        if not handler:
            raise ValueError("Failed to find handler that matches arguments")
        else:
            return handler(*args)

Example

As an example we create a new Dispatcher object and decorate handler functions using it.

alt = Dispatcher()
 
@alt("/", ANY)
def not_a_resource(path, method):
    print "not a resource"
 
@alt(ANY, "GET")
def retrieve_resource(path, method):
    print "retrieve resource"
 
@alt(ANY, "POST")
def update_resource(path, method):
    print "update resource", path
 
@alt(ALL, "PUT")
def create_new_resource(path, method):
    print "create new resource", path
 
@alt(ANY, ANY)
def invalid_request(path, method):
    print "invalid request", path

Notice that the create_new_resource handler is called when the HTTP command is PUT is passed even when the path is the root path “/”. This is caused by the ALL pattern in the first argument. For all other commands a “not a resource” message is printed.

>>> alt.apply("/home", "PUT")
create new resource /home
>>> alt.apply("/", "PUT")
create new resource /
>>> alt.apply("/", "GET")
not a resource
>>> alt.apply("/home", "GET")
retrieve resource /home
>>> alt.apply("/home", "PAUSE")
invalid request PAUSE

Leave a Reply