# Pattern matching with TupleTrees

As it seems advanced dispatch schemes are discussed right now under the pattern matching label. In this article I want to discuss another solution using a data-structure called the `TupleTree` ( also known as prefix tree or trie ). The tuple tree solution comes closer to PEAK rules than to Marius Eriksens pattern matching engine. I find it more elegant than the PEAK rules solution and there is less boilerplate. I can’t say much about the generality of PEAK rules though and they might cover a lot more.

A `TupleTree` is an efficient way to store tuples by factoring out common prefixes. Suppose you have a set of tuples:

`{(a, b, c), (a, b, d), (a, c, d), (b, d, d)}` then you can store the same information using a tree structure

`{(a, (b, ((c,) (d,))), (c, d)), (b, d, d)}`

Searching in the tuple tree is of complexity `O(log(n))` and can degenerate to `O(n)` if there is isn’t much to factorize.

This isn’t too interesting in the discussion of pattern matching schemes if we wouldn’t introduce two different kinds of wildcards or symbolic pattern called `ANY` and `ALL`.

ANY – a pattern that matches any symbol but with the lowest priority: if there is a tuple `(ANY, Y, …)` in the tuple tree then `(X, Y, …)` is matched by `(ANY, Y, …)` iff there isn’t a more specific matching tuple `(X, Y, … )` in the tree.

ALL – a pattern that matches any symbol but with the same priority as a more specific symbol. If there is a tuple `(ALL, Y, …)` in the tuple tree then `(X, Y, …)` is matched by `(ALL, Y, …)` and by a more specific tuple `(X, Y, … )` if present. This means `ALL` creates an ambiguity.

We can consider `ALL` as a variable and we eliminate the ambiguity using value substitution. Let’s say we have a set of tuples `{(ANY, Y), (X, Y), (ALL, Z)}` then elimination of `ALL` leads to `{(ANY, Y), (ANY, Z), (X,Y), (X, Z)}` and the tuple tree `{(ANY, (Y,), (Z,)), (X, (Y,), (Z,))}`.

#### TupleTree implementation

First we define the mentioned pattern `ANY` and `ALL`

 class Pattern: def __init__(self, name): self.name = name def __repr__(self): return "<P:%s>"%self.name   ANY = Pattern("ANY") ALL = Pattern("ALL")

Now we create the `TupleTree` object. The `TupleTree` implements two methods `insert` and `find`. The `insert` method takes a tuple and a key as parameters. It inserts the tuple and stores a `key` at the location of the tuple. The `find` method takes a tuple and returns a key if it was inserted at the location of the tuple.

 class TupleTree(object): def __init__(self): self.branches = {} self.key = None self.all = None   def insert(self, args, value): if len(args) == 0: self.key = value return first = args[0] if first == ALL: for node in self.branches.values(): node.insert(args[1:], value) self.all = (args[1:], value) elif first in self.branches: node = self.branches[first] node.insert(args[1:], value) if self.all: node.insert(*self.all) else: tree = TupleTree() self.branches[first] = tree tree.insert(args[1:], value) if self.all: node.insert(*self.all)   def find(self, args): first = args[0] if first in self.branches: node = self.branches[first] elif ANY in self.branches: node = self.branches[ANY] if len(args) == 1: return node.key else: return node.find(args[1:])

#### The Dispatcher

It is easy to define a dispatcher that matches argument tuples against a tuple in a `TupleTree`. Handler functions which are decorated by a `Dispatcher` object are stored as tuple tree keys. Those handler functions are retrieved from the `TupleTree` when the `apply` method is called with concrete arguments.

 class Dispatcher(object): def __init__(self): self.ttree = TupleTree()   def __call__(self, *args): def handler(f): self.ttree.insert(args, f) return f return handler   def apply(self, *args): handler = self.ttree.find(args) if not handler: raise ValueError("Failed to find handler that matches arguments") else: return handler(*args)

#### Example

As an example we create a new `Dispatcher` object and decorate handler functions using it.

 alt = Dispatcher()   @alt("/", ANY) def not_a_resource(path, method): print "not a resource"   @alt(ANY, "GET") def retrieve_resource(path, method): print "retrieve resource"   @alt(ANY, "POST") def update_resource(path, method): print "update resource", path   @alt(ALL, "PUT") def create_new_resource(path, method): print "create new resource", path   @alt(ANY, ANY) def invalid_request(path, method): print "invalid request", path

Notice that the `create_new_resource` handler is called when the HTTP command is `PUT` is passed even when the path is the root path “/”. This is caused by the `ALL` pattern in the first argument. For all other commands a “not a resource” message is printed.

>>> alt.apply("/home", "PUT")
create new resource /home
>>> alt.apply("/", "PUT")
create new resource /
>>> alt.apply("/", "GET")
not a resource
>>> alt.apply("/home", "GET")
retrieve resource /home
>>> alt.apply("/home", "PAUSE")
invalid request PAUSE
This entry was posted in Python. Bookmark the permalink.