Tail recursion decorator revisited
A while ago some of us tried to find the best solution for a tail recursion decorator. The story began when Crutcher Dunnavant came up with a surprising decorator that was able to eliminate tail recursion. His solution used stack inspections and could flatten the call stack s.t. the maximum height of the stack became 2. I gave an alternative, faster implementation that omitted stack inspections. This solution was further improved by Michele Simionato and George Sakkis. The story already ends here because the performance penalty of those decorators was still quite considerable. On the samples we used the undecorated recursive function was more than twice as fast as the decorated one. So the decorator added more than 100% overhead.
Today I tried something new. I reimplemented Georges decorator in Cython which is considerably faster than Python and then I compared the performance of undecorated code, code that is decorated with the pure Python decorator and the Cython version of it.
Here you can download the relevant source code for the decorators.
The following implementation shows the Cython decorator defined in tail_recursive.pyx
cdef class tail_recursive: cdef int firstcall cdef int CONTINUE cdef object argskwd cdef object func def __init__(self, func): self.func = func self.firstcall = True self.CONTINUE = id(object()) def __call__(self, *args, **kwd): if self.firstcall: self.firstcall = False try: while True: result = self.func(*args, **kwd) if result == self.CONTINUE: # update arguments args, kwd = self.argskwd else: # last call return result finally: self.firstcall = True else: # return the arguments of the tail call self.argskwd = args, kwd return self.CONTINUE
Here are some performance tests.
As a sample function I used a tail recursive factorial
def factorial(n, acc=1): "calculate a factorial" return (acc if n == 0 else factorial(n-1, n*acc))
The timining function is defined by
import time def mtime(foo): a = time.time() for j in range(10): for i in range(900): foo(i) return time.time()-a
The results I got were:
8.484 -- undecorated 9.405 -- with Cython decorator + 10% 17.93 -- with Python decorator + 111%Next I checked out a pair of mutual recursive functions. Notice that in those pairs only one function may be decorated by tail_recursive.
def even(n): if n == 0: return True else: return odd(n-1) def odd(n): if n == 0: return False else: return even(n-1)
Here are the results:
2.969 -- undecorated 3.312 -- with Cython decorator + 11% 7.437 -- with Python decorator + 150%These are about the same proportions as for factorial example.
My conclusion is that one can expect about 10% performance penalty for Cythons tail_recursive decorator. This is a quite good result and I don’t shy away from recommending it.