Python

Jynx 0.3 – how to fix custom class loaders for use with Jython

Posted in Jynx, Jython on August 12th, 2009 by kay

Broken class loaders

Jynx 0.2 contained an ugly workaround for a bug I couldn’t fix for quite a while. The bug can be described as follows: suppose you defined the source code of a Java class A and compiled it dynamically:

A = JavaCompiler().createClass("A", A_source)

When you then attempted to build a subclass

class B(A): pass

a NoClassDefFoundError exception was raised:

Traceback (most recent call last):
  File "C:\lang\Jython\jcompile.py", line 185, in <module>
    class B(A):pass
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
java.lang.NoClassDefFoundError: org/python/core/PyProxy (wrong name: A)

In that case the Jython runtime failed to create a proxy class for B while locating PyProxy, which is a Jython core interface. The traceback gave no hint where the error originated, so I started to debug into Jython from NetBeans.

This is what happened: Jynx defines a ByteClassLoader class, which is a custom class loader for the dynamic compilation of A. When A is loaded with loadClass, a findClass method is called to locate A, and this method had to be overridden. The ByteClassLoader was bound to A automatically and Jython used it to locate interfaces such as org.python.core.PyProxy. This lookup failed, which explains the error. A possible fix is to let ByteClassLoader answer only for the classes it can deal with and to delegate every other findClass call to the parent class loader.
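The delegation idea can be sketched in plain Python without any Java involved. The ToyLoader class and its find_class method are made up for this illustration and only mimic the shape of the problem; they are not the java.lang.ClassLoader API and not the Jynx code.

```python
class ToyLoader(object):
    """Toy stand-in for a class loader (hypothetical, not the Java API)."""

    def __init__(self, known, parent=None):
        self.known = dict(known)   # names this loader can define itself
        self.parent = parent       # loader to fall back on

    def find_class(self, name):
        # Answer only for names this loader is responsible for ...
        if name in self.known:
            return self.known[name]
        # ... and delegate everything else instead of failing.
        if self.parent is not None:
            return self.parent.find_class(name)
        raise LookupError("class not found: " + name)


system_loader = ToyLoader({"org.python.core.PyProxy": "<PyProxy>"})
byte_loader = ToyLoader({"A": "<A>"}, parent=system_loader)
```

With this arrangement byte_loader.find_class("A") is answered locally, while a request for org.python.core.PyProxy ends up at the parent.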

Curiously Jython stopped using ByteClassLoader after I changed the inheritance hierarchy from

class ByteClassLoader(ClassLoader):
    def __init__(self, code):
        super(ByteClassLoader, self).__init__(ClassLoader.getClassLoader())
        ...

to

class ByteClassLoader(URLClassLoader):
    def __init__(self, code):
        super(ByteClassLoader, self).__init__([], ClassLoader.getSystemClassLoader())
        ...

The URLClassLoader provides the opportunity to add URLs at runtime and thereby to modify the CLASSPATH dynamically.

No disk dumps in Jynx 0.3

Prior to Jynx 0.3 the workaround was to dump A to disk and load the class from there. Selecting the right class loader is a subtle matter, as discussed above, and loading A from disk moved the machinery into a correct state. This wasn’t only cumbersome but also a hurdle for a programmer who intended to work within a Java sandbox. With Jynx 0.3 I feel prepared to explore Java integration with Jynx on GAE-J.

Jynx 0.2 released

Posted in Java, Jynx, Jython on July 27th, 2009 by kay

I’ve released Jynx 0.2. Jynx is a Jython package which utilizes dynamic Java compilation from Jython and improves on Java scripting. Jynx 0.2 implements two major new features.

Annotation Extraction

In the initial Jynx 0.1 release an annotation object was defined which can be used as a decorator. A Python class such as

@JavaClass
class TestClass(Object):
    @annotation("Test")
    @signature("public void _()")
    def test_report_test_failure(self):
        assertTrue("length of empty list is 0", len([]) != 0)

equipped with the JavaClass decorator is compiled into a Java class on the fly which acts as a proxy for a Python object and provides the correct interface for being used within a Java framework which expects methods of a particular type signature and annotations. The class defined above can be used within JUnit 4.X.

Jynx 0.2 provides a new classmethod extract of the annotation class which can be used to extract Java annotation classes and acts as a factory function for Jython annotation objects.

# import Test annotation in JUnit 4.X
from org.junit import Test      
 
# a Python annotation object
Test = annotation.extract(Test) 
 
# takes a signature object as a parameter and returns a new Jython
# annotation object. The Java code generator will create a method
# with the correct signature and the @Test annotation
Test = Test(signature("public void _()"))
 
@JavaClass
class TestClass(Object):
    @Test
    def test_report_test_failure(self):
        assertTrue("length of empty list is 0", len([]) != 0)

As we see there is no overhead left here. When programming against a Java API / framework, Jython annotations can be defined within a single file and used application wide.

Classpath Manipulation

For reasons which are not completely transparent to me Java doesn’t permit runtime classpath manipulations. The JDK defines an addURL method in a special class loader called URLClassLoader. This method is protected and cannot generally be accessed without reflection. Internally the Sun JVM uses such a loader class ( or a subclass of it ), and if you are willing to accept a hack and to program against an implementation detail, you can use the JVM’s default class loader and add new paths to the classpath:

from java.lang import ClassLoader
from java.net import URL
systemLoader = ClassLoader.getSystemClassLoader()
systemLoader.addURL(URL("file:///C:/junit-4.6.jar"))

Jynx defines a ClassPath class and a new sys module attribute classpath. Adding a file system path pth to sys.classpath results in a method call

systemLoader.addURL(URL("file:"+pathname2url(pth)))

which converts the file system path into a Java URL object and adds it to the classpath. Additionally the same path is added to the PYTHONPATH via sys.path:

sys.classpath.append(r"C:\junit-4.6.jar")

The advantage is that each Python package can maintain the Java packages it depends upon and no global CLASSPATH environment variable has to be adapted unless a Java or Jython class defines its own class loader.
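How such a classpath list might behave can be sketched in plain Python. This is not the Jynx implementation: the add_url callback is a hypothetical hook standing in for the addURL call on the system class loader, and skipping duplicate entries is my own assumption.

```python
import sys


class ClassPath(list):
    """List of paths that forwards each new entry to a loader callback."""

    def __init__(self, add_url):
        super(ClassPath, self).__init__()
        self._add_url = add_url    # hypothetical hook, e.g. wrapping addURL

    def append(self, path):
        if path in self:
            return                 # assumption: duplicates are ignored
        self._add_url(path)        # hand the path to the class loader
        if path not in sys.path:
            sys.path.append(path)  # mirror the entry on the PYTHONPATH
        list.append(self, path)


seen = []                          # records what the "loader" was given
classpath = ClassPath(seen.append)
classpath.append("junit-4.6.jar")
classpath.append("junit-4.6.jar")  # second call is a no-op
```

In a real Jython session the callback would convert the path to a URL and pass it on to the loader; here it only records the path so the sketch runs anywhere.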

Four things I’d change in Python – and a little more

Posted in Python on July 25th, 2009 by kay – 4 Comments

1. Import system

Replace the flat module cache by a set of ModuleTree objects rooted in nodes living on the PYTHONPATH. Apply relative path semantics by default and treat absolute paths as special cases. Internal paths which are used in import statements or for traversing ModuleTree objects and external ones ( file-system, zip-files, URLs etc. ) are related through representations of internal paths [1]. Representations shall be user definable. For ModuleTree objects custom import semantics may be defined. This replaces “import hooks” and provides similar functionality in a much safer and object oriented manner. Further effects: no physical module is imported twice for two different import paths; each module can be used as a script no matter how the path is written. No changes to the language syntax.

[1] What I mean here is a representation of a path algebra in systems which can be considered as the “environment” of Python. This sounds more grandiose than it actually is.

2. Decorators everywhere

This basically reflects my interest in improving Jython compliance with Java and lifting Jython classes to Java classes turning Java classes into Jython class proxies – everything at runtime. This doesn’t work without specifying Java interfaces in Jython. Those consist of two parts: type signatures + annotations. For functions and classes this works in Python without much hassle. With Python 3.0 style function annotations one can even remove a decorator for type signatures. It doesn’t work for members though. In Java you can write

public class CustomerId {
    @Id
    @Column(name = "CustId", nullable = false)
    private Integer cust_id;
}

In Python I want to write similarly

class CustomerId:
    @Id
    @Column(name = "CustId", nullable = False)
    cust_id = jproperty("private int")

which translates into

class CustomerId:
    cust_id = Id(Column(name="CustId", nullable=False)(jproperty("private int")))

This requires that assignment statements ( grammatically expr_stmt’s ) may be decorated, not just functions and classes.
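The desugared form already runs in today's Python, which makes the intended semantics easy to try out. In the following sketch jproperty, annotation, Id and Column are simplified stand-ins I made up for illustration; they are not the Jynx objects.

```python
class jproperty(object):
    """Stand-in: remembers a Java type and the annotations applied to it."""

    def __init__(self, java_type):
        self.java_type = java_type
        self.annotations = []      # innermost annotation comes first


def annotation(text):
    # Returns a callable that tags a jproperty and hands it back,
    # just like a decorator hands back the decorated function.
    def decorate(prop):
        prop.annotations.append(text)
        return prop
    return decorate


Id = annotation("Id")


def Column(**kwargs):
    args = ", ".join("%s = %r" % (k, v) for k, v in sorted(kwargs.items()))
    return annotation("Column(%s)" % args)


class CustomerId(object):
    cust_id = Id(Column(name="CustId", nullable=False)(jproperty("private int")))
```

The annotations are applied inside out, so Column is recorded before Id, the same order in which stacked function decorators are evaluated.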

3. A new opcode for code monitoring

I know Ned Batchelder’s coverage tool and I have written one myself using EasyExtend. The EasyExtend tool is more powerful in that it doesn’t only provide the simplest type of coverage, namely statement coverage. However, it uses source code weaving, which might affect functionality in a highly reflective language. It would be far better to introduce a new opcode which is woven into Python’s bytecode and acts as a sensor. The weaving can be activated using a command line option. The overall achievement is improved code monitoring. This solution might also be applied to improve debuggers by setting breakpoints within expressions.
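For contrast, here is what can be done today without a new opcode and without source weaving: the interpreter's trace hook can act as a coarse sensor. This is only a sketch of statement-level monitoring, not the proposed opcode; run_with_line_sensor and sign are names invented for the example.

```python
import sys


def run_with_line_sensor(func, *args):
    """Call func(*args); return (result, set of executed line offsets)."""
    hits = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None            # do not trace other functions
        if event == "line":
            # store offsets relative to the 'def' line
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(old)          # always restore the previous hook
    return result, hits


def sign(x):
    if x < 0:
        return "negative"
    return "non-negative"
```

The hook fires once per line, so it cannot see which part of an expression was evaluated. That is exactly the gap a bytecode-level sensor would close.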

4. Function annotation and the nonlocal statement backports

I wish to see function argument annotations and the nonlocal statement in Python 2.x.

Other things

Admittedly I felt a little depression after the huge disappointment which was Python 3. Instead of a bright future it looked much like a minor legacy transformation which essentially missed the point of relevant trends in language design which are marked by concurrency orientation and unification of declarative dataflow and OO in frameworks + languages like WPF/Silverlight, Flex and JavaFX. The best thing which can be said about Python 3 is that it didn’t turn into a running gag and actually shipped code.

However there are lots of positive news and much progress in many other areas. At many fronts Python performance is targeted: PyPy, Unladen Swallow, Psyco 2, Cython, Shedskin. Package distribution and deployment is addressed just like renovation of the standard library. With PyPy, Unladen Swallow, Jython and IronPython Python becomes or is already GIL free and fit for multicore. The one improvement I’m personally most pleased about is that of Jython. Aside from my eternal pets ( Trail + EasyExtend ) I enjoy exploring the Javaverse, which is incredibly rich, from the Jython + scripting angle with some promising first results, new challenges and also some disappointments. I actually expect the next 600 Python web frameworks of interest will not be written in CPython anymore but in Jython and IronPython using Java/.Net components. When will we see a Jython Enterprise Framework on the JVM which will be as powerful as Spring but as lightweight as Pylons?

Redesign of the code.py and codeop.py modules

Posted in Python on July 24th, 2009 by kay – 2 Comments

Brett Cannon asks for modules of the stdlib to be redesigned. I find the idea rather bizarre to initiate a poll for this but maybe that’s just the future of programming where the quality of an implementation is judged by democratic voting. So I immersed myself in the hive mind and voted for distutils. It seems that Tarek Ziadé addresses this already but I’m not entirely sure he goes far enough. Last time I looked at the source code there were still all kinds of compiler modules in the lib which contain config information closely coupled with application code. That’s not so nice and mostly a matter of refactoring.

Some other of the stdlib modules I’d rewrite are not mentioned in the voting list. Maybe they are not sexy enough for the majority of web programmers that dominate all the discussions about Python? Among my favorites are code.py and codeop.py. Here is a brief but incomplete list of requirements and refactorings.

  • The heuristics used to determine incomplete Python commands in _maybe_compile are pretty weak.
  • Can you tell the difference between Compile, CommandCompiler and compile_command in codeop.py?
  • Encapsulate the raw_input function in interact within a method that can be overridden.
  • Provide two methods at_start and at_exit in InteractiveConsole to make startup and shutdown customizable.
  • Separate interactive loop from line processing and implement the line processor as a generator. It’s easier to write custom interactive loops for systems that interface with Python. The default interact method becomes
    def interact(self):
        self.at_start()
        try:
            gen_process = self.process_line()
            line = None
            while True:
                try:
                    prompt = gen_process.send(line)
                    line   = self.user.get_input(prompt)
                except StopIteration:
                    break
        finally:
            self.at_exit()
  • Move the line terminating heuristics from _maybe_compile into process_line and define a try_parse function together with a try_compile function. I’d go a little further even and define a try_tokenize function which isn’t essential though.
  • Provide a subclass for interactive sessions which can be recorded and replayed and command line options accordingly. This is optional though and not part of a redesign strategy.
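To make the generator idea from the list above concrete, here is a small runnable sketch. It assumes that the input object returns None at the end of input and it still leans on codeop.CommandCompiler for the completeness check; GeneratorConsole and ScriptedUser are names I made up, and errors raised by the executed code are not handled.

```python
import codeop


class ScriptedUser(object):
    """Replays canned lines instead of reading from a terminal."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.prompts = []

    def get_input(self, prompt):
        self.prompts.append(prompt)
        return self.lines.pop(0) if self.lines else None


class GeneratorConsole(object):
    def __init__(self, user):
        self.user = user
        self.namespace = {}
        self.compile = codeop.CommandCompiler()

    def at_start(self):
        pass

    def at_exit(self):
        pass

    def process_line(self):
        """Generator: yields the next prompt, receives the next line."""
        buffered, prompt = [], ">>> "
        while True:
            line = yield prompt
            if line is None:               # end of input stops the loop
                return
            buffered.append(line)
            try:
                code = self.compile("\n".join(buffered), "<console>", "single")
            except (SyntaxError, ValueError, OverflowError):
                buffered, prompt = [], ">>> "   # drop the broken command
                continue
            if code is None:               # command is still incomplete
                prompt = "... "
                continue
            exec(code, self.namespace)
            buffered, prompt = [], ">>> "

    def interact(self):
        self.at_start()
        try:
            gen_process = self.process_line()
            line = None
            while True:
                try:
                    prompt = gen_process.send(line)
                    line = self.user.get_input(prompt)
                except StopIteration:
                    break
        finally:
            self.at_exit()


user = ScriptedUser(["x = 1", "def f():", "    return x + 1", "", "y = f()"])
console = GeneratorConsole(user)
console.interact()
```

Since the loop only talks to get_input and the generator, a recording or replaying session is just a different user object.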

There are other modules I’d like to rewrite such as tokenize.py. Having a lexer in the stdlib which handles Python as a special case would be quite a big deal IMO. But it’s delicate and I struggle with writing lexers which can be both extended in a simple way ( without the trouble of running into ordered choice problems of the current regular expression engine ) and have a high performance. So far I only accomplished the first of the goals, at least partially, but not the second one.

Jynx

Posted in Jynx, Jython on July 10th, 2009 by kay

I have just released the initial version of my new Jython project Jynx on Google Code. Jynx sums up my latest efforts on the dynamic Java compilation front and it heads into the future of Java framework utilization from Python.

Although Jynx is quite tiny right now it already has enough structure, code and documentation to be useful for Jython/Jynx developers. People are invited to check it out, criticize it and contribute.

Have much fun with Python / Jython programming!

Stitches of a flea language – defining Java annotations in Jython

Posted in Java, Jython on June 30th, 2009 by kay – 9 Comments

Jython annotations – anyone?

The last few days I tried to figure out how to create Jython annotations. A Jython annotation is defined here as a Java annotation lifted from Jython to Java. So one essentially defines a Java annotation in a Jython class. A Jython annotation shall not be confused with a decorator. A Python ( or Jython ) decorator is just a higher order function ( or callable ). It might create attributes in Jython objects but those are not the same as Java annotations reflected by the Java runtime. Without Jython annotations Jython is right now essentially stuck in a pre version 1.5 Javaverse: Jython classes are disconnected from modern Java frameworks and cannot be plugged into them.

Jython annotations don’t work out of the box in Jython 2.5. Not much is known yet about how or when they will be supported by core Jython. The lead Jython developer Frank Wierzbicki announced something along these lines in his 2008 PyCon conference talk but this is now about 16 months ago. I could temper my impatience if Jython annotations were just around the corner but what can we expect after those 16 months?

In this article I introduce a library that enables lifting of meta-data from Jython to Java and loading Java back into Jython. One key element is Java code generation and dynamic compilation using the Java 6 Compilation API. Another one is interface extraction of Jython classes using the rich reflection API provided by core Jython.

Lifting up Jython classes

For every Jython class JyClass one can generate a Java class JaFromJyClass by means of interface extraction. We assume JyClass to be a subclass of a Java class, e.g. Object, and translate the Jython class

class JyClass(Object):
    def foo(self, *args):
        print "foo", args

into a corresponding Java class

public class JaFromJyClass extends Object{
    PyObject jyobject;
 
    public PyObject foo(PyObject[] args)
    {
        return jyobject.invoke("foo", args);
    }
}

This class is basically a proxy for the jyobject member variable of type PyObject which is a Jython API type. Once we have generated the Java code from Jython we can dynamically compile and load the Java code into Jython:

JaFromJyClass = createJavaClass("JaFromJyClass", source)
jainstance = JaFromJyClass()
jainstance.jyobject = JyClass()
jainstance.foo(9)  # prints 'foo 9'

This was straightforward and hints at our translation strategy. Next we review the Jython to Java translations in more detail.

Jython to Java translations

PyObject Injections

We cannot be satisfied with the way the jyobject was assigned to the jainstance in the previous example. That assignment protocol implies that the Jython script always has control over the instantiation of Jython classes. But once we plug the class into a framework the framework takes over. A better solution is to inject the PyObject using a factory mechanism.

public JaFromJyClass() {
    super();
    jyobject = JyGateway.newInstance("JyClass", this, null);
    jaobject = (Object)jyobject.__tojava__(Object.class);
}

The factory is called JyGateway. The JyGateway is a Java class which defines a HashMap called registry:

public static HashMap<String, PyDictionary> registry = new HashMap<String, PyDictionary>();

The keys of the Java HashMap are Strings that represent class names. The PyDictionary is a dictionary of Jython functions. Right now two functions are defined: newInstance and callStatic. Both of them correspond to static methods of JyGateway. If JyGateway.newInstance("JyClass", this, null) is called, the newInstance Jython function is fetched from the registry using "JyClass" as a key. The third argument of JyGateway.newInstance contains an array of Java objects passed as arguments to the newInstance function which returns a new PyObject. If the constructor doesn’t take an argument, null is passed as in the example. The particular JyClass will never be exposed to Java code.
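The lookup protocol is easier to follow in a small model. The sketch below rebuilds the registry in plain Python under the assumption that each entry holds exactly the two functions named above; register is a hypothetical helper and the real JyGateway is a Java class.

```python
registry = {}   # class name -> dict with "newInstance" and "callStatic"


def register(cls):
    """Hypothetical helper: publish a class under its name."""
    def new_instance(javaobject, args):
        # None stands for "no constructor arguments", as in the Java call
        return cls(javaobject, *(args or ()))

    def call_static(name, args):
        return getattr(cls, name)(*(args or ()))

    registry[cls.__name__] = {"newInstance": new_instance,
                              "callStatic": call_static}
    return cls


def newInstance(class_name, javaobject, args):
    return registry[class_name]["newInstance"](javaobject, args)


def callStatic(class_name, method, args):
    return registry[class_name]["callStatic"](method, args)


class JyClass(object):
    def __init__(self, javaobject, *args):
        self.javaobject = javaobject   # back reference to the Java proxy
        self.args = args

    @staticmethod
    def bar(*args):
        return "bar%r" % (args,)


register(JyClass)
obj = newInstance("JyClass", "java-proxy", None)
```

The caller only ever handles the string "JyClass", which is the point of the indirection: the class object itself stays on the Jython side.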

Overriding superclass methods

Aside from jyobject we have also defined jaobject in the JaFromJyClass constructor which has the type Object. Here Object is just the superclass of both JaFromJyClass and JyClass. The jaobject is defined for the purpose of overriding superclass methods: we cannot simply change the signature of superclass methods, in particular not the return type.

If public void foo(int x) is a method defined in the superclass of JyClass, the Java method generated from JyClass is

public void foo(int arg) { jaobject.foo(arg); }

The method foo called from jaobject is still the method implemented in Jython. The Jython method is just called with Java arguments and returns a Java value ( if any ) that gets converted back to Jython.

Calling static methods

Calling static or classmethods of Jython objects from Java is similar to calling JyGateway.newInstance:

public static PyObject bar(PyObject args[])
{
     return JyGateway.callStatic("JyClass", "bar", args);
}

Defining Jython metadata

There are three kinds of meta-data which can be added to Jython classes which are extracted for dynamic Java generation. Those are called jproperty, annotation and signature. They serve different purposes.

signature

A signature decorator is defined to assign Java argument and return types to a Jython method. Without the signature decorator a default translation is applied:

def foo(self, arg):
    ...

--->

public PyObject foo(PyObject[] args) {
    return jyobject.invoke("foo", args);
}

If we decorate foo with the following signature decorator

@signature("public int _(char)")
def foo(self, arg):
    ...

we get the translation

public int foo(char arg0){
    PyObject args[] = new PyObject[1];
    for(int i=0;i<1;i++) {
        args[0] = Py.java2py(arg0);
    }
    return (Integer)jyobject.invoke("foo", args).__tojava__(int.class);
}

The name of the function in the signature declaration string is of no relevance. That’s why we have used a single underscore.
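To illustrate how a declaration string can be turned into a Java method header, here is a rough sketch. It is my own simplification and not the generator used in the library: parse_signature and java_method are invented names and only simple, non-generic types are handled.

```python
def parse_signature(text):
    """Split 'public int _(char)' into modifiers, return type, arg types."""
    head, _, rest = text.partition("(")
    words = head.split()
    if len(words) < 2 or not rest.rstrip().endswith(")"):
        raise ValueError("not a signature: %r" % text)
    # words[-1] is the placeholder name, which is ignored
    modifiers, return_type = words[:-2], words[-2]
    arg_types = [a.strip() for a in rest.rstrip(") ").split(",") if a.strip()]
    return modifiers, return_type, arg_types


def java_method(name, signature):
    """Build the Java method header for a Jython method called 'name'."""
    modifiers, return_type, arg_types = parse_signature(signature)
    params = ", ".join("%s arg%d" % (t, i) for i, t in enumerate(arg_types))
    return " ".join(modifiers + [return_type, "%s(%s)" % (name, params)])


header = java_method("foo", "public int _(char)")
```

For the example above this produces the header public int foo(char arg0); the method body with its argument conversions would be generated separately.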

annotation

The annotation decorator applies to methods and classes. We have to wait for Jython 2.6 for proper class decorator syntax but the semantics is the same when we write

cls = annotation(value)(cls)

The value passed to annotation is a string which must conform to Java annotation syntax with the leading @ character being stripped.

@annotation("Override")
def foo(self, arg):
    ...

is a valid annotation which corresponds to the Java method

@Override
public PyObject foo(PyObject[] args) {
    return jyobject.invoke("foo", args);
}

Annotations can be stacked and also combined with the signature decorator. So we can define three new decorators

test  = annotation("Test")(signature("public void _()"))
setUp = annotation("Before")(signature("public void _()"))
beforeClass = annotation("BeforeClass")(signature("public static void _()"))

and use them within e.g. JUnit 4

from org.junit import*
from org.junit.Assert import*
 
class TestClass(Object):
    @beforeClass
    def start(self):
        print "Run 'TestClass' tests ..."
 
    @test
    def test_epoweripi(self):
        from math import e, pi
        assertTrue( abs( e**(pi*1j) + 1 ) < 10**-10 )
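How an annotation can be applied either to a method or to a signature object can be shown with two small classes. This is a plain Python sketch with attribute names ( java_signature, java_annotations ) that I chose for illustration; the actual objects may store their metadata differently.

```python
class signature(object):
    """Decorator that records a Java signature on a function."""

    def __init__(self, text):
        self.text = text
        self.annotations = []

    def __call__(self, func):
        func.java_signature = self.text
        func.java_annotations = (list(self.annotations) +
                                 getattr(func, "java_annotations", []))
        return func


class annotation(object):
    """Decorator that records a Java annotation."""

    def __init__(self, text):
        self.text = text

    def __call__(self, target):
        if isinstance(target, signature):
            # combine with a signature: the result is a new decorator
            combined = signature(target.text)
            combined.annotations = [self.text] + target.annotations
            return combined
        # plain function or class: just prepend the annotation
        target.java_annotations = ([self.text] +
                                   getattr(target, "java_annotations", []))
        return target


test = annotation("Test")(signature("public void _()"))


class TestClass(object):
    @test
    def test_something(self):
        pass

    @annotation("Override")
    @annotation("Deprecated")
    def foo(self):
        pass
```

A code generator could then read both attributes from each method when it writes the Java source.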

jproperty

The jproperty object is a descriptor which assigns a Java type and zero or more annotations to a Jython attribute.

class JyClass(Object):
    x = jproperty("private int", "X", "Y")

This is how it translates

public class JyClassBase extends Object
{
    @Y @X
    private int x;
}

A JyClass instance reads ( and sets ) jproperty values from the corresponding Java class instances it was assigned to. Remember that the jyobject instance construction looked like this

jyobject = JyGateway.newInstance("JyClass", this, null);

With this an instance of the Java class was passed to the newInstance factory function. Not only does a javaobject hold a jyobject, but a jyobject also holds a javaobject. Reading / writing jproperty values is the primary reason for this cyclic dependence.
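The cyclic arrangement can be modelled with an ordinary Python descriptor. The sketch assumes that the Jython object keeps its Java counterpart in an attribute called javaobject; JavaSide is a made-up stand-in for the generated Java instance, and __set_name__ requires Python 3.6 or later.

```python
class jproperty(object):
    """Descriptor that stores its value on the paired Java-side object."""

    def __init__(self, java_type, *annotations):
        self.java_type = java_type
        self.annotations = annotations

    def __set_name__(self, owner, name):   # Python 3.6+
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                    # access via the class
        return getattr(obj.javaobject, self.name)

    def __set__(self, obj, value):
        setattr(obj.javaobject, self.name, value)


class JyClass(object):
    x = jproperty("private int", "X", "Y")

    def __init__(self, javaobject):
        self.javaobject = javaobject       # jyobject holds javaobject


class JavaSide(object):
    """Stand-in for the generated Java instance."""

    def __init__(self):
        self.x = 0
        self.jyobject = JyClass(self)      # javaobject holds jyobject


java = JavaSide()
java.jyobject.x = 42                       # written through to java.x
```

After the assignment both sides see the same value, because the only storage location is the field on the Java side.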

Class import heuristics

Whenever a Java class gets compiled, names have to be imported from classes/packages using the import-statement. Jython applies some heuristics to extract class and package names from annotations/Jython code and creates Java import statements accordingly. An important precondition for a class/package to be found is that it has been imported in Jython code. This isn’t particularly cumbersome. When you define an annotation

annotation('SupportedAnnotationTypes("*")')

the name SupportedAnnotationTypes has to be made public using a normal Jython import:

from javax.annotation.processing import SupportedAnnotationTypes

This is not much different from evaluating the parameter string using Python’s eval.

The annotation class has a class attribute namespace which is used for Jython class and annotation extraction. If the heuristics fail to extract a class, the namespace can be manually updated:

annotation.namespace.update(locals())
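A heuristic of this kind might look roughly as follows. The function java_imports is invented for this sketch and is certainly cruder than the real thing: it scans the annotation string for identifiers, looks them up in a namespace dictionary and assumes that the __module__ attribute of an imported class names its Java package.

```python
import re


def java_imports(annotation_text, namespace):
    """Guess Java import statements for names used in an annotation."""
    imports = []
    for name in re.findall(r"[A-Za-z_]\w*", annotation_text):
        obj = namespace.get(name)
        package = getattr(obj, "__module__", None)
        if isinstance(obj, type) and package:
            line = "import %s.%s;" % (package, name)
            if line not in imports:
                imports.append(line)
    return imports


# Stand-in for a class imported from a Java package.
class SupportedAnnotationTypes(object):
    __module__ = "javax.annotation.processing"


namespace = {"SupportedAnnotationTypes": SupportedAnnotationTypes}
found = java_imports('SupportedAnnotationTypes("*")', namespace)
```

Names that are missing from the namespace are silently skipped, which is why an import in the Jython code ( or a manual namespace update ) is needed before they can be found.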

Jynx

Within the next few days I’ll launch a new project on code.google.com called jynx which will contain tools particularly suited for Jython utilization of modern Java frameworks and APIs. The relevant source code for this article can be found here and you can examine and play with it.

Into The Labyrinth – using the JavaCompiler API from Jython

Posted in Java, Python on June 23rd, 2009 by kay – 1 Comment

The Plumber

After having neglected Java for years I began to re-examine it this month together with Jython and my initial reaction was a culture shock.

Java is infamous for being a “plumbing language” i.e. you have to subclass some classes or implement a few interfaces and then plug them into a framework. Alternatively you have to call framework methods that expect a bunch of interrelated objects as parameters. None of those class implementations is particularly complicated but you have to figure out how all the objects are related to each other and essentially deal with configurations and dependencies on object level. It is easy to mess things up with every additional point of failure. There is also a tendency for abstraction inversion: you have to undertake many concrete steps within a complex machinery to create a simple building block. Abstraction inversion is an indication of a system being overdesigned.

The topic of this article is Java’s compiler API, and it is a nice showcase to highlight some differences which are not really situated on the language level but in the way problems are approached. Take Python’s compile function for example. It has the following signature

compile: (code_str, file_name, eval_mode) -> code_object

and is dead simple to use:

>>> compile("import EasyExtend.langlets", "<input>", "exec")
<code object <module> at 00EB0578, file "<input>", line 1>

If you need to read the source from a file you do just this

>>> compile(open("module.py").read(), "<input>", "exec")
<code object <module> at 00EB0578, file "<input>", line 1>

If you want to store the resulting bytecode in an appropriate file you need to do a little more work

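import os, struct, marshal, imp

def write_code(filename, code, source):
    # The header stores the modification time of the source file; the
    # import system compares against it to detect stale bytecode.
    mtime = struct.pack('<i', int(os.path.getmtime(source)))
    f = open(filename, "wb")
    try:
        f.write(imp.get_magic() + mtime)   # magic number + timestamp
        marshal.dump(code, f)              # serialized code object
    finally:
        f.close()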

This creates a Python bytecode file ( usually a ‘.pyc’ file ) and serializes the code object. Something like this could be implemented on the method level of the code object itself and a single call code.write(filename) would suffice to store the code.
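The serialization step on its own can be checked with a short round trip. This sketch skips the file header completely and only shows that a marshalled code object can be restored and executed by the same interpreter; the variable names are arbitrary.

```python
import marshal

# compile a tiny module, serialize it and read it back
code = compile("answer = 6 * 7", "<input>", "exec")
data = marshal.dumps(code)
restored = marshal.loads(data)

# run the restored code object in a fresh namespace
namespace = {}
exec(restored, namespace)
```

The marshal format is specific to the interpreter version, which is the reason a bytecode file starts with a magic number.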

How to compile Java dynamically?

Java is different. The Java Compiler API is specified in JSR 199. It contains 20 classes/interfaces and you have to figure out their interplay in order to compile a source string. Half of them are JavaFile or JavaFileManager classes/interfaces so what’s actually peripheral has moved to the center. The JavaCompiler API is new to Java 6 and it seems dynamic compilation didn’t work out of the box prior to release 6. I suppose one had to call javac from the command line or use a different compiler such as Janino.

There isn’t any easy way to get into the compiler API. The JSR 199 isn’t exactly a design document but a terse javadoc API documentation. Tutorials are rare and mostly superficial in that they cover the simplest use case only. A notable exception is an introduction written by Andrew Davison. It covers the most relevant use cases and more. Alternatively one might skim through the tests of the JDK 6. It contains expression evaluation code written by the JSR 199 implementor Peter von der Ahé. Once again tests are documentation. The Jython 2.5 code I’ll present follows Davison’s article and is mostly a transcript of his Java code in the relevant parts.

The JavaCompiler API from Jython

Prerequisites: a JDK for Java 6 has to be installed first. The Java compiler tools are implemented in

<JDK-Path>/lib/tools.jar

and that path has to be added to the Java class path when Jython is invoked. Alternatively you can add the path to the PYTHONPATH inside of your application. I’ll choose the latter approach here:

import os, sys
sys.path.append(os.path.join(os.environ["JAVA_HOME"], "lib", "tools.jar"))

The environment variable JAVA_HOME has to be set to your JDK path, not to the JRE path, so an existing setting might have to be changed. In case of my Windows notebook the path is

JAVA_HOME = C:\Programme\Java\jdk1.6.0_13

Another configuration aspect concerns access to protected members of Java classes. By default this is disabled in Jython. We will need to access protected member functions once we override a Java class loader. For that reason one has to set the following disrespectful flag in Jython’s registry file

python.security.respectJavaAccessibility = false

For more information see the Jython FAQ.

Now we are ready to import the compiler tools:

from javax.tools import*
from com.sun.tools.javac.api import JavacTool
 
compiler = JavacTool.create()
assert compiler

The CompilationTask

After having created a compiler instance we now turn to the CompilationTask which is defined in JSR 199. It is basically a callable, an interface which specifies a parameterless call() method that returns a value of type Boolean. It is this call method that has to be overridden in subclasses and called for compilation. In case of the JavaCompiler framework one doesn’t have to override the CompilationTask explicitly but fetches it from the compiler object using the getTask method:

JavaCompiler.CompilationTask getTask(Writer out,
                             JavaFileManager fileManager,
                             DiagnosticListener<? super JavaFileObject> diagnosticListener,
                             Iterable<String> options,
                             Iterable<String> classes,
                             Iterable<? extends JavaFileObject> compilationUnits)

The parameters need further examination but once we have understood how to create the objects required to fetch the CompilationTask the compilation is performed by

compiler.getTask(...).call()

The meaning of the parameters of getTask is as follows:

Parameters:

  • out – a Writer for additional output from the compiler; use System.err if null
  • fileManager – a file manager; if null use the compiler’s standard filemanager
  • diagnosticListener – a diagnostic listener; if null use the compiler’s default method for reporting diagnostics
  • options – compiler options, null means no options
  • classes – class names (for annotation processing), null means no class names
  • compilationUnits – the compilation units to compile, null means no compilation units

For now we will ignore the out parameter as well as compiler options and class names passed to the annotation processors.

The compilationUnits holds the source code to be compiled. In case of Jython it will be a Python list containing a single JavaFileObject.

So a call of getTask will have the following shape

compiler.getTask(None,
                 fileManager,
                 diagnosticListener,
                 None,
                 None,
                 [source]).call()

Diagnostics

DiagnosticCollectors are used for error reporting. A DiagnosticCollector implements a DiagnosticListener interface. It is parametrized by some type S and holds a possibly empty list of Diagnostic<S> objects which can be fetched.

We only need to know enough about diagnostics to handle failure cases:

if not task.call():
    msg = "\n  "+"\n  ".join(str(d) for d in diagnostics.getDiagnostics())
    raise JavaCompilationError(msg)

The JavaCompilationError is a custom Jython exception. The JavaCompiler API doesn’t raise an exception on compilation failure.
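The exception class itself is not shown in this article, so here is a minimal version one might use together with the error formatting. The helper check_compilation is a name I made up; it takes the result of task.call() and any iterable of diagnostics, which lets the sketch run without a Java compiler.

```python
class JavaCompilationError(Exception):
    """Raised when the compilation task reports a failure."""


def check_compilation(succeeded, diagnostics):
    """Raise JavaCompilationError listing all diagnostics on failure."""
    if not succeeded:
        msg = "\n  " + "\n  ".join(str(d) for d in diagnostics)
        raise JavaCompilationError(msg)


# simulated failure with two diagnostic messages
try:
    check_compilation(False, ["A.java:1: ';' expected", "A.java:3: error"])
    message = None
except JavaCompilationError as err:
    message = str(err)
```

Each diagnostic ends up on its own indented line, which keeps the compiler messages readable inside a Python traceback.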

JavaFileObject

The JavaFileObject specifies a file abstraction. For our own purposes we need two of them: one that is readable and holds the source code ( a string ) and one that is writable and holds the bytecode ( an array of bytes ). Both derive from SimpleJavaFileObject which provides an implementation of the JavaFileObject interface.

from java.net import URI
from java.io import*
 
class StringJFO(SimpleJavaFileObject):
    '''
    JavaFileObject implementation used to hold the source code.
    '''
    def __init__(self, className, codestr):
        self.codestr = codestr
        super(StringJFO, self).__init__(URI(className),
                                        JavaFileObject.Kind.SOURCE)
 
    def getCharContent(self, errs):
        return self.codestr
 
class ByteArrayJFO(SimpleJavaFileObject):
    '''
    JavaFileObject implementation used to hold the byte code.
    '''
    def __init__(self, className, kind):
        super(ByteArrayJFO, self).__init__(URI(className), kind)
        self.baos = ByteArrayOutputStream()
 
    def openInputStream(self):
        return ByteArrayInputStream(self.getByteArray())
 
    def openOutputStream(self):
        return self.baos
 
    def getByteArray(self):
        return self.baos.toByteArray()

In case of the ByteArrayJFO a writable ByteArrayOutputStream is created that can be fetched by the framework using the openOutputStream method.

An instance of StringJFO will become our compilationUnit and an instance of ByteArrayJFO will be returned by a still to be defined FileManager:

class ByteJavaFileManager(ForwardingJavaFileManager):
    def __init__(self, fileManager):
        super(ByteJavaFileManager, self).__init__(fileManager)
        self.code = None
 
    def getJavaFileForOutput(self, location, className, kind, sibling):
        self.code = ByteArrayJFO(className, kind)
        return self.code

A Java compiler in Jython

Now we can put all those things together and define a compileClass function.

def compileClass(className, codeStr, *flags):
    compiler = JavacTool.create()
    assert compiler, "Compiler not found"
    diagnostics = DiagnosticCollector()
    jfm = ByteJavaFileManager(compiler.getStandardFileManager(diagnostics,
                                                              None,
                                                              None))
    task = compiler.getTask(None,
                            jfm,
                            diagnostics,
                            flags,
                            None,
                            [StringJFO(className+".java", codeStr)])
    if not task.call():
        e = "\n  "+"\n  ".join(str(d) for d in diagnostics.getDiagnostics())
        raise JavaCompilationError(e)
    return jfm.code

In order to initialize the StringJFO object we have to pass a file name which corresponds to the class name we use. The return value of compileClass is an object of type ByteArrayJFO that holds the byte code.

Adding a ClassLoader

We are almost done. To complete our exercise we have to make the byte code executable which means we have to define a class loader and create an actual Java class. Since we operate from within Jython it will be immediately wrapped into a Python type.

from java.lang import ClassLoader
from java.lang import ClassNotFoundException
 
class ByteClassLoader(ClassLoader):
    def __init__(self, code):
        super(ByteClassLoader, self).__init__(ClassLoader.getClassLoader())
        self.code = code
 
    def findClass(self, className):
        code = self.code.getByteArray()
        cl = self.defineClass(className, code, 0, len(code))
        if cl is None:
            raise ClassNotFoundException(className)
        else:
            return cl
 
def createJavaClass(className, codeStr, *compilerflags):
    loader = ByteClassLoader(compileClass(className, codeStr, *compilerflags))
    return loader.loadClass(className)

In the end we have hidden the JavaCompiler API behind a single function with a tiny interface. Nothing here actually required Python. The JavaCompiler API is a Swiss Army knife style API that requires a whole object tree to be created to achieve a simple effect.

Java classes from within Jython

Finally we want to demonstrate the value of our implementation showing a few simple examples. The jcompile.py module contains the complete code and can be downloaded here.

The first example is our “Hello Jython”:

codeStr = """
public class Foo {
    public static void main(String args[])
    {
        System.out.println("Hello, "+args[0]);
    }
}
"""
 
Foo = createJavaClass("Foo", codeStr)
print Foo  # <type 'Foo'>
 
Foo.main(["Jython!"])  #  Hello, Jython!

In our next example we import some symbols from a Java library:

codeStr = '''
import java.lang.Math;
public class Prime {
    public static Boolean isPrime(double n)
    {
        for(int d=2;d<=Math.sqrt(n);d++)
        {
            if(n%d == 0)
                return false;
        }
        return true;
    }
}
'''
 
Prime = createJavaClass("Prime", codeStr)
Prime.isPrime(2)     # True
Prime.isPrime(9971)  # False
Prime.isPrime(9973)  # True

————— Update ! We. 2009-24-06 —————–

Right now it is not possible to derive a class from a dynamically compiled Java class. If Foo is a dynamically generated Java class and Bar is defined as

class Bar(Foo):
    pass

then a ClassNotFoundException is raised. The only possible workaround I see right now is to store the class to the disk and import it right after this. A possible adaption of the createJavaClass function looks like:

def createJavaClass(className,
                    codeStr,
                    todisk = True,
                    remove = True,
                    *compilerflags):
    '''
    Compiles and loads a new Java class.
    '''
    compiled = compileClass(className, codeStr, *compilerflags)
    if todisk:
        code = compiled.getByteArray()
        clsfile = open(className+".class", "wb")
        try:
            code.tofile(clsfile)
        finally:
            clsfile.close()
        jclass = __import__(className)
        if remove:
            os.remove(className+".class")
        return jclass
    else:
        loader = ByteClassLoader(compiled)  # reuse the result compiled above
        return loader.loadClass(className)

Pattern matching with TupleTrees

Posted in Python on May 14th, 2009 by kay – Be the first to comment

It seems that advanced dispatch schemes are currently being discussed under the pattern matching label. In this article I want to discuss another solution using a data-structure called the TupleTree ( also known as prefix tree or trie ). The tuple tree solution comes closer to PEAK rules than to Marius Eriksen’s pattern matching engine. I find it more elegant than the PEAK rules solution and there is less boilerplate. I can’t say much about the generality of PEAK rules though and they might cover a lot more.

A TupleTree is an efficient way to store tuples by factoring out common prefixes. Suppose you have a set of tuples:

{(a, b, c), (a, b, d), (a, c, d), (b, d, d)}

then you can store the same information using a tree structure

{(a, (b, ((c,), (d,))), (c, d)), (b, d, d)}

Searching in the tuple tree is of complexity O(log(n)) and can degenerate to O(n) if there isn’t much to factorize.
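The prefix factoring can be sketched with plain nested dictionaries. This is only an illustration and not the TupleTree class of this article; the names build_prefix_tree and has_prefix are made up for the example, and strings stand in for the symbols a, b, c and d.

```python
def build_prefix_tree(tuples):
    """Store tuples in nested dicts so that a shared prefix is kept only once."""
    root = {}
    for tup in tuples:
        node = root
        for item in tup:
            # descend into the branch for this item, creating it if missing
            node = node.setdefault(item, {})
    return root

def has_prefix(tree, tup):
    """Follow the tuple one element at a time: one dict lookup per element."""
    node = tree
    for item in tup:
        if item not in node:
            return False
        node = node[item]
    return True

tuples = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("b", "d", "d")]
tree = build_prefix_tree(tuples)
# tree == {"a": {"b": {"c": {}, "d": {}}, "c": {"d": {}}},
#          "b": {"d": {"d": {}}}}
```

Since no end marker is stored, has_prefix also answers True for proper prefixes such as ("a", "b"). The TupleTree below avoids that by keeping a key at the node where a tuple ends.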

This wouldn’t be too interesting for a discussion of pattern matching schemes if we didn’t introduce two different kinds of wildcards, or symbolic patterns, called ANY and ALL.

ANY – a pattern that matches any symbol but with the lowest priority: if there is a tuple (ANY, Y, …) in the tuple tree then (X, Y, …) is matched by (ANY, Y, …) iff there isn’t a more specific matching tuple (X, Y, … ) in the tree.

ALL – a pattern that matches any symbol but with the same priority as a more specific symbol. If there is a tuple (ALL, Y, …) in the tuple tree then (X, Y, …) is matched by (ALL, Y, …) and by a more specific tuple (X, Y, … ) if present. This means ALL creates an ambiguity.

We can consider ALL as a variable and we eliminate the ambiguity using value substitution. Let’s say we have a set of tuples {(ANY, Y), (X, Y), (ALL, Z)} then elimination of ALL leads to {(ANY, Y), (ANY, Z), (X,Y), (X, Z)} and the tuple tree {(ANY, (Y,), (Z,)), (X, (Y,), (Z,))}.
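The substitution step can be tried on a flat set of tuples. The following is a simplified sketch under two assumptions: ANY and ALL are represented by plain strings instead of Pattern objects, and ALL occurs only in the first position, as in the example above. The function name eliminate_all is hypothetical.

```python
ANY, ALL = "ANY", "ALL"   # plain strings stand in for the Pattern objects

def eliminate_all(tuples):
    """Replace a leading ALL by every other first symbol found in the set."""
    firsts = set(t[0] for t in tuples if t[0] != ALL)
    result = set()
    for t in tuples:
        if t[0] == ALL:
            # one copy of the tuple per symbol that ALL may stand for
            for sym in firsts:
                result.add((sym,) + t[1:])
        else:
            result.add(t)
    return result

expanded = eliminate_all({(ANY, "Y"), ("X", "Y"), (ALL, "Z")})
# expanded == {(ANY, "Y"), (ANY, "Z"), ("X", "Y"), ("X", "Z")}
```

The TupleTree does the same thing incrementally: it copies the remainder of an ALL tuple into each branch of the node where ALL occurs.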

TupleTree implementation

First we define the mentioned patterns ANY and ALL

class Pattern:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "<P:%s>"%self.name
 
ANY = Pattern("ANY")
ALL = Pattern("ALL")

Now we create the TupleTree object. The TupleTree implements two methods insert and find. The insert method takes a tuple and a key as parameters. It inserts the tuple and stores a key at the location of the tuple. The find method takes a tuple and returns a key if it was inserted at the location of the tuple.

class TupleTree(object):
    def __init__(self):
        self.branches = {}
        self.key = None
        self.all = None
 
    def insert(self, args, value):
        if len(args) == 0:
            self.key = value
            return
        first = args[0]
        if first == ALL:
            for node in self.branches.values():
                node.insert(args[1:], value)
            self.all = (args[1:], value)
        elif first in self.branches:
            node = self.branches[first]
            node.insert(args[1:], value)
            if self.all:
                node.insert(*self.all)
        else:
            tree  = TupleTree()
            self.branches[first] = tree
            tree.insert(args[1:], value)
            if self.all:
                tree.insert(*self.all)  # 'node' is not defined in this branch
 
    def find(self, args):
        first = args[0]
        if first in self.branches:
            node = self.branches[first]
        elif ANY in self.branches:
            node = self.branches[ANY]
        else:
            return None  # no branch matches, let the caller handle it
        if len(args) == 1:
            return node.key
        else:
            return node.find(args[1:])

The Dispatcher

It is easy to define a dispatcher that matches argument tuples against a tuple in a TupleTree. Handler functions which are decorated by a Dispatcher object are stored as tuple tree keys. Those handler functions are retrieved from the TupleTree when the apply method is called with concrete arguments.

class Dispatcher(object):
    def __init__(self):
        self.ttree = TupleTree()
 
    def __call__(self, *args):
        def handler(f):
            self.ttree.insert(args, f)
            return f
        return handler
 
    def apply(self, *args):
        handler = self.ttree.find(args)
        if not handler:
            raise ValueError("Failed to find handler that matches arguments")
        else:
            return handler(*args)

Example

As an example we create a new Dispatcher object and decorate handler functions using it.

alt = Dispatcher()
 
@alt("/", ANY)
def not_a_resource(path, method):
    print "not a resource"
 
@alt(ANY, "GET")
def retrieve_resource(path, method):
    print "retrieve resource"
 
@alt(ANY, "POST")
def update_resource(path, method):
    print "update resource", path
 
@alt(ALL, "PUT")
def create_new_resource(path, method):
    print "create new resource", path
 
@alt(ANY, ANY)
def invalid_request(path, method):
    print "invalid request", path

Notice that the create_new_resource handler is called when the HTTP command PUT is passed, even when the path is the root path “/”. This is caused by the ALL pattern in the first argument. For all other commands on the root path a “not a resource” message is printed.

>>> alt.apply("/home", "PUT")
create new resource /home
>>> alt.apply("/", "PUT")
create new resource /
>>> alt.apply("/", "GET")
not a resource
>>> alt.apply("/home", "GET")
retrieve resource
>>> alt.apply("/home", "PAUSE")
invalid request /home

Python vs TTCN-3

Posted in DSL, Python, Testing on April 26th, 2009 by kay – 5 Comments

Some time ago computing scientists Bernard Stepien and Liam Peyton from the University of Ottawa compared Python with TTCN-3. TTCN-3 means Testing and Test Control Notation and it is a domain specific language specifically designed for writing tests in the domain of telecommunication. In almost any category poor Python just loses in comparison.

A few things in the presentation are just odd. In several places it contains Python code that is not even consistent. Examples: on slide 42 an “else if” construct is used instead of the correct “elif”. On slide 13 fields are not assigned to self which leads to code that fails to run. But leaving this carelessness aside there is something important that estranges an experienced Python programmer. Take slide 12 for example. Here the authors state:

TTCN-3 templates to Python
  • TTCN-3 templates could be mapped to Python object instances.
  • However, there are serious limitations using the above technique with Python.
  • Python objects are practically typeless.

Then the presentation spends 18 slides just to explain how badly plain Python classes serve the same purposes as TTCN-3 templates. Although this is correct, why should anyone care? If you want to experience an exhilarating effect you have to turn grapes into wine and should not expect the same fun from grape juice. Then you can compare cheap wine with a premium one. That’s also why people compare Django to Ruby on Rails and not Django to Ruby or Rails to Python. That’s why libraries and frameworks exist in the first place.

So what about modeling a Template class that has the same functionality as a TTCN-3 template? Here is a first attempt:

class Template(object):
    def __init__(self):
        self._frozen = False
        self._fields = []        
 
    def __setattr__(self, name, value):
        if name in ('_fields', '_frozen'):
            object.__setattr__(self, name, value)
            return
        if self._frozen:
            raise RuntimeError("Cannot set attribute value on a frozen template")
        if name not in self._fields:
            self._fields.append(name)
        object.__setattr__(self, name, value)
 
    def __eq__(self, other):
        if len(self._fields)!=len(other._fields):
            return False
        else:
            for f_self, f_other in zip(self._fields, other._fields):
                val_self  = getattr(self, f_self)
                val_other = getattr(other, f_other)
                if val_self != val_other:
                    return False
            return True
 
    def __ne__(self, other):
        return not self.__eq__(other)
 
    def freeze(self):
        self._frozen = True
 
    def clone(self):
        T = Template()
        T._fields = self._fields[:]
        for field in T._fields:
            setattr(T, field, getattr(self, field))
        return T

The Template class manages an internal _fields attribute that contains the attribute names created by __setattr__ dynamically. This is done to preserve the sequential order of the fields on template comparison. In the current implementation there aren’t any fields marked as optional and it will take additional effort to introduce them and modify __eq__.

It is now easy to demonstrate the behavior of wildcards. First we have to introduce another class for this purpose:

class Wildcard(object):
    def __eq__(self, other):
        return True
 
    def __ne__(self, other):
        return False

For any object X which is compared to a Wildcard instance, X == wildcard is always True. So the following template instances are equal:

wildcard = Wildcard()
 
templ_1 = Template()
templ_1.field_1 = wildcard
templ_1.field_2 = "abc"
 
templ_2 = Template()
templ_2.field_1 = "xyz"
templ_2.field_2 = wildcard
 
assert templ_1 == templ_2

A Pattern class can be created in the same way as the Wildcard class:

import re
 
class Pattern(object):
    def __init__(self, regexp):
        self.pattern = re.compile(regexp)
 
    def __eq__(self, other):
        try:
            return bool(self.pattern.match(other))
        except TypeError:
            return True
 
    def __ne__(self, other):
        return not self.__eq__(other)

An ‘==’ comparison of a Pattern instance with a string will match the string against the pattern and yield True if the string could be matched and False otherwise.
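A short check of this behavior, repeating the Pattern class from above so that the snippet runs on its own. The regular expression and the test strings are made up for the illustration.

```python
import re

class Pattern(object):
    def __init__(self, regexp):
        self.pattern = re.compile(regexp)

    def __eq__(self, other):
        try:
            return bool(self.pattern.match(other))
        except TypeError:
            # anything that is not a string is accepted
            return True

    def __ne__(self, other):
        return not self.__eq__(other)

p = Pattern(r"ab+c$")

left_match  = (p == "abbbc")    # Pattern.__eq__ is called directly
right_match = ("abbbc" == p)    # str gives up, Python tries Pattern.__eq__
no_match    = (p == "xyz")      # the regular expression does not match
non_string  = (p == 42)         # match() raises TypeError, so the result is True
```

The comparison also works with the string on the left hand side because str.__eq__ returns NotImplemented for an unknown type and Python then falls back to the reflected __eq__ of the Pattern. The same mechanism makes "xyz" == wildcard work for the Wildcard class.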

So it is perfectly possible to reproduce most aspects of TTCN-3 templates by just a few lines of Python code. On the downside of TTCN-3 it isn’t possible to do things the other way round. No matter how hard you work in TTCN-3 it is impossible to create new TTCN-3 types showing interesting behavior. Python’s expressiveness is miles ahead.

Tail recursion decorator revisited

Posted in Python on April 20th, 2009 by kay – 6 Comments

A while ago some of us tried to find the best solution for a tail recursion decorator. The story began when Crutcher Dunnavant came up with a surprising decorator that was able to eliminate tail recursion. His solution used stack inspections and could flatten the call stack s.t. the maximum height of the stack became 2. I gave an alternative, faster implementation that omitted stack inspections. This solution was further improved by Michele Simionato and George Sakkis. The story already ends here because the performance penalty of those decorators was still quite considerable. On the samples we used the undecorated recursive function was more than twice as fast as the decorated one. So the decorator added more than 100% overhead.

Today I tried something new. I reimplemented George’s decorator in Cython which is considerably faster than Python and then I compared the performance of undecorated code, code that is decorated with the pure Python decorator and the Cython version of it.

Here you can download the relevant source code for the decorators.

The following implementation shows the Cython decorator defined in tail_recursive.pyx

cdef class tail_recursive:
    cdef int firstcall
    cdef int CONTINUE
    cdef object argskwd
    cdef object func
 
    def __init__(self, func):
        self.func = func
        self.firstcall = True
        self.CONTINUE = id(object())
 
    def __call__(self, *args, **kwd):
        if self.firstcall:
            self.firstcall = False
            try:
                while True:
                    result = self.func(*args, **kwd)
                    if result == self.CONTINUE: # update arguments
                        args, kwd = self.argskwd
                    else: # last call
                        return result
            finally:
                self.firstcall = True
        else: # return the arguments of the tail call
            self.argskwd = args, kwd
            return self.CONTINUE
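For readers without a Cython toolchain the same control flow can be tried in plain Python. The following is a sketch that mirrors the structure of the class above. It is not the pure Python decorator from the download, and it uses a sentinel object compared by identity instead of an integer id.

```python
import math

class tail_recursive(object):
    """Pure Python sketch with the same control flow as the Cython class."""

    def __init__(self, func):
        self.func = func
        self.firstcall = True
        self.CONTINUE = object()   # unique sentinel, compared by identity
        self.argskwd = None

    def __call__(self, *args, **kwd):
        if self.firstcall:
            self.firstcall = False
            try:
                while True:
                    result = self.func(*args, **kwd)
                    if result is self.CONTINUE:   # update arguments
                        args, kwd = self.argskwd
                    else:                         # last call
                        return result
            finally:
                self.firstcall = True
        else:   # nested call: record the arguments of the tail call
            self.argskwd = args, kwd
            return self.CONTINUE

@tail_recursive
def factorial(n, acc=1):
    "calculate a factorial"
    return acc if n == 0 else factorial(n - 1, n * acc)

# 5000 nested calls would exceed the default recursion limit of CPython
result = factorial(5000)
same_as_math = (result == math.factorial(5000))
```

The outer call runs the loop while each nested call only stores its arguments and returns the sentinel, so the stack never grows beyond two frames of the decorated function.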

Here are some performance tests.

As a sample function I used a tail recursive factorial

def factorial(n, acc=1):
    "calculate a factorial"
    return (acc if n == 0 else factorial(n-1, n*acc))

The timing function is defined by

import time
def mtime(foo):
    a = time.time()
    for j in range(10):
        for i in range(900):
            foo(i)
    return time.time()-a

The results I got were:

8.484 -- undecorated
9.405 -- with Cython decorator   + 10%
17.93 -- with Python decorator   + 111%

Next I checked out a pair of mutually recursive functions. Notice that in such pairs only one function may be decorated by tail_recursive.

def even(n):
    if n == 0:
        return True
    else:
        return odd(n-1)
 
def odd(n):
    if n == 0:
        return False
    else:
        return even(n-1)

Here are the results:

2.969 -- undecorated
3.312 -- with Cython decorator   + 11%
7.437 -- with Python decorator   + 150%

These are about the same proportions as for the factorial example.

My conclusion is that one can expect about 10% performance penalty for Cython’s tail_recursive decorator. This is quite a good result and I don’t shy away from recommending it.