Java Spring – or the Biggus Dickus effect

Looking at the API alone, Spring feels like reading a parody of Java enterprise software written by Steve Yegge.

AbstractBeanFactoryBasedTargetSourceCreator ContextSingletonBeanFactoryLocator
AspectJAdviceParameterNameDiscoverer UserRoleAuthorizationInterceptor 
TransactionAwarePersistenceManagerFactoryProxy SQLStateSQLExceptionTranslator 
SimpleBeanFactoryAwareAspectInstanceFactory ...

Legend! Nonstop administrative debris as dadaist poetry. Écriture automatique of the programming office manager or his parrot. This goes on and on for miles and miles – I count `1943` Spring 2.5 API classes+interfaces, which tops JEE 6 ( Glassfish 3 ) by about 450 – JEE 6 is basically free of this notational nonsense.

Spring causes a Biggus Dickus effect on me, in the role of one of Pilate's guarding soldiers. It lives beyond the threshold at which “software engineering best practices” become comical.

Posted in Java | 31 Comments

Jynx

I have just released the initial version of my new Jython project Jynx on Google Code. Jynx sums up my latest efforts on the dynamic Java compilation front and points toward future Java framework utilization from Python.

Although Jynx is quite tiny right now, it already has enough structure, code and documentation to be useful for Jython/Jynx developers. People are invited to check it out, criticize it and contribute.

Have much fun with Python / Jython programming!

Posted in Jynx, Jython | Comments Off on Jynx

Stitches of a flea language – defining Java annotations in Jython

Jython annotations – anyone?

Over the last few days I tried to figure out how to create Jython annotations. A Jython annotation is defined here as a Java annotation lifted from Jython to Java, so one essentially defines a Java annotation in a Jython class. A Jython annotation shall not be confused with a decorator. A Python ( or Jython ) decorator is just a higher order function ( or callable ). It might create attributes on Jython objects, but those are not the same as Java annotations reflected by the Java runtime. Without Jython annotations Jython is essentially stuck in a pre-1.5 Javaverse: Jython classes are disconnected from modern Java frameworks and cannot be plugged in.

Jython annotations don't work out of the box in Jython 2.5, and not much is known yet about how or when they will be supported by core Jython. The lead Jython developer Frank Wierzbicki announced something along these lines in his 2008 PyCon conference talk, but that is now about 16 months ago. I could temper my impatience if Jython annotations were just around the corner, but what can we expect after those 16 months?

In this article I introduce a library that enables lifting meta-data from Jython to Java and loading the generated Java classes back into Jython. One key element is Java code generation and dynamic compilation using the Java 6 Compiler API. Another is interface extraction from Jython classes using the rich reflection API provided by core Jython.

Lifting up Jython classes

For every Jython class `JyClass` one can generate a Java class `JaFromJyClass` by means of interface extraction. We assume `JyClass` to be a subclass of a Java class, e.g. `Object`, and translate the Jython class

class JyClass(Object):
    def foo(self, *args):
        print "foo", args

into a corresponding Java class

public class JaFromJyClass extends Object{
    PyObject jyobject;
 
    public PyObject foo(PyObject[] args)
    {
        return jyobject.invoke("foo", args);
    }
}

This class is basically a proxy for the `jyobject` member variable of type `PyObject`, which is a Jython API type. Once we have generated the Java code from Jython we can dynamically compile it and load it into Jython:

JaFromJyClass = createJavaClass("JaFromJyClass", source)
jainstance = JaFromJyClass()
jainstance.jyobject = JyClass()
jainstance.foo(9)  # prints 'foo 9'

This was straightforward and hints at our translation strategy. Next we review the Jython to Java translations in more detail.

Jython to Java translations

PyObject Injections

We cannot be satisfied with the way the `jyobject` was assigned to the `jainstance` in the previous example. The particular assignment protocol implies that the Jython script always has control over the instantiation of Jython classes. But once we plug the class into a framework, the framework takes over. A better solution is to inject the PyObject using a factory mechanism.

public JaFromJyClass() {
    super();
    jyobject = JyGateway.newInstance("JyClass", this, null);
    jaobject = (Object)jyobject.__tojava__(Object.class);
}

The factory is called `JyGateway`. The `JyGateway` is a Java class which defines a HashMap called `registry`:

public static HashMap<String, PyDictionary> registry = new HashMap<String, PyDictionary>();

The keys of the Java HashMap are Strings that represent class names. The `PyDictionary` is a dictionary of Jython functions. Right now two functions are defined: `newInstance` and `callStatic`. Both of them correspond to static methods of `JyGateway`. If `JyGateway.newInstance("JyClass", this, null)` is called, the `newInstance` Jython function is fetched from the registry using "JyClass" as a key. The third argument of `JyGateway.newInstance` contains an array of Java objects passed as arguments to the `newInstance` function, which returns a new PyObject. If the constructor doesn't take arguments, `null` is passed as in the example. The particular `JyClass` will never be exposed to Java code.
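The gateway mechanism can be modeled in plain Python to make the control flow explicit. This is only an illustrative sketch: the real `JyGateway` is a Java class and the registry values are `PyDictionary` objects holding Jython functions.

```python
# Simplified pure-Python model of the JyGateway registry. Illustrative only:
# the real registry maps class names to PyDictionary objects holding the
# Jython functions 'newInstance' and 'callStatic'.
registry = {}

def register(name, cls):
    registry[name] = {
        "newInstance": lambda javaobj, args: cls(*(args or ())),
        "callStatic":  lambda method, args: getattr(cls, method)(*(args or ())),
    }

def new_instance(name, javaobj, args):
    # what the generated Java constructor does via JyGateway.newInstance
    return registry[name]["newInstance"](javaobj, args)

def call_static(name, method, args):
    # what generated static proxy methods do via JyGateway.callStatic
    return registry[name]["callStatic"](method, args)

class JyClass(object):
    @staticmethod
    def bar(x):
        return x * 2

register("JyClass", JyClass)
obj = new_instance("JyClass", None, None)       # an instance of JyClass
doubled = call_static("JyClass", "bar", (21,))  # 42
```

The registered `newInstance` function decides how `JyClass` is instantiated; the Java side only ever sees the returned object through the proxy.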

Overriding superclass methods

Aside from `jyobject` we have also defined `jaobject` in the `JaFromJyClass` constructor, which has the type `Object`. Here `Object` is just the superclass of both `JaFromJyClass` and `JyClass`. The `jaobject` is defined for the purpose of overriding superclass methods: we cannot simply change the signature of superclass methods, in particular not the return type.

If `public void foo(int x)` is a method defined in the superclass of `JyClass`, the Java method generated from `JyClass` is

public void foo(int arg) { jaobject.foo(arg); }

The method `foo` called from `jaobject` is still the method implemented in Jython. The Jython method is just called with Java arguments and returns a Java value ( if any ) that gets converted back to Jython.

Calling static methods

Calling static or classmethods of Jython objects from Java is similar to calling `JyGateway.newInstance`:

public static PyObject bar(PyObject args[])
{
     return JyGateway.callStatic("JyClass", "bar", args);
}

Defining Jython metadata

There are three kinds of meta-data that can be added to Jython classes and extracted for dynamic Java generation. They are called `jproperty`, `annotation` and `signature` and serve different purposes.

signature

A `signature` decorator is defined to assign Java argument types and the return type to a Jython method. Without the signature decorator a default translation is applied:

def foo(self, arg):
    ...

—–>

public PyObject foo(PyObject[] args) {
    return jyobject.invoke("foo", args);
}

If we decorate `foo` with the following `signature` decorator

@signature("public int _(char)")
def foo(self, arg):
    ...

we get the translation

public int foo(char arg0){
    PyObject args[] = new PyObject[1];
    for(int i=0;i<1;i++) {
        args[0] = Py.java2py(arg0);
    }
    return (Integer)jyobject.invoke("foo", args).__tojava__(int.class);
}

The name of the function in the signature declaration string is of no relevance. That’s why we have used a single underscore.
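How such a signature string might be taken apart can be sketched in a few lines. The helper below is hypothetical and only illustrates the idea; it is not Jynx's actual parser.

```python
def parse_signature(sig):
    # split e.g. "public int _(char)" into modifiers, return type and
    # argument types; the function name is ignored, hence the underscore
    head, _, argstr = sig.strip().rstrip(")").partition("(")
    words = head.split()                 # e.g. ['public', 'int', '_']
    modifiers, restype = words[:-2], words[-2]
    argtypes = [a.strip() for a in argstr.split(",")] if argstr.strip() else []
    return modifiers, restype, argtypes

parse_signature("public int _(char)")
# -> (['public'], 'int', ['char'])
```

Everything before the return type is treated as a list of modifiers, so `"public static void _()"` yields `(['public', 'static'], 'void', [])`.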

annotation

The `annotation` decorator applies to methods and classes. We have to wait for Jython 2.6 for proper class decorator syntax, but the semantics is the same when we write

cls = annotation(value)(cls)

The value passed to `annotation` is a string which must conform to Java annotation syntax, with the leading `@` character stripped.

@annotation("Override")
def foo(self, arg):
    ...

is a valid annotation which corresponds to the Java method

@Override
public PyObject foo(PyObject[] args) {
    return jyobject.invoke("foo", args);
}

Annotations can be stacked and also combined with the signature decorator. So we can define three new decorators

test  = annotation("Test")(signature("public void _()"))
setUp = annotation("Before")(signature("public void _()"))
beforeClass = annotation("BeforeClass")(signature("public static void _()"))

and use them within e.g. JUnit 4

from org.junit import*
from org.junit.Assert import*
 
class TestClass(Object):
    @beforeClass
    def start(self):
        print "Run 'TestClass' tests ..."
 
    @test
    def test_epoweripi(self):
        from math import e, pi
        assertTrue( abs( e**(pi*1j) + 1 ) < 10**-10 )

jproperty

The `jproperty` object is a descriptor which assigns a Java type and zero or more annotations to a Jython attribute.

class JyClass(Object):
    x = jproperty("private int", "X", "Y")

This is how it translates

public class JyClassBase extends Object
{
    @Y @X
    private int x;
}

A `JyClass` instance reads ( and sets ) `jproperty` values on the corresponding Java class instance it was assigned to. Remember that the `jyobject` instance construction looked like this

jyobject = JyGateway.newInstance("JyClass", this, null);

With `this`, an instance of the Java class was passed to the `newInstance` factory function. So not only does the `javaobject` hold a `jyobject`, the `jyobject` also holds the `javaobject`. Reading / writing `jproperty` values is the primary reason for this cyclic dependency.

Class import heuristics

Whenever a Java class gets compiled, names have to be imported from classes/packages using the import statement. The library applies some heuristics to extract class and package names from annotations/Jython code and creates Java import statements accordingly. An important precondition for a class/package to be found is that it has been imported in Jython code. This isn't particularly cumbersome. When you define an annotation

annotation('SupportedAnnotationTypes("*")')

the name `SupportedAnnotationTypes` has to be made public using a normal Jython import:

from javax.annotation.processing import SupportedAnnotationTypes

This is not much different from what an evaluation of the parameter string using Python's `eval` would require.

The `annotation` class has a class attribute `namespace` which is used for Jython class and annotation extraction. If the heuristic fails to extract a class, the `namespace` can be updated manually:

annotation.namespace.update(locals())
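The heuristic can be approximated in plain Python: scan the annotation string for capitalized identifiers, resolve them against the namespace of imported names, and emit Java import statements. This is an illustrative sketch, not the library's actual implementation.

```python
import re

def java_imports(annotation_str, namespace):
    # collect capitalized identifiers from the annotation string and resolve
    # them against the names imported in Jython code; unknown names are skipped
    imports = []
    for name in sorted(set(re.findall(r"\b[A-Z]\w*", annotation_str))):
        obj = namespace.get(name)
        if obj is not None and getattr(obj, "__module__", None):
            imports.append("import %s.%s;" % (obj.__module__, name))
    return imports

class SupportedAnnotationTypes(object):
    # stand-in for the imported Java class; in Jython __module__ would be
    # the Java package of the class
    __module__ = "javax.annotation.processing"

stmts = java_imports('SupportedAnnotationTypes("*")',
                     {"SupportedAnnotationTypes": SupportedAnnotationTypes})
# -> ['import javax.annotation.processing.SupportedAnnotationTypes;']
```

Names that cannot be resolved are simply skipped, which is exactly the case where `annotation.namespace` would have to be updated by hand.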

Jynx

Within the next few days I'll launch a new project on `code.google.com` called Jynx which will contain tools particularly suited for Jython utilization of modern Java frameworks and APIs. The relevant source code for this article can be found here, and you can examine and play with it.

Posted in Java, Jython | 9 Comments

Into The Labyrinth – using the JavaCompiler API from Jython

The Plumber

After having neglected Java for years I began to re-examine it this month together with Jython and my initial reaction was a culture shock.

Java is infamous for being a “plumbing language”, i.e. you have to subclass some classes or implement a few interfaces and then plug them into a framework. Alternatively you have to call framework methods that expect a bunch of interrelated objects as parameters. None of these class implementations is particularly complicated, but you have to figure out how all the objects are related to each other and essentially deal with configurations and dependencies at the object level. Every additional point of failure makes it easy to mess things up. There is also a tendency toward abstraction inversion: you have to undertake many concrete steps within a complex machinery to create a simple building block. Abstraction inversion is an indication of an overdesigned system.

The topic of this article is Java's compiler API, and it is a nice showcase for highlighting some differences which are not really situated on the language level but in the way problems are approached. Take Python's `compile` function for example. It has the following signature

`compile: (code_str, file_name, eval_mode) -> code_object`

and is dead simple to use:

>>> compile("import EasyExtend.langlets", "<input>", "exec")
<code object <module> at 00EB0578, file "<input>", line 1>

If you need to read the source from a file you do just this

>>> compile(open("module.py").read(), "input", "exec")
<code object <module> at 00EB0578, file "<input>", line 1>

If you want to store the resulting bytecode in an appropriate file you need to do a little more work

import os, struct, marshal, imp
 
def write_code(filename, code):
    f = open(filename, "wb")
    try:
        # conventionally the recorded mtime is that of the source file
        mtime = int(os.path.getmtime(filename))
        mtime = struct.pack('<i', mtime)
        MAGIC = imp.get_magic()
        f.write(MAGIC + mtime)
        marshal.dump(code, f)
    finally:
        f.close()

This creates a Python bytecode file ( usually a '.pyc' file ) and serializes the code object. Something like this could be implemented on the method level of the `code` object itself, so that a single call `code.write(filename)` would suffice to store the code.

How to compile Java dynamically?

Java is different. The Java Compiler API is specified in JSR 199. It contains 20 classes/interfaces and you have to figure out their interplay in order to compile a source string. Half of them are `JavaFile` or `JavaFileManager` classes/interfaces, so what's actually peripheral has moved to the center. The JavaCompiler API is new in Java 6 and it seems dynamic compilation didn't work out of the box prior to release 6. I suppose one had to call `javac` from the command line or use a different compiler such as Janino.

There isn’t any easy way to get into the compiler API. The JSR 199 isn’t exactly a design document but a terse javadoc API documentation. Tutorials are rare and mostly superficial in that they cover the most simple use case only. A notable exception is an introduction written by Andrew Davison. It covers the most relevant use cases and more. Alternatively one might skim through the tests of the JDK 6. It contains expression evaluation code written by the JSR 199 implementor Peter von der Ahé. Once again tests are documentation. The Jython 2.5 code I’ll present follows Davisons article and is mostly a transcript of his Java code in the relevant parts.

The JavaCompiler API from Jython

Prerequisites: a JDK for Java 6 has to be installed first. The Java compiler tools are implemented in

`<JDK-Path>/lib/tools.jar`

and that path has to be added to the Java class path when Jython is invoked. Alternatively you can add the path to the `PYTHONPATH` inside of your application. I'll choose the latter approach here:

import os, sys
sys.path.append(os.path.join(os.environ["JAVA_HOME"], "lib", "tools.jar"))

The environment variable `JAVA_HOME` has to be set to your JDK path – not to the JRE path – and will of course differ between installations. In case of my Windows notebook the path is

`JAVA_HOME = C:\Programme\Java\jdk1.6.0_13`.

Another configuration aspect concerns access to protected members of Java classes. By default this is disabled in Jython. We will need access to protected member functions once we override a Java class loader. For that reason one has to set the following disrespectful flag in Jython's `registry` file

python.security.respectJavaAccessibility = false

For more information see the Jython FAQ.

Now we are ready to import the compiler tools:

from javax.tools import*
from com.sun.tools.javac.api import JavacTool
 
compiler = JavacTool.create()
assert compiler

The CompilationTask

After having created a compiler instance we now care about the `CompilationTask` defined in JSR 199. It is basically a `Callable`, an interface which specifies a parameter-less `call()` method that returns a value of type `Boolean`. It is this `call` method that has to be overridden in subclasses and called for compilation. In case of the JavaCompiler framework one doesn't have to override the `CompilationTask` explicitly but fetches it from the `compiler` object using the `getTask` method:

JavaCompiler.CompilationTask getTask(Writer out,
                             JavaFileManager fileManager,
                             DiagnosticListener<? super JavaFileObject> diagnosticListener,
                             Iterable<String> options,
                             Iterable<String> classes,
                             Iterable<? extends JavaFileObject> compilationUnits)

The parameters need further examination but once we have understood how to create the objects required to fetch the `CompilationTask` the compilation is performed by

compiler.getTask(...).call()

The meaning of the parameters of the `CompilationTask` is as follows:

Parameters:

  • out – a Writer for additional output from the compiler; use System.err if null
  • fileManager – a file manager; if null use the compiler’s standard filemanager
  • diagnosticListener – a diagnostic listener; if null use the compiler’s default method for reporting diagnostics
  • options – compiler options, null means no options
  • classes – class names (for annotation processing), null means no class names
  • compilationUnits – the compilation units to compile, null means no compilation units

For now we will ignore the `out` parameter as well as compiler options and class names passed to the annotation processors.

The `compilationUnits` parameter holds the source code to be compiled. In case of Jython it will be a Python list containing a single `JavaFileObject`.

So a call of `getTask` will have the following shape

compiler.getTask(None,
                 fileManager,
                 diagnosticListener,
                 None,
                 None,
                 [source]).call()

Diagnostics

`DiagnosticCollectors` are used for error reporting. A `DiagnosticCollector` implements a `DiagnosticListener` interface. It is parametrized by some type `S` and holds a possibly empty list of `Diagnostic<S>` objects which can be fetched.

We only need to know as much about diagnostics to handle failure cases:

 if not task.call():
     msg = "\n  "+"\n  ".join(str(d) for d in diagnostics.getDiagnostics())
     raise JavaCompilationError(msg)

The `JavaCompilationError` is a custom Jython exception. The JavaCompiler API doesn’t raise an exception on compilation failure.

JavaFileObject

The `JavaFileObject` specifies a file abstraction. For our own purposes we need two of them: one that is readable and holds the source code ( a string ) and one that is writable and holds the bytecode ( an array of bytes ). Both derive from `SimpleJavaFileObject` which provides an implementation of the `JavaFileObject` interface.

from java.net import URI
from java.io import*
 
class StringJFO(SimpleJavaFileObject):
    '''
    JavaFileObject implementation used to hold the source code.
    '''
    def __init__(self, className, codestr):
        self.codestr = codestr
        super(StringJFO, self).__init__(URI(className),
                                        JavaFileObject.Kind.SOURCE)
 
    def getCharContent(self, errs):
        return self.codestr
 
class ByteArrayJFO(SimpleJavaFileObject):
    '''
    JavaFileObject implementation used to hold the byte code.
    '''
    def __init__(self, className, kind):
        super(ByteArrayJFO, self).__init__(URI(className), kind)
        self.baos = ByteArrayOutputStream()
 
    def openInputStream(self):
        return ByteArrayInputStream(self.getByteArray())
 
    def openOutputStream(self):
        return self.baos
 
    def getByteArray(self):
        return self.baos.toByteArray()

In case of the `ByteArrayJFO` a writable `ByteArrayOutputStream` is created; the framework obtains it through the `openOutputStream` method and writes the byte code to it.

An instance of `StringJFO` will become our `compilationUnit` and an instance of `ByteArrayJFO` will be returned by a still to be defined `FileManager`:

class ByteJavaFileManager(ForwardingJavaFileManager):
    def __init__(self, fileManager):
        super(ByteJavaFileManager, self).__init__(fileManager)
        self.code = None
 
    def getJavaFileForOutput(self, location, className, kind, sibling):
        self.code = ByteArrayJFO(className, kind)
        return self.code

A Java compiler in Jython

Now we can put all those things together and define a `compileClass` function.

def compileClass(className, codeStr, *flags):
    compiler = JavacTool.create()
    assert compiler, "Compiler not found"
    diagnostics = DiagnosticCollector()
    jfm = ByteJavaFileManager(compiler.getStandardFileManager(diagnostics,
                                                              None,
                                                              None))
    task = compiler.getTask(None,
                            jfm,
                            diagnostics,
                            flags,
                            None,
                            [StringJFO(className+".java", codeStr)])
    if not task.call():
        e = "\n  "+"\n  ".join(str(d) for d in diagnostics.getDiagnostics())
        raise JavaCompilationError(e)
    return jfm.code

In order to initialize the `StringJFO` object we have to pass a file name which corresponds to the class name we use. The return value of `compileClass` is an object of type `ByteArrayJFO` that holds the byte code.

Adding a ClassLoader

We are almost done. To complete our exercise we have to make the byte code executable which means we have to define a class loader and create an actual Java class. Since we operate from within Jython it will be immediately wrapped into a Python type.

from java.lang import ClassLoader
from java.lang import ClassNotFoundException
 
class ByteClassLoader(ClassLoader):
    def __init__(self, code):
        super(ByteClassLoader, self).__init__(ClassLoader.getSystemClassLoader())
        self.code = code
 
    def findClass(self, className):
        code = self.code.getByteArray()
        cl = self.defineClass(className, code, 0, len(code))
        if cl is None:
            raise ClassNotFoundException(className)
        else:
            return cl
 
def createJavaClass(className, codeStr, *compilerflags):
    loader = ByteClassLoader(compileClass(className, codeStr, *compilerflags))
    return loader.loadClass(className)

In the end we have hidden the JavaCompiler API behind a single function with a tiny interface. Nothing about this actually required Python. The JavaCompiler API is a Swiss Army knife style API that requires a whole object tree to be created to achieve a simple effect.
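For comparison, the Python analogue of `createJavaClass` ( compile a source string and hand back the class object ) needs no object tree at all:

```python
def createPythonClass(className, codeStr):
    # compile the source string and execute it in a fresh namespace,
    # then return the class object defined there
    namespace = {}
    exec(compile(codeStr, "<generated>", "exec"), namespace)
    return namespace[className]

Foo = createPythonClass("Foo", """
class Foo(object):
    @staticmethod
    def greet(name):
        return "Hello, " + name
""")

Foo.greet("Jython!")  # 'Hello, Jython!'
```

The whole file manager / file object / diagnostics machinery collapses into a dictionary used as the execution namespace.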

Java classes from within Jython

Finally we want to demonstrate the value of our implementation by showing a few simple examples. The `jcompile.py` module contains the complete code and can be downloaded here.

The first example is our “Hello Jython”:

codeStr = """
public class Foo {
    public static void main(String args[])
    {
        System.out.println("Hello, "+args[0]);
    }
}
"""
 
Foo = createJavaClass("Foo", codeStr)
print Foo  # <type 'Foo'>
 
Foo.main(["Jython!"])  #  Hello, Jython!

In our next example we import some symbols from a Java library:

codeStr = '''
import java.lang.Math;
public class Prime {
    public static Boolean isPrime(double n)
    {
        for(int d=2;d<=Math.sqrt(n);d++)
        {
            if(n%d == 0)
                return false;
        }
        return true;
    }
}
'''
 
Prime = createJavaClass("Prime", codeStr)
Prime.isPrime(2)     # True
Prime.isPrime(9971)  # False
Prime.isPrime(9973)  # True

————— Update! Wed 2009-06-24 —————–

Right now it is not possible to derive a class from a dynamically compiled Java class. If `Foo` is a dynamically generated Java class and `Bar` is defined as

class Bar(Foo):
    pass

then a `ClassNotFoundException` is raised. The only workaround I see right now is to store the class to disk and import it right afterwards. A possible adaptation of the `createJavaClass` function looks like:

def createJavaClass(className,
                    codeStr,
                    todisk = True,
                    remove = True,
                    *compilerflags):
    '''
    Compiles and loads a new Java class.
    '''
    compiled = compileClass(className, codeStr, *compilerflags)
    if todisk:
        code = compiled.getByteArray()
        clsfile = open(className+".class", "wb")
        try:
            code.tofile(clsfile)
        finally:
            clsfile.close()
        jclass = __import__(className)
        if remove:
            os.remove(className+".class")
        return jclass
    else:
        loader = ByteClassLoader(compiled)
        return loader.loadClass(className)

Posted in Java, Python | 1 Comment

Linear bounds for backtracking parsers with memoization

On the comp.compilers mailing list I asked for a confirmation of the O(n) complexity claim for packrat parsers. A packrat parser is a particular top-down recursive descent parser which backtracks when a parsing expression fails to consume a token. A backtracking parser usually exhibits O(2^n) complexity, and the essential claim is that memoization brings this down to O(n) at the cost of increased storage. The work of Bryan Ford on packrat parsers refers to an original article by Birman & Ullman that gave a proof. The cited article costs $19 at the IEEE, which I felt I would only spend if I failed to find a proof by myself. Fortunately the proof is elementary and not complicated, so I could save my money.

Here we go.

Each non-terminal `N` of a grammar G has a finite first set `{T1, ..., Tn}` of terminal symbols, which can be determined recursively by examining the start of the rule for `N`. So if `N` has the shape

`N → R1 … | R2 …`

then

`FirstSet(N) = FirstSet(R1) ∪ FirstSet(R2) ∪ …`

with `FirstSet(Ri) = {Ri}` if Ri is a terminal symbol.

Next we consider the preimages of terminals with regard to FirstSets. We call them ancestors and for a terminal T the set of ancestors is defined by:

`Ancestors(T) = {T} ∪ {N ∈ NonTerminals(G)| T ∈ FirstSet(N)}`

Each terminal T has at least one ancestor ( itself ) and at most `K = #NT(G)+1` where `#NT(G)` is the number of non-terminals of the grammar G.
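These definitions translate directly into code. Assuming the grammar is given as a dict mapping each non-terminal to a list of alternatives ( each alternative a list of symbols ), FirstSet and Ancestors can be computed as:

```python
def first_set(symbol, grammar, seen=None):
    # a terminal ( no rule in the grammar ) is its own first set
    if symbol not in grammar:
        return set([symbol])
    seen = set() if seen is None else seen
    if symbol in seen:          # guard against left recursion
        return set()
    seen.add(symbol)
    result = set()
    for alternative in grammar[symbol]:
        result |= first_set(alternative[0], grammar, seen)
    return result

def ancestors(terminal, grammar):
    # the terminal itself plus every non-terminal whose first set contains it
    return set([terminal]) | set(
        n for n in grammar if terminal in first_set(n, grammar))

grammar = {
    "S": [["A", "b"], ["c"]],
    "A": [["a"]],
}

first_set("S", grammar)   # {'a', 'c'}
ancestors("a", grammar)   # {'a', 'A', 'S'}
```

For this toy grammar `#NT(G) = 2`, so K = 3, and indeed no terminal has more than three ancestors.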

Suppose now we feed a token stream T1 T2 … Tn into a top-down parser. For each Ti in the stream we can apply all of the rules in `Ancestors(Ti)` and store the results in a set

`Trees(i) = {(i, Tree(R)) | R ∈ Ancestors(Ti)}`

Since the inequality `#Ancestors(Ti) ≤ K` holds for all i, the set

`Trees = ⋃i∈{1, …, n} Trees(i)`

has at most `K*n` elements. If a new tree needs to be computed, the sub-trees may be looked up in `Trees`; if one of them is missing, it is computed as well, and so on. In any case no more than `K*n` computations need to be performed to determine all elements of `Trees`. Since K is a constant that depends only on the grammar, the asymptotic complexity of computing `Trees` is O(n). Since `Trees` contains a solution to our problem, that solution can be computed in O(n) as well.
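The linear bound can be demonstrated with a miniature memoizing parser. The grammar below applies the rule A at the same position twice; without memoization the second application would repeat the whole descent, with memoization each ( rule, position ) pair is computed only once. A toy sketch, not production code:

```python
text = "a" * 20 + "y"
memo = {}                 # position -> parse result for rule A
calls = {"A": 0}          # counts real ( non-memoized ) applications of A

def parse_A(pos):
    # A <- 'a' A / 'a'  -- memoized on the input position
    if pos in memo:
        return memo[pos]
    calls["A"] += 1
    result = None
    if pos < len(text) and text[pos] == "a":
        sub = parse_A(pos + 1)
        result = sub if sub is not None else pos + 1
    memo[pos] = result
    return result

def parse_E(pos):
    # E <- A 'x' / A 'y'  -- the second alternative re-applies A at the
    # same position; the memo table answers it without a second descent
    a_end = parse_A(pos)
    if a_end is not None and a_end < len(text) and text[a_end] == "x":
        return a_end + 1
    a_end = parse_A(pos)
    if a_end is not None and a_end < len(text) and text[a_end] == "y":
        return a_end + 1
    return None

result = parse_E(0)   # 21: the whole input is consumed
```

Parsing the 21-character input performs exactly 21 real applications of A, one per position, even though E invokes A twice at position 0.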

Posted in Parsing | 2 Comments

Is parsing Perl really impossible?

I just read this article about the apparent inability to parse Perl 5.

There is an underlying assumption that a parser has to resolve all ambiguities and derive a single parse tree from an expression ( giving a unique interpretation ). But instead of demanding the value of an expression in order to parse it, the parser might derive multiple parse trees and insert them as child nodes into a new parse tree representing a conditional expression. So the parser can just inflate the parse tree structure and do its job.

Whether or not the conditional expression can be computed at runtime doesn't have to bother the compiler developer. It is just another function call and no more dubious than an arbitrary function call in a conditional expression

if(foo())
    printf("foo() is true\n");
else
    printf("foo() is false\n");

So it makes sense to fix the requirements on a parser before giving proofs of impossibility.
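The idea can be sketched in a few lines: when the parser cannot decide between readings, it emits a node holding all candidate subtrees and defers the choice to runtime. The grammar and node shapes below are made up for illustration.

```python
class Ambiguity(object):
    # parse-tree node that keeps every candidate reading of an ambiguous
    # span; the selection between them happens at runtime, not parse time
    def __init__(self, *alternatives):
        self.alternatives = alternatives

def parse_ambiguous_slash(tokens):
    # toy stand-in for a Perl-style construct where 'f / 25' may be a
    # division or the start of a pattern match, depending on what f is
    division      = ("div", tokens[0], tokens[2])
    pattern_match = ("match", tokens[0], "/" + tokens[2] + "/")
    return Ambiguity(division, pattern_match)

tree = parse_ambiguous_slash(["whatever", "/", "25"])
```

The parser succeeds unconditionally; whichever later stage knows the value of `whatever` picks the correct alternative from the node.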

Posted in Grammars, Parsing | 5 Comments

Jython – 64K ought to be enough for anybody

Jython 2.5rc4 (Release_2_5rc4:6470, Jun 8 2009, 13:23:16)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_13
Type "help", "copyright", "credits" or "license" for more information.
>>> L = range(10000)
>>> eval(str(L))
Traceback (most recent call last):
  File "", line 1, in
java.lang.ClassFormatError: Invalid method Code length 99894 in class file org/python/pycode/_pyx9
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(Unknown Source)
        at org.python.core.BytecodeLoader$Loader.loadClassFromBytes(BytecodeLoader.java:116)
        at org.python.core.BytecodeLoader.makeClass(BytecodeLoader.java:35)
        at org.python.core.BytecodeLoader.makeCode(BytecodeLoader.java:65)
        ...
        at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:90)
        at org.python.util.jython.run(jython.java:293)
        at org.python.util.jython.main(jython.java:113)
 
java.lang.ClassFormatError: java.lang.ClassFormatError: Invalid method Code length 99894 in class file org/python/pycode/_pyx9
>>>

64K ought to be enough for anybody.

irb(main):001:0> L = (1...10000).to_a ;nil
=> nil
irb(main):002:0> SL = eval "["+L.join(",")+"]"; nil
=> nil
irb(main):003:0> SL[10..15]
=> [11, 12, 13, 14, 15, 16]
irb(main):004:0>

Just to get this straight: it has been argued for almost a decade that Python cannot have certain advanced features ( the whole FP/Stackless story ) because they would break Jython. Meanwhile Jython is implemented such that `eval` is broken and Jython users cannot even import Python data-structures like generated dicts or lists that exceed 64KB. What about backporting this behavior to CPython in order to preserve conformance between implementations?

Note that the Ruby example is JRuby – not that anyone blames the JVM. Ruby is morally inferior though and wildly coerces types on its own behalf but this is another story.

Posted in General | 9 Comments

CodeTemplates – a fresh look on code transformations

This is the first part of a two-part article series about CodeTemplates. CodeTemplates are a code transformation technique that offers an alternative to common syntactic as well as lexical macros for whole language transformations.

Syntactic macros

About three years ago I experimented with a “syntactic” macro system for Python incorporated in EasyExtend 2. In the meantime another project emerged called MetaPython striving for a similar goal under the sign of the Ouroboros.

Fundamentally a macro system defines functions called macros that act as compile time code generators. It is not surprising that code is data – how else should a compiler work? – however few languages specify functions that treat source code as source code, and operators that evaluate source code at compile time. By quasi-quotation a piece of code in a macro body is mentioned rather than evaluated. Complementary to quasi-quoting there is a splice operator that evaluates code and returns other code, preferably in AST form, which becomes aligned with the AST of the quasi-quoted code.

The difference between source code and ASTs vanishes in so-called homoiconic languages like Lisp, where the source code written by the user is also a data-structure of the language. So a macro is essentially a function that transforms those data-structures meant to be code. That doesn't mean homoiconicity is a prerequisite for syntactic macros, though. The cited MetaPython but also MetaLua are examples of macro systems in non-homoiconic languages. A nice treatment of the foundations of macro systems in non-homoiconic languages can also be found in the documentation of the Converge language.

Syntactic macros in EE 2

EasyExtend 2.0 defined a macro system in its own langlet, simply called the macro langlet. The role of macros in EasyExtend is somewhat peculiar though, because EE is a system used for whole-language transformations with language syntax specified in grammars. EE binds node handlers to parse tree node types and those node handlers emit parse tree nodes of the target language. Macros were just meant to facilitate code transformations within the framework.

Unfortunately it turned out that they didn't. A bug in a macro caused hard-to-detect language transformation errors. Error recovery is a major concern – if not the single major concern – with language transformers, and the complicated quasi-quoting / splicing semantics of macros made this even harder. More often than I ever wanted I debugged through the framework when attempting to track down a bug in a macro. Furthermore the macro langlet operated within the system and was transformed as well. So two languages were transformed at the same time, one of them additionally engaged in transforming the other. EasyExtend 2 made this possible without any conflicts by implementing a joint transformation machinery and separate ranges of node types that node handlers could bind to. The macro transformation was implemented as a set of node handlers in the `LangletTransformer` of the macro langlet and it was an ugly beast of code. I intended to rewrite this for EE 3 but at some point I decided to terminate this intellectual exercise and dropped the macro facility completely.

Why not just textual macros?

A textual substitution macro, occasionally also called a “lexical” macro, is a simple way to do code transformations. The most important programming language ever, C, implements them by means of the C preprocessor. Lexical macros are also known as poor man's macros, and using them in EasyExtend was like leaving civilization because city life is complicated and corrupt, striving instead for a sub-complex life as a farmer or hunter-gatherer. Furthermore EE is in deep love with syntax, so a radical re-conceptualization of the macro idea is required. Fortunately the intellectual effort for this is low and can be summarized in one sentence:

Enable arbitrary string transformations guarded by syntax

This is universally applicable for all context free languages and doesn’t require any additional macro syntax.

If one attempts to transform one string `S1` into another one `S2` all that is needed are the following three operators

  • insert
  • delete
  • subst

The edit distance between `S1` and `S2` is the minimal number of operations of this kind needed to transform `S1` into `S2`. It can be computed by the popular algorithm of Levenshtein.
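For reference, the edit distance over the three operators insert, delete and subst can be computed with the standard dynamic-programming scheme – a minimal sketch:

```python
def levenshtein(s1, s2):
    """Minimal number of insert/delete/subst operations turning s1 into s2."""
    prev = list(range(len(s2) + 1))          # distances from "" to prefixes of s2
    for i, c1 in enumerate(s1, 1):
        curr = [i]                            # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                  # delete c1
                            curr[j - 1] + 1,              # insert c2
                            prev[j - 1] + (c1 != c2)))    # substitute c1 by c2
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```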

EasyExtend 4 will use a slightly different set of operators

  • replicate
  • delete
  • subst

It is almost equally expressive, the exception being an empty string `S1`: it requires an initial `insert`, since there is nothing yet to replicate, delete or substitute. But once there is a single character in the string, one can replicate this character and apply substitutions, which yields the same result as applying multiple insertions.
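The equivalence can be made concrete on plain strings. The operator names follow the article; the string-index representation below is my own and only serves as a toy illustration:

```python
def replicate(s, i, n):
    """Replicate the character at position i so it occurs n times."""
    return s[:i] + s[i] * n + s[i + 1:]

def subst(s, i, c):
    """Substitute the character at position i by c."""
    return s[:i] + c + s[i + 1:]

# Simulate three insertions by one replication plus substitutions:
s = "a"
s = replicate(s, 0, 4)      # "aaaa"
s = subst(s, 1, "b")        # "abaa"
s = subst(s, 2, "c")        # "abca"
s = subst(s, 3, "d")        # "abcd"
print(s)  # → abcd
```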

CodeTemplate objects in EE 4

EasyExtend 4 defines a module called `csttemplate.py` containing four types of objects: `CodeTemplate`, `CodeMarker`, `CodeSelection` and `SelectionTree`.

A CodeTemplate is an object that wraps some piece of source code in some language. The source code is parsed on construction of the CodeTemplate.

A CodeMarker marks sections in the source by means of search patterns. One can bind arbitrarily many CodeMarker objects to a CodeTemplate. It is also possible to bind CodeMarkers to other CodeMarkers, building a CodeMarker tree. When a CodeMarker B is bound to another `CodeMarker` A, the search pattern of B is only applied to the code marked by A.

A CodeTemplate implements a `match` method. When it is applied, the patterns specified by all bound CodeMarker objects are matched against the code of the CodeTemplate. The result is a SelectionTree object. The tree holds a list of CodeSelection objects as well as a list of subtrees.

All transformations of the source code wrapped by the CodeTemplate are applied through CodeSelection objects. It is the CodeSelection object that substitutes, replicates or deletes the code it is bound to. Note that internally all those operations still happen on a parse tree. Invalid operations are immediately rejected and an exception is raised. This is not a complete syntax check of the whole code on each operation – it wouldn't even make sense to perform one, because a piece of code can be morphed into code of another language and is a hybrid piece of code everywhere except at the start and end points of the transformation.

The SelectionTree is clonable and can be reused for other substitutions.
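To make the object roles tangible, here is a deliberately simplified mock-up. The real `csttemplate.py` operates on parse trees, supports marker trees, and syntax-checks every operation; this string/regex version merely mirrors the interface described above and is not EE 4 code:

```python
import re

class CodeSelection:
    """A selected span of the template's source; operations go through it."""
    def __init__(self, template, start, end):
        self.template, self.start, self.end = template, start, end

    def subst(self, new):
        t = self.template
        t.source = t.source[:self.start] + new + t.source[self.end:]

class CodeMarker:
    """Marks sections of the source via a search pattern."""
    def __init__(self, pattern):
        self.pattern = pattern

class CodeTemplate:
    """Wraps a piece of source code; markers are bound to it."""
    def __init__(self, source):
        self.source = source
        self.markers = []

    def bind(self, marker):
        self.markers.append(marker)

    def match(self):
        # The real implementation returns a SelectionTree;
        # this toy returns a flat list of selections.
        return [CodeSelection(self, m.start(), m.end())
                for marker in self.markers
                for m in re.finditer(marker.pattern, self.source)]

tmpl = CodeTemplate("def f(x): return x + 1")
tmpl.bind(CodeMarker(r"\bf\b"))
for sel in tmpl.match():
    sel.subst("g")
print(tmpl.source)  # → def g(x): return x + 1
```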

Outlook

In the next article I’ll show CodeTemplates in action.

Posted in DSL, Grammars, TBP | Comments Off on CodeTemplates – a fresh look on code transformations

Singular irregularities

An irregularity is a shape or rule violation. It supposes that objects are built according to rules, but there are exceptional cases that don't really fit or at least violate our expectations of the building law. An irregularity is singular if the building law is violated in a single case.

The best known example in Python is the syntax of tuples:

()                          # empty tuple
(arg)                       # parenthesized expression - no tuple!
(arg,)                      # tuple with one element
(arg1, arg2, ...)           # tuple with many arguments

Another more recent example is the set syntax in Python 3 which requires an ambiguity resolution involving an empty set and an empty dict:

{}                          # empty dict!
set()                       # empty set !
{arg}                       # set with one element
{arg1, arg2, ...}           # set with many elements

Some people have suggested dropping the parentheses around arguments in function calls. This leads to an ambiguity between a function object and a function call without arguments. In order to resolve it one might introduce another singular irregularity:

func                      # function object - no call!
func arg1 arg2 ...        # function call with arguments
func()                    # function call without arguments
func(arg1, arg2, ...)     # function call with arguments

Are singular irregularities design mistakes, or are they acceptable trade-offs for avoiding verbosity and/or the introduction of even more syntax?

Posted in General | 3 Comments

VHDL grammars and the parser sandwich

Eli Bendersky has mentioned some problems of parsing VHDL. I was quite pleased, though, to find the Postlexer idea discussed in the following paper, which provides a more thorough treatment of VHDL parsing issues.

They call it an “Auxiliary Terminal Generator”, which is a more precise name for what I call a “Postlexer”. In this case a Postlexer resolves ambiguities caused by context sensitivity. The solution is to introduce auxiliary terminal symbols in between the two context-free passes – the one for lexical analysis and the one for parsing.

What I dislike about their approach is that they couple the grammar with the Postlexer using parsing actions.
Apparently generations of computing scientists have enjoyed squeezing as much information as possible into their grammars instead of acknowledging the obvious: a separate parsing phase is often required for non-trivial grammars, and since the problems are specific one can leave them entirely to a powerful general-purpose language and the programmer. As a consequence the parser isn't portable across languages, but at least the grammars are.

Generally speaking one has to consider three functions now instead of two:

The Parser Sandwich

Lexer:     Parsetable String -> Tokenstream
Postlexer: Tokenstream -> Tokenstream
Parser:    Parsetable Tokenstream -> Tree

`Lexer` and `Parser` shall be universally applicable to any language, whereas the `Postlexer` is always language-specific and can be omitted if the whole language can be described in a context-free way.
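The sandwich is just a composition of three functions. The sketch below is mine, not EasyExtend code: the token representation, the keyword rule and the omitted parse tables are made-up stand-ins, meant only to show where the language-specific middle layer sits between the two generic phases:

```python
def lexer(string):
    # generic, context-free tokenization into (kind, value) pairs
    # (the real Lexer is driven by a parse table, omitted here)
    return [("NAME" if tok.isalpha() else "OP", tok)
            for tok in string.split()]

def postlexer(tokens):
    # language-specific pass: introduce auxiliary terminals to resolve
    # an ambiguity the context-free lexer cannot see
    return [("KEYWORD", v) if v in ("begin", "end") else (k, v)
            for k, v in tokens]

def parser(tokens):
    # stand-in for the generic table-driven parser: wrap into a "tree"
    return ("block", tokens)

tree = parser(postlexer(lexer("begin x + y end")))
print(tree)
```

Dropping `postlexer` from the middle of the composition recovers the ordinary two-phase setup for fully context-free languages.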

Posted in Grammars | 4 Comments