Archive for June, 2009

Stitches of a flea language – defining Java annotations in Jython

Posted in Java, Jython on June 30th, 2009 by kay – 9 Comments

Jython annotations – anyone?

The last few days I tried to figure out how to create Jython annotations. A Jython annotation is defined here as a Java annotation lifted from Jython to Java. So one essentially defines a Java annotation in a Jython class. A Jython annotation shall not be confused with a decorator. A Python ( or Jython ) decorator is just a higher order function ( or callable ). It might create attributes in Jython objects but those are not the same as Java annotations reflected by the Java runtime. Without Jython annotations Jython is right now essentially stuck in a pre version 1.5 Javaverse and Jython classes are disconnected from modern Java frameworks and cannot be plugged.

Jython annotations in Jython 2.5 don’t work out of the box. It is not much known yet about how or when Jython annotations will be supported by core Jython. The lead Jython developer Frank Wierzbicki announced something along the lines in his 2008 PyCon conference talk but this is now about 16 months ago. I could temper my impatience if Jython annotations were just around the corner but what can we expect after those 16 months?

In this article I introduce a library that enables lifting of meta-data from Jython to Java and loading Java back into Jython. One key element is Java code generation and dynamic compilation using the Java 6 Compilation API. Another one is interface extraction of Jython classes using the rich reflection API provided by core Jython.

Lifting up Jython classes

For every Jython class JyClass one can generate a Java class JaFromJyClass by means of interface extraction. We assume JyClass to be a subclass of a Java class, e.g. Object, and translate the Jython class

class JyClass(Object):
    def foo(self, *args):
        print "foo", args

into a corresponding Java class

public class JaFromJyClass extends Object{
    PyObject jyobject;
 
    public PyObject foo(PyObject[] args)
    {
        return jyobject.invoke("foo", args);
    }
}

This class is basically a proxy for the jyobject member variable of type PyObject which is a Jython API type. Once we have generated the Java code from Jython we can dynamically compile and load the Java code into Jython:

JaFromJyClass = createJavaClass("JaFromJyClass", source)
jainstance = JaFromJyClass()
jainstance.jyobject = JyClass()
jainstance.foo(9)  # prints 'foo 9'

This was straightforward and hints on our translation strategy. Next we review the Jython to Java translations in more detail.

Jython to Java translations

PyObject Injections

We cannot be glad with the way the jyobject was assigned to the jainstance in the previous example. The particular assignment protocol implies that the Jython script has always control over the instantiation of Jython classes. But once we plug the class into a framework the framework takes over. A better solution is to inject the PyObject using a factory mechanism.

public JaFromJyClass() {
    super();
    jyobject = JyGateway.newInstance("JyClass", this, null);
    jaobject = (Object)jyobject.__tojava__(Object.class);
}

The factory is called JyGateway. The JyGateway is a Java class which defines HashMap called registry

public static HashMap<String, PyDictionary> registry = new HashMap<String, PyDictionary>();

The keys of the Java HashMap are Strings that represent class names. The PyDictionary is a dictionary of Jython functions. Right now two functions are defined: newInstance and callStatic. Both of them correspond to static methods of JyGateway. If JyGateway.newInstance(“JyClass”, this, null) is called the newInstance Jython function is fetched from the registry using “JyClass” as a key. The third argument of JyGateway.newInstance contains an array of Java objects passed as arguments to the newInstance function which returns a new PyObject. If the constructor doesn’t take an argument null is passed as in the example. The particular JyClass will never be exposed to Java code.

Overrdiding superclass methods

Aside from jyobject we have also defined jaobject in the JaFromJyClass constructor which has the type Object. Here Object is just the superclass of both JaFromJyClass and JyClass. The jaobjectis defined for the purpose of overriding superclass methods: we cannot simply change the signature of superclass methods in particular not the return value.

If public void foo(int x) is a method defined in the superclass of JyClass, the Java method generated from JyClass is

public void foo(int arg) { jaobject.foo(arg) }

The method foo called from jaobject is still the method implemented in Jython. The Jython method is just called with Java arguments and returns a Java value ( if any ) that gets converted back to Jython.

Calling static methods

Calling static or classmethods of Jython objects from Java is similar to calling JyGateway.newInstance:

public static PyObject bar(PyObject args[])
{
     return JyGateway.callStatic("JyClass", "bar", args);
}

Defining Jython metadata

There are three kinds of meta-data which can be added to Jython classes which are extracted for dynamic Java generation. Those are called jproperty, annotation and signature. They serve different purposes.

signature

A signature decorator is defined to assign Java argument and the return types to a Jython method. Without the signature decorator a default translation is applied:

def foo(self, arg):
    ...

—–>

public PyObject foo(PyObject[] args) {
    return jyobject.invoke("foo", args);
}

If we decorate foo with the following signature decorator

@signature("public int _(char)")
def foo(self, arg):
    ...

we get the translation

public int foo(char arg0){
    PyObject args[] = new PyObject[1];
    for(int i=0;i<1;i++) {
        args[0] = Py.java2py(arg0);
    }
    return (Integer)jyobject.invoke("foo", args).__tojava__(int.class);
}

The name of the function in the signature declaration string is of no relevance. That’s why we have used a single underscore.

annotation

The annotation decorator applies to methods and classes. We have to wait for Jython 2.6 for proper class decorator syntax but the semantics is the same when we write

cls = annotation(value)(cls)

The value passed to annotation is a string which must conform Java annotation syntax with the leading @ character being stripped.

@annotation("Override")
def foo(self, arg):
    ...

is a valid annotation which corresponds to the Java method

@Override
public PyObject foo(PyObject[] args) {
    return jyobject.invoke("foo", args);
}

Annotations can be stacked and also combined with the signature decorator. So we can define three new decorators

test  = annotation("Test")(signature("public void _()"))
setUp = annotation("Before")(signature("public void _()"))
beforeClass = annotation("BeforeClass")(signature("public static void _()"))

and use them within e.g. JUnit 4

from org.junit import*
from org.junit.Assert import*
 
class TestClass(Object):
    @beforeClass
    def start(self):
        print "Run 'TestClass' tests ..."
 
    @test
    def test_epoweripi(self):
        from math import e, pi
        assertTrue( abs( e**(pi*1j) + 1 ) < 10**-10 )

jproperty

The jproperty object is a descriptor which assigns a Java type and zero or more annotations to a Jython attribute.

class JyClass(Object):
    x = jproperty("private int", "X", "Y")

This is how it translates

pubic class JyClassBase(Object)
{
    @Y @X
    private int x;
}

A JyClass instance reads ( and sets ) jproperty values from the corresponding Java class instances it was assigned to. Remember that the jyobject instance construction looked like this

jyobject = JyGateway.newInstance("JyClass", this, null);

With this an instance of the Java class was passed to the newInstance factory function. Not only holds a javaobject a jyobject but also a jyobject holds a javaobject. Reading / writing jproperty values is the primary reason for this cyclic dependence.

Class import heuristics

Whenever a Java class gets compiled, names have to be imported from classes/packages using the import-statement. Jython applies some heuristics to extract class and package names from annotations/Jython code and creates Java import statements accordingly. An important precondition for a class/package to be found is that it has been imported in Jython code. This isn’t particularly cumbersome. When you define an annotation

annotation('SupportedAnnotationTypes("*")')

the name SupportedAnnotationTypes has to be made public using a normal Jython import:

from javax.annotation.processing import SupportedAnnotationTypes

This is not much different from considering an evaluation of the parameter string using Pythons eval.

The annotation class has a class attribute namespace which is used for Jython class and annotation extraction. If the heuristics fails to extract a class the namespace can be manually updated:

annotation.namespace.update(locals())

Jynx

Within the next few days I’ll launch a new project on code.google.com called jynx which will contain tools particularly suited for Jython utilization of modern Java frameworks and APIs. The relevant source code for this article can be found here and you can examine and play with it.

Into The Labyrinth – using the JavaCompiler API from Jython

Posted in Java, Python on June 23rd, 2009 by kay – 1 Comment

The Plumber

After having neglected Java for years I began to re-examine it this month together with Jython and my initial reaction was a culture shock.

Java is infamous for being a “plumbing language” i.e. you have to subclass some classes or implement a few interfaces and then plug them into a framework. Alternatively you have to call framework methods that expect a bunch of interrelated objects as parameters. None of those class implementations is particularly complicated but you have to figure out how all the objects are related to each other and essentially deal with configurations and dependencies on object level. It is easy to mess things up with every additional point of failure. There is also a tendency for abstraction inversion: you have to undertake many concrete steps within a complex machinery to create a simple building block. Abstraction inversion is an indication of a system being overdesigned.

The topic of this article is Javas compiler API and it is a nice show case to highlight some differences which are not really situated on language level but in the way problems are approached. Take Pythons compile function for example. Pythons compile function has the following signature

compile: (code_str, file_name, eval_mode) -> code_object

and is dead simple to use:

>>> compile("import EasyExtend.langlets", "<input>", "exec")
<code object <module> at 00EB0578, file "<input>", line 1>

If you need to read the source from a file you do just this

>>> compile(open("module.py").read(), "input", "exec")
<code object <module> at 00EB0578, file "<input>", line 1>

If you want to store the resulting bytecode in an appropriate file you need to do a little more work

import os, struct, marshal, imp
 
def write_code(filename, code):
    f = open(filename, "wb")
    try:
        mtime = os.path.getmtime(filename)
        mtime = struct.pack('<i', mtime)
        MAGIC = imp.get_magic()
        f.write(MAGIC + mtime)
        marshal.dump(code, f)
    finally:
        f.close()

This creates a Python bytecode file ( usually a ‘ pyc’ file ) and serializes the code object. Something like this could be implemented on the method level of the code object itself and a single call code.write(filename) would suffice to store the code.

How to compile Java dynamically?

Java is different. The Java Compiler API is specified in JSR 199. It contains 20 classes/interfaces and you have to figure out their interplay in order to compile a source string. Half of them are JavaFile or JavaFileManager classes/interfaces so what’s actually peripheral has moved to the center. The JavaCompiler API is new to Java 6 and it seems dynamic compilation didn’t work out of the box prior to release 6. I suppose one had to call javac from the command line or use a different compiler such as Janino.

There isn’t any easy way to get into the compiler API. The JSR 199 isn’t exactly a design document but a terse javadoc API documentation. Tutorials are rare and mostly superficial in that they cover the most simple use case only. A notable exception is an introduction written by Andrew Davison. It covers the most relevant use cases and more. Alternatively one might skim through the tests of the JDK 6. It contains expression evaluation code written by the JSR 199 implementor Peter von der Ahé. Once again tests are documentation. The Jython 2.5 code I’ll present follows Davisons article and is mostly a transcript of his Java code in the relevant parts.

The JavaCompiler API from Jython

Prerequisites: a JDK for Java 6 has to be installed first. The Java compiler tools are implemented in

<JDK-Path>/lib/tools.jar

and that path has to be added to the Java class path when Jython is invoked. Alternatively you can add the path to thePYTHONPATH inside of your application. I’ll chose the latter approach here:

import sys
sys.path.append(os.path.join(os.environ["JAVA_HOME"],"lib", "tools.jar"))

The environment variable JAVA_HOME has to be set to your JDK path – not to the JRE path. This might have to be changed. In case of my Windows notebook the path is

JAVA_HOME = C:\Programme\Java\jdk1.6.0_13.

Another configuration aspect concerns access to protected members of Java classes. By default this is disabled in Jython. We will need to access protected member functions once we override a Java class loader. For that reason one has to set following disrespectful flag in Jythons registry file

python.security.respectJavaAccessibility = false

For more information see the Jython FAQ.

Now we are ready to import the compiler tools:

from javax.tools import*
from com.sun.tools.javac.api import JavacTool
 
compiler = JavacTool.create()
assert compiler

The CompilationTask

After having created a compiler instance we now care about the CompilationTask which is defined in JSR 199. It is basically a callable, an interface which specifies a parameter-less call() method that returns a value of type Boolean. It is this callmethod that has to be overridden in subclasses and called for compilation. In case of the JavaCompiler framework one doesn’t have to override the CompilationTaskexplicitly but fetches it from the compiler object using the the getTask method:

JavaCompiler.CompilationTask getTask(Writer out,
                             JavaFileManager fileManager,
                             DiagnosticListener&lt;? super JavaFileObject&gt; diagnosticListener,
                             Iterable&lt;String&gt; options,
                             Iterable&lt;String&gt; classes,
                             Iterable&lt;? extends JavaFileObject&gt; compilationUnits)

The parameters need further examination but once we have understood how to create the objects required to fetch the CompilationTask the compilation is performed by

compiler.getTask(...).call()

The meaning of the parameters of the CompilationTask is as follows:

Parameters:

  • out – a Writer for additional output from the compiler; use System.err if null
  • fileManager – a file manager; if null use the compiler’s standard filemanager
  • diagnosticListener – a diagnostic listener; if null use the compiler’s default method for reporting diagnostics
  • options – compiler options, null means no options
  • classes – class names (for annotation processing), null means no class names
  • compilationUnits – the compilation units to compile, null means no compilation units

For now we will ignore the out parameter as well as compiler options and class names passed to the annotation processors.

The compilationUnits holds the source code to be compiled. In case of Jython it will be a Python list containing a single JavaFileObject.

So a call of getTask will have the following shape

compiler.getTask(None,
                 fileManager,
                 diagnosticListener,
                 None,
                 None,
                 [source]).call()

Diagnostics

DiagnosticCollectors are used for error reporting. A DiagnosticCollector implements a DiagnosticListener interface. It is parametrized by some type S and holds a possibly empty list of Diagnostic&lt;S&gt; objects which can be fetched.

We only need to know as much about diagnostics to handle failure cases:

 if not task.call():
     msg = "\n  "+"\n  ".join(str(d) for d in diagnostics.getDiagnostics())
     raise JavaCompilationError(msg)

The JavaCompilationError is a custom Jython exception. The JavaCompiler API doesn’t raise an exception on compilation failure.

JavaFileObject

The JavaFileObject specifies a file abstraction. For our own purposes we need two of them: one that is readable and holds the source code ( a string ) and one that is writable and holds the bytecode ( an array of bytes ). Both derive from SimpleJavaFileObject which provides an implementation of the JavaFileObject interface.

from java.net import URI
from java.io import*
 
class StringJFO(SimpleJavaFileObject):
    '''
    JavaFileObject implemention used to hold the source code.
    '''
    def __init__(self, className, codestr):
        self.codestr = codestr
        super(StringJFO, self).__init__(URI(className),
                                        JavaFileObject.Kind.SOURCE)
 
    def getCharContent(self, errs):
        return self.codestr
 
class ByteArrayJFO(SimpleJavaFileObject):
    '''
    JavaFileObject implementation used to hold the byte code.
    '''
    def __init__(self, className, kind):
        super(ByteArrayJFO, self).__init__(URI(className), kind)
        self.baos = ByteArrayOutputStream()
 
    def openInputStream(self):
        return ByteArrayInputStream(self.getByteArray())
 
    def openOutputStream(self):
        return self.baos
 
    def getByteArray(self):
        return self.baos.toByteArray()

In case of the ByteArrayJFO a writable ByteArrayOutputStream is created that can be fetched by the framework using the openInputStream method.

An instance of StringJFO will become our compilationUnit and an instance of ByteArrayJFO will be returned by a still to be defined FileManager:

class ByteJavaFileManager(ForwardingJavaFileManager):
    def __init__(self, fileManager):
        super(ByteJavaFileManager, self).__init__(fileManager)
        self.code = None
 
    def getJavaFileForOutput(self, location, className, kind, sibling):
        self.code = ByteArrayJFO(className, kind)
        return self.code

A Java compiler in Jython

Now we can put all those things together and define a compileClass function.

def compileClass(className, codeStr, *flags):
    compiler = JavacTool.create()
    assert compiler, "Compiler not found"
    diagnostics = DiagnosticCollector()
    jfm = ByteJavaFileManager(compiler.getStandardFileManager(diagnostics,
                                                              None,
                                                              None))
    task = compiler.getTask(None,
                            jfm,
                            diagnostics,
                            flags,
                            None,
                            [StringJFO(className+".java", codeStr)])
    if not task.call():
        e = "\n  "+"\n  ".join(str(d) for d in diagnostics.getDiagnostics())
        raise JavaCompilationError(e)
    return jfm.code

In order to initialize the StringJFO object we have to pass a file name which corresponds to the the class name we use. The return values of compileClass is an object of type ByteArrayJFO that holds the byte code.

Adding a ClassLoader

We are almost done. To complete our exercise we have to make the byte code executable which means we have to define a class loader and create an actual Java class. Since we operate from within Jython it will be immediately wrapped into a Python type.

from java.lang import ClassLoader
from java.lang import ClassNotFoundException
 
class ByteClassLoader(ClassLoader):
    def __init__(self, code):
        super(ByteClassLoader, self).__init__(ClassLoader.getClassLoader())
        self.code = code
 
    def findClass(self, className):
        code = self.code.getByteArray()
        cl = self.defineClass(className, code, 0, len(code))
        if cl is None:
            raise ClassNotFoundException(className)
        else:
            return cl
 
def createJavaClass(className, codeStr, *compilerflags):
    loader = ByteClassLoader(compileClass(className, codeStr, *compilerflags))
    return loader.loadClass(className)

To a very good end we have hidden the JavaCompiler API behind a single function with a tiny interface. There is not something Python has been needed for actually. The JavaCompiler API is a Swiss Army knife style API that requires a whole object tree to be created to achieve a simple effect.

Java classes from within Jython

Finally we want to demonstrate the value of our implementation showing a few simple examples. The jcompile.py module contains the complete code and can be downloaded here.

The first example is our “Hello Jython”:

codeStr = """
public class Foo {
    public static void main(String args[])
    {
        System.out.println("Hello, "+args[0]);
    }
}
"""
 
Foo = createJavaClass("Foo", codeStr)
print Foo  # &lt;type 'Foo'&gt;
 
Foo.main(["Jython!"])  #  Hello, Jython!

In our next example we import some symbols from a Java library:

codeStr = '''
import java.lang.Math;
public class Prime {
    public static Boolean isPrime(double n)
    {
        for(int d=2;d&lt;=Math.sqrt(n);d++)
        {
            if(n%d == 0)
                return false;
        }
        return true;
    }
}
'''
 
Prime = createJavaClass("Prime", codeStr)
Prime.isPrime(2)     # True
Prime.isPrime(9971)  # False
Prime.isPrime(9973)  # True

————— Update ! We. 2009-24-06 —————–

Right now it is not possible to derive a class from a dynamically compiled Java class. If Foo is a dynamically generated Java class and Baris defined as

class Bar(Foo):
    pass

then a ClassNotFoundException is raised. The only possible workaround I see right now is to store the class to the disk and import it right after this. A possible adaption of the createJavaClass function looks like:

def createJavaClass(className,
                    codeStr,
                    todisk = True,
                    remove = True,
                    *compilerflags):
    '''
    Compiles and loads a new Java class.
    '''
    compiled = compileClass(className, codeStr, *compilerflags)
    if todisk:
        code = compiled.getByteArray()
        clsfile = open(className+".class", "wb")
        try:
            code.tofile(clsfile)
        finally:
            clsfile.close()
        jclass = __import__(className)
        if remove:
            os.remove(className+".class")
        return jclass
    else:
        loader = ByteClassLoader(compileClass(className, codeStr, *compilerflags))
        return loader.loadClass(className)

Linear bounds for backtracking parsers with memoization

Posted in Parsing on June 12th, 2009 by kay – 2 Comments

On the comp.compilers mailing list I asked for a confirmation of the O(n) complexity claim for packrat parsers. A packrat parser is a particular top down recursive descendant parser which applies backtracking on failure of consuming a token by application of a parsing expression. Now a backtracking parser usually exhibits O(2n) complexity and the essential claim is that memoization brings it down to O(n) on the cost of expanding storage. The work of Bryan Ford about packrat parsers refers to an original article written by Birman & Ullmann that gave a proof. The cited article costs 19$ at IEEE which I felt I would spent when I failed to give a proof by myself. Fortunately the proof is elementary and not complicated. So I could save my money.

Here we go.

Each non-terminal N of a grammar G has a finite first set {T1, .., Tn} which are terminal symbols that can be recursively determined by examining the start of the rule N. So if N has the shape

N → R1 … | R2 …

then

FirstSet(N) = FirstSet(R1) ⋃ FirstSet(R2) …

with FirstSet(Ri ) = { Ri } if Ri is a terminal symbol.

Next we consider the preimages of terminals with regard to FirstSets. We call them ancestors and for a terminal T the set of ancestors is defined by:

Ancestors(T) = {T} ∪ {N ∈ NonTerminals(G)| T ∈ FirstSet(N)}

Each terminal T has at least one ancestor ( itself ) and at most K = #NT(G)+1 where #NT(G) is the number of non-terminals of the grammar G.

Suppose now we feed a token stream T1 T2 … Tn into a top down parser. For each Ti in the stream we can apply all of the rules in Ancestors(Ti) and store the results in a set

Trees(i) = {(i, Tree(R))| R ∈ Ancestors(Ti) }

Since for all i the inequality#Ancestors(Ti) ≤ K holds the set

Trees = i∈{1, …, n}Trees(i)

has at most K*n elements. If a new tree needs to be computed the sub-trees may be looked up in Trees. If one is missing that one is computed as well etc. In any case no more than K*n computations need to be performed to determine all elements of Trees. Since K was a constant that depended only on the grammar we can conclude that the asymptotic complexity of computing Trees is O(n). Since Trees contains a solution to our problem it can be computed in O(n) as well.

Is parsing Perl really impossible?

Posted in Grammars, Parsing on June 12th, 2009 by kay – 5 Comments

I just read this article about the apparent inability to parse Perl 5.

There is an underlying assumption that a parser has to resolve all ambiguities and derive a single parse tree from an expression ( giving a unique interpretation ). But instead of demanding the value of an expression in order to parse it the parser might derive multiple parse trees and insert them as child nodes into a new parse tree representing a conditional expression. So the parser can just inflate the parse tree structure and do its job.

Whether or not the conditional expression can be computed at runtime doesn’t have to bother the compiler developer. It is just another function call and no more dubious then an arbitrary function call in a conditional expression

if(foo())
    printf("foo() is true\n");
else
    printf("foo() is false\n");

So it makes sense to fix requirements to a parser before giving proofs of their impossibility.

Jython – 64K ought to be enough for anybody

Posted in General on June 10th, 2009 by kay – 9 Comments
Jython 2.5rc4 (Release_2_5rc4:6470, Jun 8 2009, 13:23:16)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_13
Type "help", "copyright", "credits" or "license" for more information.
&gt;&gt;&gt; L = range(10000)
&gt;&gt;&gt; eval(str(L))
Traceback (most recent call last):
  File "", line 1, in
java.lang.ClassFormatError: Invalid method Code length 99894 in class file org/python/pycode/_pyx9
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(Unknown Source)
        at org.python.core.BytecodeLoader$Loader.loadClassFromBytes(BytecodeLoader.java:116)
        at org.python.core.BytecodeLoader.makeClass(BytecodeLoader.java:35)
        at org.python.core.BytecodeLoader.makeCode(BytecodeLoader.java:65)
        ...
        at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:90)
        at org.python.util.jython.run(jython.java:293)
        at org.python.util.jython.main(jython.java:113)
 
java.lang.ClassFormatError: java.lang.ClassFormatError: Invalid method Code length 99894 in class file org/python/pycode/_pyx9
&gt;&gt;&gt;

64K ought to be enough for anybody.

irb(main):001:0&gt; L = (1...10000).to_a ;nil
=&gt; nil
irb(main):002:0&gt; SL = eval "["+L.join(",")+"]"; nil
=&gt; nil
irb(main):003:0&gt; SL[10..15]
=&gt; [11, 12, 13, 14, 15, 16]
irb(main):004:0&gt;

Just to get this straight. It is argued for almost decade that Python cannot have some advanced features ( the whole FP/Stackless story ) because this would break Jython and Jython is implemented s.t. eval is broken and Jython users cannot even import Python data-structures like generated dicts or lists that exceed 64KB. What about backporting this behavior to CPython in order to preserve conformance between implementations?

Note that the Ruby example is JRuby – not that anyone blames the JVM. Ruby is morally inferior though and wildly coerces types on its own behalf but this is another story.

CodeTemplates – a fresh look on code transformations

Posted in DSL, Grammars, TBP on June 7th, 2009 by kay – Be the first to comment

This is the first part of a two part article series about CodeTemplates. CodeTemplates are a code transformation technique that suggests an alternative to common syntactic as well as lexical macros for whole language transformations.

Syntactic macros

About three years ago I experimented with a “syntactic” macro system for Python incorporated in EasyExtend 2. In the meantime another project emerged called MetaPython striving for a similar goal under the sign of the Ouroboros.

Fundamentally a macro system defines functions called macros that act as compile time code generators. It is not surprising that code is data – how else shall a compiler work? – however few languages specify functions that treat source code as source code and operators that evaluate source code at compile time. By quasi-quotation some piece of code in a macro body is mentioned rather than evaluated. Complementary to quasi-quoting there is a splice-operator that evaluates code and returns other code preferably in AST form that becomes aligned with the AST of the quasi-quoted code.

The difference between source code and ASTs is going to vanish in so called homoiconic languages like Lisp where the source code that is written by the user is also a data-structure of the language. So a macro is essentially a function that transforms those data-structures meant to be code. It doesn’t mean though that homoiconicity is prerequisite for syntactic macros. The cited MetaPython but also MetaLua are examples for macro systems in non-homoiconic languages. A nice treatment of the foundations of macro systems in non-homoiconic languages can also be found in the documentation of the Converge language.

Syntactic macros in EE 2

EasyExtend 2.0 defined a macro system in an own langlet, simply called the macro langlet. The role of macros in EasyExtend is somewhat peculiar though because EE is a system used for whole language transformations with language syntax specified in grammars. EE binds node handlers to parse tree node types and those node handlers emit parse tree nodes of the target language. They were just meant to facilitate code transformations within the framework.

Unfortunately it turned out that they didn’t. A bug in a macro caused hard to detect language transformation errors. Error recovery is a major concern if not the single major concern with language transformers and the complicated quasi-quoting / splicing semantics of macros made this even harder. More often than I ever wanted I debugged through framework when attempting to detect a bug in the macro. Furthermore the macro langlet operated within the system and was transformed as well. So two languages were transformed at the same time whereas one of them was additionally engaged in transforming the other. EasyExtend 2 made this possible without any conflicts implementing a joint transformation machinery and separated ranges of node types node handlers could bind to. The macro transformation was implemented as a set of node handlers in the LangletTransformer of the macro langlet and it was an ugly beast of code. I intended to rewrite this for EE 3 but it at some point of time I decided to terminate this intellectual exercise and dropped the macro facility completely.

Why not just textual macros?

A textual substitution macro occasionally also called “lexical” macro is a simple way to do code transformations. The most important programming language ever, C, implements them by means of the C preprocessor. Lexical macros are also known as poor-mans macros and using them in EasyExtend was like leaving civilization because city-life is complicated and corrupt and one better strives for a subcomplex life as a farmer or hunter-gatherer. Furthermore EE is in deep love with syntax so a radical conceptualization of the macro idea is required. Fortunately the intellectual effort for this is low and can be summarized in one sentence

Enable arbitrary string transformations guarded by syntax

This is universally applicable for all context free languages and doesn’t require any additional macro syntax.

If one attempts to transform one string S1 into another one S2 all that is needed are the following three operators

  • insert
  • delete
  • subst

The edit distance between S1 and S2 is the minimal number of operations of this kind used to transform S1 into S2. It can be computed by the popular algorithm of Levenshtein.

EasyExtend 4 will use a slightly different set of operators

  • replicate
  • delete
  • subst

It is almost equally expressive except for the case of an empty string S1 that requires an initial insert and there is nothing yet to replicate, delete or substitute. But once you have a single character in a string, one can replicate this character and apply substitutions which yields the same as applying multiple insertions.

CodeTemplate objects in EE 4

EasyExtend 4 defines a module called csttemplate.py that defines 4 types of objects called CodeTemplate, CodeMarker, CodeSelection and SelectionTree.

A CodeTemplate is an object that wraps some piece of source code in some language. The source code is parsed on construction of the CodeTemplate.

A CodeMarker marks sections in the source by means of search pattern. One can bind arbitrary many CodeMarker objects to a CodeTemplate. It is possible to also bind CodeMarkers to other CodeMarkers building a CodeMarker tree. When a CodeMarker B is bound to another CodeMarker A the search pattern of B is only applied to the code marked by A.

A CodeTemplate implements a matchmethod. If it is applied the pattern specified by all bound CodeMarker objects are matched against the code of the CodeTemplate. The result is a SelectionTree object. The tree holds a list of CodeSelection objects as well as a list of subtrees.

All transformations of the source code wrapped by the CodeTemplate are applied through CodeSelection objects. It is the CodeSelection object that substitutes, replicates or deletes code that is bound to. Note that internally all those operations still happen on a parse tree. Invalid operations will immediately be rejected and an exceptions is raised. It is not a complete syntax check on the whole code that is applied on each operation and it wouldn’t make sense to perform one either because some piece of code can morphed into code of another language and it will be a hybrid piece of code except at transformations start and the endpoint.

The SelectionTree is clonable and can be reused for other substitutions.

Outlook

In the next article I’ll show CodeTemplates in action.

Singular irregularities

Posted in General on June 6th, 2009 by kay – 3 Comments

An irregularity is a shape or rule violation. It supposes that objects are built according to rules but there are exceptional cases that don’t really fit or do at least violate our expectations of the building law. An irregularity is singular if the building law is violated in a single case.

The best known example in Python is the syntax of tuples:

()                          # empty tuple
(arg)                       # parenthesized expression - no tuple!
(arg,)                      # tuple with one element
(arg1, arg2, ...)           # tuple with many arguments

Another more recent example is the set syntax in Python 3 which requires an ambiguity resolution involving an empty set and an empty dict:

{}                          # empty dict!
set()                       # empty set !
{arg}                       # set with one element
{arg1, arg2, ...}           # set with many elements

Some people suggested to drop parentheses around arguments in function calls. This leads to an ambiguity between a function object and function call without arguments. In order to resolve it one might introduce another singular irregularity:

func                      # object !
func arg1 arg2...         # function call with arguments
func()                    # function call
func(arg1, arg2, ...)     # function call with arguments

Are singular irregularities design mistakes or are they acceptable trade offs for avoiding verbosity and/or the introduction of even more syntax?