CodeTemplates – a fresh look at code transformations

Posted in DSL, Grammars, TBP on June 7th, 2009 by kay – Be the first to comment

This is the first part of a two-part article series about CodeTemplates. CodeTemplates are a code transformation technique that offers an alternative to common syntactic as well as lexical macros for whole-language transformations.

Syntactic macros

About three years ago I experimented with a “syntactic” macro system for Python, incorporated in EasyExtend 2. In the meantime another project called MetaPython has emerged, striving for a similar goal under the sign of the Ouroboros.

Fundamentally, a macro system defines functions called macros that act as compile-time code generators. It is not surprising that code is data – how else should a compiler work? – yet few languages specify functions that treat source code as source code, and operators that evaluate source code at compile time. By quasi-quotation, a piece of code in a macro body is mentioned rather than evaluated. Complementary to quasi-quoting there is a splice operator that evaluates code and returns other code, preferably in AST form, which becomes aligned with the AST of the quasi-quoted code.

The difference between source code and ASTs vanishes in so-called homoiconic languages like Lisp, where the source code written by the user is also a data structure of the language. So a macro is essentially a function that transforms those data structures meant to be code. This doesn’t mean, though, that homoiconicity is a prerequisite for syntactic macros. The cited MetaPython, but also MetaLua, are examples of macro systems in non-homoiconic languages. A nice treatment of the foundations of macro systems in non-homoiconic languages can also be found in the documentation of the Converge language.

Syntactic macros in EE 2

EasyExtend 2.0 defined a macro system in a langlet of its own, simply called the macro langlet. The role of macros in EasyExtend is somewhat peculiar, though, because EE is a system used for whole-language transformations with language syntax specified in grammars. EE binds node handlers to parse tree node types, and those node handlers emit parse tree nodes of the target language. Macros were just meant to facilitate code transformations within the framework.

Unfortunately it turned out that they didn’t. A bug in a macro caused hard-to-detect language transformation errors. Error recovery is a major concern – if not the single major concern – with language transformers, and the complicated quasi-quoting / splicing semantics of macros made this even harder. More often than I ever wanted, I debugged through the framework when attempting to track down a bug in a macro. Furthermore, the macro langlet operated within the system and was transformed as well. So two languages were transformed at the same time, one of them additionally engaged in transforming the other. EasyExtend 2 made this possible without any conflicts by implementing a joint transformation machinery and separate ranges of node types that node handlers could bind to. The macro transformation was implemented as a set of node handlers in the LangletTransformer of the macro langlet, and it was an ugly beast of code. I intended to rewrite it for EE 3, but at some point I decided to terminate this intellectual exercise and dropped the macro facility completely.

Why not just textual macros?

A textual substitution macro, occasionally also called a “lexical” macro, is a simple way to do code transformations. The most important programming language ever, C, implements them by means of the C preprocessor. Lexical macros are also known as poor man’s macros, and using them in EasyExtend would have been like leaving civilization because city life is complicated and corrupt, to strive instead for a simpler life as a farmer or hunter-gatherer. Furthermore, EE is in deep love with syntax, so a radical reconceptualization of the macro idea is required. Fortunately the intellectual effort for this is low and can be summarized in one sentence:

Enable arbitrary string transformations guarded by syntax

This is universally applicable to all context-free languages and doesn’t require any additional macro syntax.

If one attempts to transform one string S1 into another string S2, all that is needed are the following three operators:

  • insert
  • delete
  • subst

The edit distance between S1 and S2 is the minimal number of operations of this kind needed to transform S1 into S2. It can be computed by Levenshtein’s popular algorithm.
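For illustration, here is a minimal dynamic-programming implementation of this distance – a sketch in plain Python, not EasyExtend code:

```python
def levenshtein(s1, s2):
    # d[i][j] = edit distance between s1[:i] and s2[:j]
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                          # i deletions
    for j in range(n + 1):
        d[0][j] = j                          # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + cost) # subst (or match)
    return d[m][n]

assert levenshtein("kitten", "sitting") == 3
```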

EasyExtend 4 will use a slightly different set of operators:

  • replicate
  • delete
  • subst

It is almost equally expressive, except for the case of an empty string S1, which would require an initial insert when there is nothing yet to replicate, delete or substitute. But once there is a single character in the string, one can replicate that character and apply substitutions, which yields the same result as applying multiple insertions.
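On plain strings the argument can be sketched like this – replicate and subst here are ad-hoc helpers for illustration, not the EE 4 operators:

```python
def replicate(s, i, n):
    # repeat the character at index i n additional times
    return s[:i] + s[i] * n + s[i:]

def subst(s, i, c):
    # substitute the character at index i by c
    return s[:i] + c + s[i + 1:]

# grow "a" into "abc" without a dedicated insert operator
s = replicate("a", 0, 2)   # "aaa"
s = subst(s, 1, "b")       # "aba"
s = subst(s, 2, "c")       # "abc"
assert s == "abc"
```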

CodeTemplate objects in EE 4

EasyExtend 4 defines a module called csttemplate.py that provides four kinds of objects: CodeTemplate, CodeMarker, CodeSelection and SelectionTree.

A CodeTemplate is an object that wraps some piece of source code in some language. The source code is parsed on construction of the CodeTemplate.

A CodeMarker marks sections in the source by means of search patterns. One can bind arbitrarily many CodeMarker objects to a CodeTemplate. It is also possible to bind CodeMarkers to other CodeMarkers, building a CodeMarker tree. When a CodeMarker B is bound to another CodeMarker A, the search pattern of B is only applied to the code marked by A.

A CodeTemplate implements a match method. When it is applied, the patterns specified by all bound CodeMarker objects are matched against the code of the CodeTemplate. The result is a SelectionTree object. The tree holds a list of CodeSelection objects as well as a list of subtrees.

All transformations of the source code wrapped by the CodeTemplate are applied through CodeSelection objects. It is the CodeSelection object that substitutes, replicates or deletes the code it is bound to. Note that internally all those operations still happen on a parse tree. Invalid operations are immediately rejected and an exception is raised. This is not a complete syntax check on the whole code applied on each operation – and it wouldn’t make sense to perform one either, because a piece of code can be morphed into code of another language and will be a hybrid except at the start and end points of the transformation.

The SelectionTree is clonable and can be reused for other substitutions.

Outlook

In the next article I’ll show CodeTemplates in action.

Python vs TTCN-3

Posted in DSL, Python, Testing on April 26th, 2009 by kay – 5 Comments

Some time ago the computer scientists Bernard Stepien and Liam Peyton from the University of Ottawa compared Python with TTCN-3. TTCN-3 stands for Testing and Test Control Notation, a domain-specific language designed for writing tests in the telecommunication domain. In almost every category poor Python just loses the comparison.

A few things in the presentation are just odd. In several places it contains Python code that is not even consistent. Examples: on slide 42 an “else if” construct is used instead of the correct “elif”. On slide 13 fields are not assigned to self, which leads to code that fails to run. But leaving this carelessness aside, there is something important that estranges an experienced Python programmer. Take slide 12, for example. Here the authors state:

TTCN-3 templates to Python
  • TTCN-3 templates could be mapped to Python object instances.
  • However, there are serious limitations using the above technique with Python.
  • Python objects are practically typeless.

Then the presentation goes on for 18 slides just to explain how badly plain Python classes serve the same purposes as TTCN-3 templates. Although this is correct, why should anyone care? If you want to experience an exhilarating effect you have to turn grapes into wine; do not expect the same fun from grape juice. Then you can compare cheap wine with a premium one. That’s also why people compare Django to Ruby on Rails, and not Django to Ruby or Rails to Python. That’s why libraries and frameworks exist in the first place.

So what about modeling a Template class that has the same functionality as a TTCN-3 template? Here is a first attempt:

class Template(object):
    def __init__(self):
        self._frozen = False
        self._fields = []        # field names in insertion order

    def __setattr__(self, name, value):
        # bookkeeping attributes bypass the freeze check
        if name in ('_fields', '_frozen'):
            object.__setattr__(self, name, value)
            return
        if self._frozen:
            raise RuntimeError("Cannot set attribute value on a frozen template")
        if name not in self._fields:
            self._fields.append(name)
        object.__setattr__(self, name, value)

    def __eq__(self, other):
        # compare field values pairwise in insertion order
        if len(self._fields) != len(other._fields):
            return False
        for f_self, f_other in zip(self._fields, other._fields):
            val_self  = getattr(self, f_self)
            val_other = getattr(other, f_other)
            if val_self != val_other:
                return False
        return True

    def __ne__(self, other):
        return not self.__eq__(other)

    def freeze(self):
        # forbid any further field mutation
        self._frozen = True

    def clone(self):
        # unfrozen copy with the same fields and values
        T = Template()
        T._fields = self._fields[:]
        for field in T._fields:
            setattr(T, field, getattr(self, field))
        return T

The Template class manages an internal _fields attribute that records the attribute names created dynamically by __setattr__. This is done to preserve the sequential order of the fields for template comparison. In the current implementation no fields can be marked as optional; it would take additional effort to introduce them and modify __eq__ accordingly.

It is now easy to demonstrate the behavior of wildcards. First we introduce another class for this purpose:

class Wildcard(object):
    def __eq__(self, other):
        return True
 
    def __ne__(self, other):
        return False

For any object X compared to a wildcard, X == wildcard is always True. So the following template instances are equal (wildcard being an instance of Wildcard):

wildcard = Wildcard()

templ_1 = Template()
templ_1.field_1 = wildcard
templ_1.field_2 = "abc"
 
templ_2 = Template()
templ_2.field_1 = "xyz"
templ_2.field_2 = wildcard
 
assert templ_1 == templ_2

A Pattern class can be created in the same way as the Wildcard class:

import re

class Pattern(object):
    def __init__(self, regexp):
        self.pattern = re.compile(regexp)
 
    def __eq__(self, other):
        try:
            return bool(self.pattern.match(other))
        except TypeError:
            return True
 
    def __ne__(self, other):
        return not self.__eq__(other)

An ‘==’ comparison of a Pattern instance with a string matches the string against the pattern and yields True if the string matches and False otherwise.
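For example – repeating the class so the snippet is self-contained:

```python
import re

class Pattern(object):
    def __init__(self, regexp):
        self.pattern = re.compile(regexp)

    def __eq__(self, other):
        try:
            return bool(self.pattern.match(other))
        except TypeError:
            return True   # non-string operands (e.g. wildcards) always match

    def __ne__(self, other):
        return not self.__eq__(other)

assert Pattern(r"\d+") == "123"     # matches
assert Pattern(r"\d+") != "abc"     # no match
assert Pattern(r"\d+") == object()  # TypeError branch: treated as equal
```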

So it is perfectly possible to reproduce most aspects of TTCN-3 templates with just a few lines of Python code. On the downside, with TTCN-3 it isn’t possible to do things the other way round: no matter how hard you work in TTCN-3, it is impossible to create new TTCN-3 types showing interesting behavior. Python’s expressiveness is miles ahead.

The thunk_stmt and programmable semantics in Python

Posted in DSL, Grammars, Python on March 9th, 2009 by kay – 4 Comments

It seems as if there is just a single direction of syntactical improvement for Python these days, namely the introduction of Ruby-style blocks. Tav suggests reusing Python’s with-statement:

with_stmt: "with" expression (["as" target] | ["do" [parameter_list]]) ":" suite

Guido responded promptly and rejected the semantic dispersion of the with-statement: no meaning change by means of just a magical do keyword. In Tav’s vision the do-branch of the with-statement is turned into an anonymous closure that is called by a special __do__ function. The __do__ shall also be customizable, which finally leads to DSLs:

But more enterprising __do__ functions could do interesting things like rewriting the AST of the BLOCK_CODE. Given that this is already done by many many Python libraries, it’s not much of a stretch of the imagination. The benefit? Super nice Pythonic DSLs!

Well, it is quite easy to turn this idea upside down. Instead of re-purposing a special-purpose statement like with, one can simply turn each expression and “small statement” – e.g. raise, assert, return, assignment etc. – into a compound statement using the following definition:

thunk_stmt: small_stmt ':' suite

One has to elaborate on it a little and change the rules of Python’s existing grammar for LL(1) conformance:

stmt: compound_stmt
single_input: NEWLINE | compound_stmt NEWLINE
compound_stmt: ( if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | 
                 funcdef | classdef | thunk_stmt )
thunk_stmt: small_stmt ( ':' suite | (';' small_stmt)* [';'] NEWLINE )

Here the thunk_stmt is a fusion of the definition stated above and the simple_stmt of Python’s grammar. This definition doesn’t lead to programmable syntax but to programmable semantics, which is what Tav intends to support with a customizable __do__.

An obvious use case would be property definitions:

x = property():
    def get(self):
        return self._x
 
    def set(self, value):
        self._x = value

It simply appends a block of code to the valid assignment x = property().
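For comparison, the closest spelling in current Python requires defining the accessors before handing them to property; the class name here is purely illustrative:

```python
class Point(object):
    def __init__(self):
        self._x = None

    def _get_x(self):
        return self._x

    def _set_x(self, value):
        self._x = value

    # the accessors must already exist at this point in the class body
    x = property(_get_x, _set_x)

p = Point()
p.x = 42
assert p.x == 42
```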

The same goes for

employees.select.do(emp)

which might be extended to

employees.select.do(emp):
    if emp.salary > developer.salary:
        return fireEmployee(emp)
    else:
        return extendContract(emp)

Without an appended block the expression

employees.select.do(emp)

might simply mean

employees.select(lambda emp: emp)

It should be clear that Python needs a protocol to support this, meaning that a __special__ function is attached to an object and that user-defined classes can override this __special__ function.
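Until such a protocol exists, the pattern can at least be approximated in today’s Python by passing the would-be block as an explicit function; Query, do and the sample data are purely illustrative:

```python
class Query(object):
    def __init__(self, items):
        self.items = items

    def do(self, fn):
        # fn plays the role the appended block would play in thunk_stmt syntax
        return [fn(item) for item in self.items]

employees = Query([900, 1500, 2000])
result = employees.do(lambda salary: "fire" if salary > 1000 else "extend")
assert result == ["extend", "fire", "fire"]
```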

Lonely Python is Intelligent Design and how to liberate from it

Posted in DSL, Grammars, Python on March 7th, 2009 by kay – 2 Comments

Python is doomed. Well, of course it is not yet doomed. Python is an evolutionary dead end, living lonely on its own fitness plateau. It is clear that Python won’t have offspring because of significant whitespace, a.k.a. indentation sensitivity. No children, no future.

You might ask: where is the news? There is none, and we have known this forever and a decade, but sometimes it is in the news again. Read this post by the author of a programming language called Reia. He struggles endlessly with indentation-delimited blocks and finally abandons them for Ruby-style end terminators (not before a popularity poll in the Web 2.0 age and consulting the author of the failed Logix project, who is now a Ruby advocate). Finally his language looks like Ruby, which demonstrates the evolutionary fitness of Ruby. Python, on the other hand, is a piece of Intelligent Design. Guido van Rossum managed a “beautiful set of compromises” when he designed the Python grammar. So it ends up in overspecialization, whereas languages like JavaScript flow more with nature. To illustrate this nature you might examine the file jQuery.js that implements the popular jQuery library. The impatient might immediately scroll to the last line. It’s sweet.

Some people have argued that Haskell-style layouts may save the soul of significant whitespace. Transplanted to Python, this would imply optional explicit block delimiters that can be used instead of the now classical indentation style. If you are not willing to decide, just give multiple-choice-style freedom to everyone.

In the rest of the article I’m discussing optional explicit delimiters for Python, built into the grammar, making the statement/expression distinction weightless. I promise not to cringe when it becomes “ugly”. Not behaving like little girls seems to be an evolutionary dead end among male, adult programmers as well, but I take it with stoicism.

Grammar Extension

atom: ( '(' [embedded_stmt|yield_expr] ')' | '[' [listmaker] ']' |
        '{' [dictmaker] '}' |
        '`' testlist1 '`' |
        NAME | NUMBER | STRING+ )
trailer: '(' [arglist|embedded_suite] ')' | '[' subscriptlist ']' | '.' NAME
 
embedded_stmt: ( embedded_funcdef |
    embedded_if_stmt |
    embedded_while_stmt |
    embedded_for_stmt |
    embedded_try_stmt |
    embedded_with_stmt |
    embedded_classdef |
    embedded_small )
 
embedded_small: small_stmt (';' small_stmt)*
embedded_suite: embedded_stmt+
embedded_funcdef: [decorators] 'def' NAME parameters ':' embedded_suite
embedded_if_stmt: ( 'if' test ':' embedded_suite ('elif' test ':' embedded_suite)*
                   ['else' ':' embedded_suite])
embedded_while_stmt: 'while' test ':' embedded_suite ['else' ':' embedded_suite]
embedded_for_stmt: ( 'for' exprlist 'in' testlist ':' embedded_suite
                     ['else' ':' embedded_suite] )
embedded_try_stmt: ('try' ':' suite ((except_clause ':' embedded_suite)+
                    ['else' ':' embedded_suite] ['finally' ':' embedded_suite] |
                   'finally' ':' embedded_suite))
embedded_with_stmt: 'with' test [ with_var ] ':' embedded_suite
embedded_classdef: 'class' NAME ['(' [testlist] ')'] ':' embedded_suite

This is a conservative extension of the official Python 2.5 grammar. Conservative means that all expressions / statements written in Python 2.5 remain valid. The non-terminals that substitute existing ones are atom and trailer. In detail, the testlist_gexp non-terminal in atom is replaced by the newly defined embedded_stmt non-terminal, and instead of an arglist an embedded_suite can be used. The embedded_stmt non-terminal is the gateway for all kinds of statements embedded into expressions. The statement/expression distinction becomes de facto obsolete. What happens to significant whitespace? Post-processing the token stream filters whitespace inside of parentheses. Hence the token stream won’t contain NEWLINE, INDENT and DEDENT tokens within expressions. All those embedded_xxx non-terminals are only adjustments of existing ones, reflecting the absence of the mentioned whitespace tokens in expressions. Notice that you can’t mix: once you are inside a grammatical expression, whitespace becomes definitely insignificant.

The grammar is not LL(1), and you need a stronger parser than Python’s own to handle it. There is also no behavior associated with the new definitions. Semantics shall be entirely neglected; only the future shape of things shall be highlighted. Here are a few examples:

Multiline Lambda

lambda x,y: (
    z = x+y;              # semicolons always separate small statements
    print z;
    return z
)(3, 2)

Long Assertions

assert ( if x > 0:
             x == 2
         else:
             x == -2 )

The Return of the Return

def foo(x):
    return ( return )

My Property

x = property(
    def fget(self):
        return self._x
    def fset(self, value):
        self._x = value
)

Function in the middle

x + (def fact(n):
     if n<=1:
          return 1
     else:
          return fact(n-1))(7) + z