CodeTemplates – a fresh look on code transformations

This is the first part of a two part article series about CodeTemplates. CodeTemplates are a code transformation technique that suggests an alternative to common syntactic as well as lexical macros for whole language transformations.

Syntactic macros

About three years ago I experimented with a “syntactic” macro system for Python incorporated in EasyExtend 2. In the meantime another project emerged called MetaPython striving for a similar goal under the sign of the Ouroboros.

Fundamentally a macro system defines functions called macros that act as compile time code generators. It is not surprising that code is data – how else shall a compiler work? – however few languages specify functions that treat source code as source code and operators that evaluate source code at compile time. By quasi-quotation some piece of code in a macro body is mentioned rather than evaluated. Complementary to quasi-quoting there is a splice-operator that evaluates code and returns other code preferably in AST form that becomes aligned with the AST of the quasi-quoted code.

The difference between source code and ASTs is going to vanish in so called homoiconic languages like Lisp where the source code that is written by the user is also a data-structure of the language. So a macro is essentially a function that transforms those data-structures meant to be code. It doesn’t mean though that homoiconicity is prerequisite for syntactic macros. The cited MetaPython but also MetaLua are examples for macro systems in non-homoiconic languages. A nice treatment of the foundations of macro systems in non-homoiconic languages can also be found in the documentation of the Converge language.

Syntactic macros in EE 2

EasyExtend 2.0 defined a macro system in an own langlet, simply called the macro langlet. The role of macros in EasyExtend is somewhat peculiar though because EE is a system used for whole language transformations with language syntax specified in grammars. EE binds node handlers to parse tree node types and those node handlers emit parse tree nodes of the target language. They were just meant to facilitate code transformations within the framework.

Unfortunately it turned out that they didn’t. A bug in a macro caused hard to detect language transformation errors. Error recovery is a major concern if not the single major concern with language transformers and the complicated quasi-quoting / splicing semantics of macros made this even harder. More often than I ever wanted I debugged through framework when attempting to detect a bug in the macro. Furthermore the macro langlet operated within the system and was transformed as well. So two languages were transformed at the same time whereas one of them was additionally engaged in transforming the other. EasyExtend 2 made this possible without any conflicts implementing a joint transformation machinery and separated ranges of node types node handlers could bind to. The macro transformation was implemented as a set of node handlers in the `LangletTransformer` of the macro langlet and it was an ugly beast of code. I intended to rewrite this for EE 3 but it at some point of time I decided to terminate this intellectual exercise and dropped the macro facility completely.

Why not just textual macros?

A textual substitution macro occasionally also called “lexical” macro is a simple way to do code transformations. The most important programming language ever, C, implements them by means of the C preprocessor. Lexical macros are also known as poor-mans macros and using them in EasyExtend was like leaving civilization because city-life is complicated and corrupt and one better strives for a subcomplex life as a farmer or hunter-gatherer. Furthermore EE is in deep love with syntax so a radical conceptualization of the macro idea is required. Fortunately the intellectual effort for this is low and can be summarized in one sentence

Enable arbitrary string transformations guarded by syntax

This is universally applicable for all context free languages and doesn’t require any additional macro syntax.

If one attempts to transform one string `S1` into another one `S2` all that is needed are the following three operators

  • insert
  • delete
  • subst

The edit distance between S1 and S2 is the minimal number of operations of this kind used to transform `S1` into `S2`. It can be computed by the popular algorithm of Levenshtein.

EasyExtend 4 will use a slightly different set of operators

  • replicate
  • delete
  • subst

It is almost equally expressive except for the case of an empty string `S1` that requires an initial `insert` and there is nothing yet to replicate, delete or substitute. But once you have a single character in a string, one can replicate this character and apply substitutions which yields the same as applying multiple insertions.

CodeTemplate objects in EE 4

EasyExtend 4 defines a module called `csttemplate.py` that defines 4 types of objects called `CodeTemplate`, `CodeMarker`, `CodeSelection` and `SelectionTree`.

A CodeTemplate is an object that wraps some piece of source code in some language. The source code is parsed on construction of the CodeTemplate.

A CodeMarker marks sections in the source by means of search pattern. One can bind arbitrary many CodeMarker objects to a CodeTemplate. It is possible to also bind CodeMarkers to other CodeMarkers building a CodeMarker tree. When a CodeMarker B is bound to another `CodeMarker` A the search pattern of B is only applied to the code marked by A.

A CodeTemplate implements a `match`method. If it is applied the pattern specified by all bound CodeMarker objects are matched against the code of the CodeTemplate. The result is a SelectionTree object. The tree holds a list of CodeSelection objects as well as a list of subtrees.

All transformations of the source code wrapped by the CodeTemplate are applied through CodeSelection objects. It is the CodeSelection object that substitutes, replicates or deletes code that is bound to. Note that internally all those operations still happen on a parse tree. Invalid operations will immediately be rejected and an exceptions is raised. It is not a complete syntax check on the whole code that is applied on each operation and it wouldn’t make sense to perform one either because some piece of code can morphed into code of another language and it will be a hybrid piece of code except at transformations start and the endpoint.

The SelectionTree is clonable and can be reused for other substitutions.

Outlook

In the next article I’ll show CodeTemplates in action.

This entry was posted in DSL, Grammars, TBP. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*