The thunk_stmt and programmable semantics In Python

It seems as if there is just a single direction of syntactic improvement for Python these days: the introduction of Ruby-style blocks. Tav suggests reusing Python’s with-statement:

with_stmt: "with" expression (["as" target] | ["do" [parameter_list]]) ":" suite

Guido responds promptly and rejects the semantic dispersion of the with-statement: its meaning should not change just because a magical `do` keyword is used. In Tav’s vision the do-branch of the with-statement is turned into an anonymous closure that is called by a special `__do__` function. The `__do__` function shall also be customizable, which finally leads to DSLs:

But more enterprising __do__ functions could do interesting things like rewriting the AST of the BLOCK_CODE. Given that this is already done by many many Python libraries, it’s not much of a stretch of the imagination. The benefit? Super nice Pythonic DSLs!

Well, it is quite easy to turn this idea upside down. Instead of re-purposing a special-purpose statement like `with`, one can simply turn each expression and “small statement” (e.g. raise, assert, return, assignment) into a compound statement using the following definition:

thunk_stmt: small_stmt ':' suite

One has to elaborate on it a little and change some rules of Python’s existing grammar for LL(1) conformance:

stmt: compound_stmt
single_input: NEWLINE | compound_stmt NEWLINE
compound_stmt: ( if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | 
                 funcdef | classdef | thunk_stmt )
thunk_stmt: small_stmt ( ':' suite | (';' small_stmt)* [';'] NEWLINE )

Here the `thunk_stmt` is a fusion of the definition stated above and the `simple_stmt` of Python’s grammar. This definition doesn’t lead to programmable syntax but to programmable semantics, which is what Tav intends to support with a customizable `__do__`.

An obvious use case would be property definitions:

x = property():
    def get(self):
        return self._x
 
    def set(self, value):
        self._x = value

It simply appends a block of code to a valid assignment `x = property()`.
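
For comparison, a roughly similar effect can be emulated in current Python with a decorator. The following is only a sketch under stated assumptions: `block_property` is a hypothetical helper invented for this illustration, and the appended block is modelled as a function that returns its local definitions.

```python
def block_property(block):
    """Hypothetical helper: run `block` and build a property from the
    functions it defines, emulating `x = property(): <suite>`."""
    accessors = block()  # the block returns its local get/set functions
    return property(accessors.get("get"), accessors.get("set"))


class Point:
    def __init__(self, x):
        self._x = x

    @block_property
    def x():
        # This function body plays the role of the appended suite.
        def get(self):
            return self._x

        def set(self, value):
            self._x = value

        return locals()


p = Point(1)
p.x = 5
print(p.x)
```

The `return locals()` line is the boilerplate that a real `thunk_stmt` would make unnecessary, since the suite would be handed over to `property` directly.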

Same goes with

employees.select.do(emp)

which might be extended to

employees.select.do(emp):
    if emp.salary > developer.salary:
        return fireEmployee(emp)
    else:
        return extendContract(emp)

Without an appended block the expression

employees.select.do(emp)

might simply mean

employees.select(lambda emp: emp)

It should be clear that Python needs a protocol to support this: a `__special__` function is attached to an object, and user-defined classes can override it.
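
What such a protocol might look like can be sketched in current Python, with a decorator standing in for the compiler. Everything named here is an assumption made for illustration: the hook name `__thunk__`, the `thunk` decorator and the `Select` class are hypothetical and are not taken from Tav’s proposal or from Python itself.

```python
def thunk(target):
    """Play the role of the compiler: hand the decorated function
    (the 'suite') to the target's hypothetical __thunk__ hook."""
    def apply(block):
        return target.__thunk__(block)
    return apply


class Select:
    """Hypothetical stand-in for `employees.select`."""

    def __init__(self, rows):
        self.rows = rows

    def __thunk__(self, block=None):
        # Without an appended block, behave like select(lambda emp: emp).
        if block is None:
            block = lambda row: row
        return [block(row) for row in self.rows]


salaries = Select([3000, 5000, 8000])


@thunk(salaries)
def decisions(salary):
    if salary > 4000:
        return "fire"
    else:
        return "extend"


print(decisions)
```

A user-defined class customizes the semantics of the block simply by defining its own `__thunk__`, in the same way that classes define `__enter__` and `__exit__` for the with-statement.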


4 Responses to The thunk_stmt and programmable semantics In Python

  1. Adam Olsen says:

    I think you tripped over the biggest problem in this whole idea: it only provides programmable semantics, NOT programmable syntax. Why would you expect semantics to be enough for SQL, or for something more exotic like symbolic math or image manipulation?

    Although many see the reuse of syntax as reducing the learning curve, I see that as minor or even a net loss (due to all the places it’s not the same as the host language). Instead, the biggest advantage I see is intermingling objects. If you want to pass a value into your SQL query you just do a normal-looking reference, rather than a parameterized query (or worse, escaping the string yourself).

    That fails in so many situations though. What if you want to pass in a function? The reference will be to the host language, ruining any performance benefit of using a SQL DSL in the first place. To do it correctly you need to explicitly create it as a function in the nested language, then pass that reference in.

    There’s gotta be a better way for embedding highly different and specialized languages than the temptingly-obvious syntax overloading.

  2. kay says:

    Hi Adam, hard to disagree.

    The attractiveness of the thunk_stmt lies for me in the fact that it follows a common syntactical pattern of the Python grammar. Grammars aren’t usually considered languages in their own right but just supportive constructions, like configuration files. As the language and its grammar mature, certain regularities become visible, and I do think they guide the progress of syntactical extensions. I reversed this by isolating the pattern and engineering a minimal syntactical extension while leaving it open to programmers to customize the semantics to their needs. I wouldn’t pretend, though, that this covers all kinds of use cases.

    As far as SQL, XML and other embeddings in a host language are concerned, it might be reasonable for them to meet each other halfway. The extended language should in the end be one language again and not a 1+1 solution or disjoint sum.

    I’ve done an experiment in this direction called P4D [1] which was inspired by E4X [2]. E4X defines XML literals as well as XML element/attribute access syntax for ECMA-Script languages. I liked E4X a lot when I used AS3 which is right now the E4X reference implementation. Unlike E4X I didn’t want to edit XML in Python and combining XML and Python syntactically has always been a kludge. So I advanced the idea of SLiP [3] which considers homomorphic images of XML onto blocks that roughly follow the thunk_stmt pattern. Unlike SLiP or YAML the E4X DSL is intended to be embedded and intermingled with a host language. This has also been the driving idea in P4D.

    The result is quite good IMO. I created P4D scripts in cases where XML is used as a programming language, as in Adobe Flex or MS-Xaml. An even closer connection than in those cases could possibly be achieved when ASP.NET is used with IronPython. It’s just that I’m disappointed by certain performance characteristics of IronPython and won’t follow this train of thought, but that’s another story.

    [1] http://www.fiber-space.de/EasyExtend/doc/p4d/p4d.html

    [2] http://en.wikipedia.org/wiki/ECMAScript_for_XML

    [3] http://slip.sourceforge.net/

  3. Adam Olsen says:

    P4D looks interesting, but I disagree on having a unified language. Although a single context works for XML, for SQL you want two contexts: local (the client, our host language) and remote (the server, our nested language).

    As for your thunk proposal, I think that’s better structured as a statement:

    make employees.select as result using emp:
        if emp.salary > developer.salary:
            return fireEmployee(emp)
        else:
            return extendContract(emp)

    where both “as result” and “using emp” are optional.

    However, I wouldn’t expect this to pass in the AST. I would expect it to be a simple callback, which in turn can easily be replaced by a decorator.

    If it is to pass in the AST why not go a step further and pass it in as text? Check for minimal amounts of indentation to delimit the suite, then let your nested language call the parser explicitly. The only disadvantage is locals and globals can’t be determined statically, but there’s a couple options there, the simplest of which is to just pass them in as arguments.

    make employees.sql as result using someglobal=someglobal:
        select where emp.salary > developer.salary blah blah blah

  4. kay says:

    P4D supports two quite different data structures: XML and a binary format for so-called Bytelets. It’s not intended to be a universal container though. SQL has a phrase structure rather than a block structure. LINQ is more appropriate here than E4X.

    I do not have strong opinions about the thunk_stmt vs make_stmt as design elements. I find the thunk_stmt interesting as an idiom/design pattern so to say. One can generalize the thunk_stmt as follows:

    thunk_stmt: header ':' suite ['else' suite]

    and header might be defined as

    header: for_header | while_header | make_header | ... | small_stmt
    for_header: 'for' exprlist 'in' testlist
    while_header: 'while' test

    That’s purely conceptual.

    As far as the passing of text is concerned: EasyExtend uses CSTs for which parse/unparse functions are implemented, and the following equation holds:

    unparse(parse(source)) = source

    A fair amount of work has been spent in creating a slick library that makes CSTs more user friendly i.e. much like handling ASTs. The difference between source text and ASTs is fading. This has further implications because syntax aware find/replace operations can be defined on source. So one has basically syntactical macros but they pretend to be simple textual substitutions. That’s work in progress and there is no implementation available yet.
