{"id":1419,"date":"2010-01-30T07:26:40","date_gmt":"2010-01-30T06:26:40","guid":{"rendered":"http:\/\/fiber-space.de\/wordpress\/?p=1419"},"modified":"2010-01-30T08:52:41","modified_gmt":"2010-01-30T07:52:41","slug":"syntax-algebra-first-steps","status":"publish","type":"post","link":"http:\/\/fiber-space.de\/wordpress\/2010\/01\/30\/syntax-algebra-first-steps\/","title":{"rendered":"Syntax algebra &#8211; first steps"},"content":{"rendered":"<h3>Principle of relativity<\/h3>\n<p>I started to revisit syntactic mappings defined in EasyExtend 3 which are closely tied to the Python grammar being\u00a0 in use. Those are functions like `varargs2arglist`, `normalize`, `split_file_input` or `exprlist2testlist` defined in the `csttools.py` module. One of the major challenges of future releases of EasyExtend ( or a definite successor project &#8211; i don&#8217;t know ) is to abstract from Python as a target language. In EE3 only Python can be targeted by a langlet transformation whereas in EE 4 langlets are symmetric: each langlet can be a target langlet for any other langlet. All langlets exist on the same footing and also: each langlet can be used as a parent langlet of a newly defined langlet which means a proper inheritance relationship and maximum reuse.<\/p>\n<p>The relativity among langlets calls for raising the bar of abstraction. All Python specific dependencies, and there are a lot in EE3, have to be removed from the basic CST libraries. Indeed not even nodes of special importance like `expr`, `stmt` or `atom` shall be allowed in places other than modules which are specific to a particular langlet. The generalization of the mentioned functions leads to a study of abstract syntactic forms and some sort of &#8220;syntax algebra&#8221;. It isn&#8217;t studied rigorously in this article but shall be at least motivated. As a prerequisite I recommend to read my article about<a href=\"http:\/\/fiber-space.de\/wordpress\/?p=1350\"> CST interpolation<\/a> which discusses concepts of major relevance.<\/p>\n<h3>Embeddings<\/h3>\n<p>The `exprlist2testlist` function turns a node of type `exprlist` defined as<\/p>\n<pre lang=\"bnf\">exprlist: expr (',' expr)* [',']<\/pre>\n<p>into a node of type `testlist`defined as<\/p>\n<pre lang=\"bnf\">testlist: test (',' test)* [',']<\/pre>\n<p>This works without adding information because `{test, expr}` is a valid <em>interpolation<\/em> i.e. there is a nested sequence<\/p>\n<pre>[<strong>test<\/strong>, [or_test, [and_test, [not_test, [comparison , <strong>expr<\/strong>]]]]]<\/pre>\n<p>which is a representation of a valid CST. In terms of CST interpolations `test(expr)` yields a node of type `test` and induces a homomorphism `exprlist`-&gt;`testlist`. More generally an interpolation `{A, B}` induces an embedding `B (x B)*` -&gt; `{A,B} (y {A,B})* = A (y A)*` if `x` and `y` are <em>constant<\/em> terminal symbols i.e. terminal symbols where the corresponding token have a uniquely determined token string.<\/p>\n<h3>Blocks or no blocks<\/h3>\n<p>Another relevant example is the `normalize` function. The idea behind `normalize` is that statements like `if x: y` or `def foo(): pass` are semantically equivalent to block statements:<\/p>\n<pre lang=\"python\">if x:\r\n    y<\/pre>\n<pre lang=\"python\">def foo():\r\n    pass<\/pre>\n<p>Those block statements can be used in a more general fashion because we can add other block statements in the thunked block. In Python blocks are expressed by the `suite` grammar rule:<\/p>\n<pre lang=\"bnf\">suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT<\/pre>\n<p>Since `{stmt, simple_stmt}` is a valid interpolation, we can substitute all occurences of<\/p>\n<pre lang=\"bnf\">suite: simple_stmt<\/pre>\n<p>with suites of the form<\/p>\n<pre lang=\"bnf\">suite: NEWLINE INDENT stmt+ DEDENT<\/pre>\n<p>The general syntactic embedding is of the kind<\/p>\n<p>`B` -&gt; `a b&#8230; {A,B} c d&#8230;` with a, b, c, d &#8230; being constant terminal symbols.<\/p>\n<p>Notice that `INDENT` is assumed to be a constant terminal despite the fact that the token string may vary. `INDENT` is treated special because in practical applications the number of used spaces for an indentation is fixed.\u00a0 EasyExtend always uses 4 spaces.<\/p>\n<h3>Splitting nodes<\/h3>\n<p>The function `split_file_input` is the prototype of a node splitting function which can be thought in analogy to string splits. In this particular case we have a node `file_input` of the form<\/p>\n<pre lang=\"bnf\">file_input: (NEWLINE | stmt)* ENDMARKER<\/pre>\n<p>and want to generate a sequence of nodes `file_input: stmt ENDMARKER`, `file_input: NEWLINE ENDMARKER` and `file_input: ENDMARKER` &#8211; one for each `NEWLINE`, `stmt` and `ENDMARKER` in the original `file_input` node. It doesn&#8217;t matter that `file_input: NEWLINE ENDMARKER` and `file_input: ENDMARKER` are likely be thrown away by an application because this can be decided by the nodes consuming function. The general split function is defined by<\/p>\n<p>`R: x (A|B&#8230;)* y` -&gt; `[(R: x A y), (R: x B y), &#8230;, (R: x y)]`<\/p>\n<h3>Node rewritings<\/h3>\n<p>The mappings considered above were all rather straightforward. Now we want to discuss a rule transformation which is less obvious, namely that of a function signature into an argument tuple of a function call. In Python 2.6 the function signature is defined as<\/p>\n<pre lang=\"bnf\">varargslist: ((fpdef ['=' test] ',')*\r\n('*' NAME [',' '**' NAME] | '**' NAME) |\r\nfpdef ['=' test] (',' fpdef ['=' test])* [','])\r\nfpdef: NAME | '(' fplist ')'\r\nfplist: fpdef (',' fpdef)* [',']<\/pre>\n<p>and an argument list of a function call by<\/p>\n<pre lang=\"bnf\">arglist: (argument ',')* (argument [',']| '*' test [',' '**' test] | '**' test)\r\nargument: test [gen_for] | test '=' test<\/pre>\n<p>Can we transform each `varargslist` into an `arglist`?<\/p>\n<p>Let&#8217;s start our treatment of varargslist with `fpdef`. If we insert the RHS of `fplist` in `fpdef` we get<\/p>\n<pre lang=\"bnf\">fpdef: NAME | '(' fpdef (',' fpdef)* [','] ')'<\/pre>\n<p>We show that this rule is a special form of\u00a0 the node `atom` and since `{test, atom}` is a valid interpolation it is also a `test` node. The `atom` node is defined by<\/p>\n<pre lang=\"bnf\">atom: NAME | '(' [yield_expr|testlist_gexp] ')' |\u00a0 '[' [listmaker] ']' | ...<\/pre>\n<p>which can be specialized to<\/p>\n<pre lang=\"bnf\">atom: NAME | '(' testlist_gexp ')'<\/pre>\n<p>Next we consider the `testlist_gexp` definition<\/p>\n<pre lang=\"bnf\">testlist_gexp: test ( gen_for | (',' test)* [','] )<\/pre>\n<p>which can be specialized to<\/p>\n<pre lang=\"bnf\">testlist_gexp: test (',' test)* [',']<\/pre>\n<p>We insert `testlist_gexp` in `atom` which yields<\/p>\n<pre lang=\"bnf\">atom: NAME | '(' test (',' test)* [','] ')'<\/pre>\n<p>If we reduce `test` to `atom` we get a rule<\/p>\n<pre lang=\"bnf\">atom: NAME | '(' atom (',' atom)* [','] ')'<\/pre>\n<p>which is isomorphic to `fpdef`. So we just need to substitute all occurrences of `fpdef` in `fpdef` with `atom`, then replace `atom` with `test(atom)` and finally replace the whole of `atom` again with `test(atom)`. This procedure substitutes `fpdef` with `test`.<\/p>\n<p>When we substitute each occurrence of `NAME` with `test`in `varargslist` we get:<\/p>\n<pre lang=\"bnf\">(test ['=' test] ',')* ('*' test [',' '**' test] | '**' test) |\r\n                       test ['=' test] (',' test ['=' test])* [',']<\/pre>\n<p>which can be reduced to<\/p>\n<pre lang=\"bfn\">(argument ',')* ('*' test [',' '**' test] | '**' test) |\r\n                 argument (',' argument)* [',']<\/pre>\n<p>which is the same as<\/p>\n<pre lang=\"bfn\">(argument ',')* (argument [','] | '*' test [',' '**' test] | '**' test)<\/pre>\n<p>Voil\u00e0!<\/p>\n<h3>Syntax algebra<\/h3>\n<p>We have done some informal steps into syntax algebra with some real functions defined in EE 3 as a starting point. For the first three functions we have found general syntactical transformations which might be universally applicable. The last transformation is very specific though and it might be more interesting to determine an algorithm used to <em>find<\/em> a rule transformation of a particular kind. Although the search algorithm might be NP complete I assume that the found transformation &#8211; if one exists &#8211; has linear time complexity which is what we want. Such an algorithm would be another great achievement of EasyExtend which does not cease to surprise me.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Principle of relativity I started to revisit syntactic mappings defined in EasyExtend 3 which are closely tied to the Python grammar being\u00a0 in use. Those are functions like `varargs2arglist`, `normalize`, `split_file_input` or `exprlist2testlist` defined in the `csttools.py` module. One of &hellip; <a href=\"http:\/\/fiber-space.de\/wordpress\/2010\/01\/30\/syntax-algebra-first-steps\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[30,12],"tags":[],"_links":{"self":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/1419"}],"collection":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/comments?post=1419"}],"version-history":[{"count":18,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/1419\/revisions"}],"predecessor-version":[{"id":1437,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/1419\/revisions\/1437"}],"wp:attachment":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/media?parent=1419"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/categories?post=1419"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/tags?post=1419"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}