{"id":31,"date":"2009-02-21T03:33:01","date_gmt":"2009-02-21T02:33:01","guid":{"rendered":"http:\/\/fiber-space.de\/wordpress\/?p=31"},"modified":"2009-02-21T03:46:37","modified_gmt":"2009-02-21T02:46:37","slug":"refactoring-the-ansi-c-grammar","status":"publish","type":"post","link":"http:\/\/fiber-space.de\/wordpress\/2009\/02\/21\/refactoring-the-ansi-c-grammar\/","title":{"rendered":"Refactoring the ANSI C grammar"},"content":{"rendered":"<p>A nice blog article by Eli Bendersky about the distinction between <a href=\"http:\/\/eli.thegreenplace.net\/2009\/02\/16\/abstract-vs-concrete-syntax-trees\/\" target=\"_blank\">ASTs and CSTs<\/a>.<\/p>\n<p>After having clarified the concepts Eli tried to make the case for ASTs. He presents the C grammar or more precisely the part of the C grammar that deals with declarations and mentions the obscurity of the grammar that is immediately reflected in the structure of the CST. ASTs serve as some sort of cleanup. This is all true: the C grammar has obscure parts and his ASTs are cleanups but does it justify the extra effort being spent for creating and maintaining ASTs?<\/p>\n<p>A grammar is just another computer program written in (E)BNF and you can strive for clean and accessible BNF programs just like you do in any other programming language. So what about refactoring? Now the <a href=\"http:\/\/www.lysator.liu.se\/c\/ANSI-C-grammar-y.html\">ANSI C<\/a> grammar might not be among the finest parts of the C language but it&#8217;s still authoritative and it&#8217;s also not easy to show that another grammar describes the same language syntax.<\/p>\n<p>I&#8217;ve nevertheless attempted to rewrite the declaration part of the C grammar. There are a few omissions here. For example storage class specifiers and type qualifiers like `const` are missing and there are also no function declarations. It shall be easy though to add them given the syntactical description of the top level `declaration` rule described below. There are also just a few type names for testing purposes. Determining type names is an entirely different issue we do not want to treat in this blog post.<\/p>\n<pre lang=\"bnf\">declaration: typename declarator\r\ndeclarator: simple_declarator | func_ptr_declarator | ptr_declarator | array_declarator\r\n\r\nsimple_declarator: simple_declarator_base | '(' simple_declarator ')'\r\nsimple_declarator_base: NAME | '(' NAME ')'\r\n\r\nptr_declarator: ptr_declarator_base | '(' ptr_declarator ')'\r\nptr_declarator_base: ptr_or_ref declarator\r\n\r\nfunc_ptr_declarator: func_ptr_declarator_base | '(' func_ptr_declarator ')'\r\nfunc_ptr_declarator_base:  '(' ( ptr_declarator | array_declarator ) ')' arglist\r\n\r\narray_declarator: array_declarator_base | '(' array_declarator ')'\r\narray_declarator_base:  ( ptr_declarator | simple_declarator | func_ptr_declarator ) subscript_list      \r\n\r\nsubscript_list: ('[' [NAME | Intnumber] ']')+\r\nptr_or_ref: '*'+ ['&amp;'] | '&amp;'\r\narglist: '(' [parameters] ')'\r\nparameters: declaration (',' declaration)*\r\ntypename: 'int' | 'float' | 'char' | 'void'<\/pre>\n<p>This grammar can be used to accurately parse following expressions:<\/p>\n<pre lang=\"C\">int x;\r\nint *x;\r\nint (x);\r\nint (*x);\r\nint ((*x));\r\nint ((*x)());\r\nint ((*x)())[];\r\nint ((*x)(int y))[];\r\nint ((*x)(int y, char z[4]))[];\r\nint ((*x)(int y, char z[4]))[][1];\r\nchar **foo [][8];\r\nchar (**foo [][8]);\r\nchar (**foo [][8])();\r\nchar ((**foo [][8])());\r\nchar *((**foo [][8])());\r\nchar (*(**foo [][8])());\r\nchar (*(**foo [][8])())[];\r\nchar *(*(**foo [][8])())[];<\/pre>\n<p>There is still some overhead because we have pairings like `simple_declarator` and `simple_declarator_base`. These distinctions have the only purpose of putting the basic form into parentheses.<\/p>\n<p>A parse tree display for<\/p>\n<pre lang=\"C\">int (*x)();<\/pre>\n<p>looks like this in EasyExtend:<\/p>\n<pre lang=\"C\">  declaration  -- S`116994 -- 258\r\n    typename  -- S`117008 -- 272\r\n      NAME  -- T`116737 -- 1     L`1\r\n        int\r\n    declarator  -- S`116995 -- 259\r\n      func_ptr_declarator  -- S`117000 -- 264\r\n        func_ptr_declarator_base  -- S`117001 -- 265\r\n          LPAR  -- T`116743 -- 7     L`1\r\n            (\r\n          ptr_declarator  -- S`116998 -- 262\r\n            ptr_declarator_base  -- S`116999 -- 263\r\n              ptr_or_ref  -- S`117005 -- 269\r\n                STAR  -- T`116752 -- 16     L`1\r\n                  *\r\n              declarator  -- S`116995 -- 259\r\n                simple_declarator  -- S`116996 -- 260\r\n                  simple_declarator_base  -- S`116997 -- 261\r\n                    NAME  -- T`116737 -- 1     L`1\r\n                      f\r\n          RPAR  -- T`116744 -- 8     L`1\r\n            )\r\n          arglist  -- S`117006 -- 270\r\n            LPAR  -- T`116743 -- 7     L`1\r\n              (\r\n            RPAR  -- T`116744 -- 8     L`1\r\n              )\r\n  SEMI  -- T`116749 -- 13     L`1\r\n    ;<\/pre>\n<p>Still much clutter but now it is easy to identify the semantically relevant entities and search for them. In EasyExtend we use simple functions like `find_node` and `find_all` which keep a CST node and a `node id` parameter and perform a depth-first search.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A nice blog article by Eli Bendersky about the distinction between ASTs and CSTs. After having clarified the concepts Eli tried to make the case for ASTs. He presents the C grammar or more precisely the part of the C &hellip; <a href=\"http:\/\/fiber-space.de\/wordpress\/2009\/02\/21\/refactoring-the-ansi-c-grammar\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15,12],"tags":[],"_links":{"self":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/31"}],"collection":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/comments?post=31"}],"version-history":[{"count":68,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/31\/revisions"}],"predecessor-version":[{"id":113,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/31\/revisions\/113"}],"wp:attachment":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/media?parent=31"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/categories?post=31"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/tags?post=31"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}