Archive for February, 2009

Future !

Posted in General on February 25th, 2009 by kay – 2 Comments

Refactoring the ANSI C grammar

Posted in C, Grammars on February 21st, 2009 by kay – Be the first to comment

A nice blog article by Eli Bendersky about the distinction between ASTs and CSTs.

After having clarified the concepts Eli tried to make the case for ASTs. He presents the C grammar or more precisely the part of the C grammar that deals with declarations and mentions the obscurity of the grammar that is immediately reflected in the structure of the CST. ASTs serve as some sort of cleanup. This is all true: the C grammar has obscure parts and his ASTs are cleanups but does it justify the extra effort being spent for creating and maintaining ASTs?

A grammar is just another computer program written in (E)BNF and you can strive for clean and accessible BNF programs just like you do in any other programming language. So what about refactoring? Now the ANSI C grammar might not be among the finest parts of the C language but it’s still authoritative and it’s also not easy to show that another grammar describes the same language syntax.

I’ve nevertheless attempted to rewrite the declaration part of the C grammar. There are a few omissions here. For example storage class specifiers and type qualifiers like const are missing and there are also no function declarations. It shall be easy though to add them given the syntactical description of the top level declaration rule described below. There are also just a few type names for testing purposes. Determining type names is an entirely different issue we do not want to treat in this blog post.

declaration: typename declarator
declarator: simple_declarator | func_ptr_declarator | ptr_declarator | array_declarator
 
simple_declarator: simple_declarator_base | '(' simple_declarator ')'
simple_declarator_base: NAME | '(' NAME ')'
 
ptr_declarator: ptr_declarator_base | '(' ptr_declarator ')'
ptr_declarator_base: ptr_or_ref declarator
 
func_ptr_declarator: func_ptr_declarator_base | '(' func_ptr_declarator ')'
func_ptr_declarator_base:  '(' ( ptr_declarator | array_declarator ) ')' arglist
 
array_declarator: array_declarator_base | '(' array_declarator ')'
array_declarator_base:  ( ptr_declarator | simple_declarator | func_ptr_declarator ) subscript_list      
 
subscript_list: ('[' [NAME | Intnumber] ']')+
ptr_or_ref: '*'+ ['&'] | '&'
arglist: '(' [parameters] ')'
parameters: declaration (',' declaration)*
typename: 'int' | 'float' | 'char' | 'void'

This grammar can be used to accurately parse following expressions:

int x;
int *x;
int (x);
int (*x);
int ((*x));
int ((*x)());
int ((*x)())[];
int ((*x)(int y))[];
int ((*x)(int y, char z[4]))[];
int ((*x)(int y, char z[4]))[][1];
char **foo [][8];
char (**foo [][8]);
char (**foo [][8])();
char ((**foo [][8])());
char *((**foo [][8])());
char (*(**foo [][8])());
char (*(**foo [][8])())[];
char *(*(**foo [][8])())[];

There is still some overhead because we have pairings like simple_declarator and simple_declarator_base. These distinctions have the only purpose of putting the basic form into parentheses.

A parse tree display for

int (*x)();

looks like this in EasyExtend:

  declaration  -- S`116994 -- 258
    typename  -- S`117008 -- 272
      NAME  -- T`116737 -- 1     L`1
        int
    declarator  -- S`116995 -- 259
      func_ptr_declarator  -- S`117000 -- 264
        func_ptr_declarator_base  -- S`117001 -- 265
          LPAR  -- T`116743 -- 7     L`1
            (
          ptr_declarator  -- S`116998 -- 262
            ptr_declarator_base  -- S`116999 -- 263
              ptr_or_ref  -- S`117005 -- 269
                STAR  -- T`116752 -- 16     L`1
                  *
              declarator  -- S`116995 -- 259
                simple_declarator  -- S`116996 -- 260
                  simple_declarator_base  -- S`116997 -- 261
                    NAME  -- T`116737 -- 1     L`1
                      f
          RPAR  -- T`116744 -- 8     L`1
            )
          arglist  -- S`117006 -- 270
            LPAR  -- T`116743 -- 7     L`1
              (
            RPAR  -- T`116744 -- 8     L`1
              )
  SEMI  -- T`116749 -- 13     L`1
    ;

Still much clutter but now it is easy to identify the semantically relevant entities and search for them. In EasyExtend we use simple functions like find_node and find_all which keep a CST node and a node id parameter and perform a depth-first search.

Countdown

Posted in F, General on February 8th, 2009 by kay – Be the first to comment

The Flusser hierarchy

4 the natural man in the state of concrete spatio-temporal experience. No mind in escape.
3 the modeler and maker whose hands shape the things and brings them to rest.
2 the artist imagines the world. Eternal return and circulation on a magical surface.
1 the author linearizes the flow of things in symbols and historical time.
0 the programmer computes the it from the bit.