Inheritance and the C preprocessor

Posted in Algorithms, C on March 24th, 2010 by kay – 3 Comments

Defining n-ary trees using the C preprocessor

In this article I introduce a compile time C technique used to define inheritance. Instead of giving a lengthy motivation I’ll jump directly to the algorithm and discuss it later. I hope lovers of C and its preprocessor find it useful. #defines first!

#define TOP 0
#define SET_CHILD(n,parent) ( parent==TOP ? n: \
                            ( parent<(1<<4) ? (n<<4) + parent : \
                            ( parent<(1<<8) ? (n<<8) + parent : (n<<12)+parent)))
#define IS_SUBNODE(child, parent) ((child & parent) == parent)
#define SELECT(X, a, best) ( a > best && IS_SUBNODE(X, a)? a : best)
#define SELECT_FROM_5(X, a, b, c, d, e) SELECT(X, a, \
                                        SELECT(X, b, \
                                        SELECT(X, c, \
                                        SELECT(X, d, \
                                        SELECT(X, e, 0)))))
#define SELECT_FROM_4(X, a, b, c, d) SELECT_FROM_5(X, a, b, c, d, 0)
#define SELECT_FROM_3(X, a, b, c)    SELECT_FROM_5(X, a, b, c, 0, 0)
#define SELECT_FROM_2(X, a, b)       SELECT_FROM_5(X, a, b, 0, 0, 0)

The SET_CHILD macro is used to define up to 15 child nodes of a given root for a n-ary tree of depth 5 with a single root node, named TOP. This is encoded within a single number of type word which is adequate for most embedded compilers. For 32 or 64 bit processors one can either support more child nodes or a deeper tree.

SET_CHILD is assigning a name to n-th child of a given parent. One starts with TOP as the parent of all nodes and recurses down:

#define A SET_CHILD(1, TOP)
#define B SET_CHILD(2, TOP)
#define A1 SET_CHILD(1, A)
#define A2 SET_CHILD(2, A)
#define B1 SET_CHILD(1, B)
#define B2 SET_CHILD(2, B)
#define A11 SET_CHILD(1, A1)
#define A12 SET_CHILD(2, A1)
#define A21 SET_CHILD(1, A2)
#define A22 SET_CHILD(2, A2)

By construction no more than 15 child nodes for a given parent are permitted. If more are used, macros like IS_CHILD will fail to work correctly.

Once a tree is created with the appropriate nodes, one can use IS_CHILD to check for child/parent relationships. The tree is constructed s.t. IS_CHILD(A, B) returns 1 iff A is a direct child of B or a grandchild of B etc. otherwise 0. So IS_CHILD(A22, A) evaluates to 1 just like IS_CHILD(A22, A2) or IS_CHILD(A22, TOP) but IS_CHILD(A22, A1) is 0.

The C preprocessor doesn’t support overloading and the flavors I checked didn’t support varargs wich wouldn’t probably be much helpful in this case either. So I defined a group of 5 SELECT_FROM_xx macros being distinguished only be the number of arguments. The number 5 isn’t magic and one can extend the range of SELECT_FROM_xx macros by need.

How is SELECT_FROM_xx used? The first argument X is an arbitary node of the tree. If one of the susequent nodes a, b, … c is identical with X, X will be the value of SELECT_FROM_xx(X, a, b, …, c). Otherwise the most-direct-parent of X among the nodes a, …c will be returned. If none of them is a parent of X the return value is TOP.


If we set

#define X A22

then we get

SELECT_FROM_2(X, A, B)        // = A
SELECT_FROM_3(X, A, B, A1)    // = A
SELECT_FROM_3(X, A, B, A2)    // = A2
SELECT_FROM_3(X, A2, B, A)    // = A2
SELECT_FROM_3(X, A2, A, A22)  // = A2
SELECT_FROM_2(X, A1, B)       // = TOP


With the definitions above we can influence conditional compilation:

#if SELECT_FROM_3(X,A2,A,B) == A2
        const int a = 0;
#else if SELECT_FROM_3(X,A2,A,B) == A
        const int a = 1;
#else if SELECT_FROM_3(X,A2,A,B) == B
        const int a = 2;
        const int a = -1;

The virtue of the construction lies in its robustness. Suppose X is A22 then the first branch is selected but this remains true also if we build a “subclass” A22k , k = 1, …, 9, A, …, F of A22 and assign e.g.

#define X A225

So if we use conditional compilation for a given System S and create a subsystem T of S e.g. a new version of S, we have to adapt our C code only in places where T differs explicitely from S. This robustness is also the major virtue of using inheritance / polymorphism in OOP. It has led to disrespect of using case-statements in OOP since those do not exploit polymorphism and cause in turn less robust code. We see that case- or if-else statements can be confined with the very same idea and robustness even on the level of the C preprocessor. The additional charme of using the C preprocessor is that child/parent relationships are computed at compile time and do not cause any runtime performance penalty.

Refactoring the ANSI C grammar

Posted in C, Grammars on February 21st, 2009 by kay – Be the first to comment

A nice blog article by Eli Bendersky about the distinction between ASTs and CSTs.

After having clarified the concepts Eli tried to make the case for ASTs. He presents the C grammar or more precisely the part of the C grammar that deals with declarations and mentions the obscurity of the grammar that is immediately reflected in the structure of the CST. ASTs serve as some sort of cleanup. This is all true: the C grammar has obscure parts and his ASTs are cleanups but does it justify the extra effort being spent for creating and maintaining ASTs?

A grammar is just another computer program written in (E)BNF and you can strive for clean and accessible BNF programs just like you do in any other programming language. So what about refactoring? Now the ANSI C grammar might not be among the finest parts of the C language but it’s still authoritative and it’s also not easy to show that another grammar describes the same language syntax.

I’ve nevertheless attempted to rewrite the declaration part of the C grammar. There are a few omissions here. For example storage class specifiers and type qualifiers like const are missing and there are also no function declarations. It shall be easy though to add them given the syntactical description of the top level declaration rule described below. There are also just a few type names for testing purposes. Determining type names is an entirely different issue we do not want to treat in this blog post.

declaration: typename declarator
declarator: simple_declarator | func_ptr_declarator | ptr_declarator | array_declarator
simple_declarator: simple_declarator_base | '(' simple_declarator ')'
simple_declarator_base: NAME | '(' NAME ')'
ptr_declarator: ptr_declarator_base | '(' ptr_declarator ')'
ptr_declarator_base: ptr_or_ref declarator
func_ptr_declarator: func_ptr_declarator_base | '(' func_ptr_declarator ')'
func_ptr_declarator_base:  '(' ( ptr_declarator | array_declarator ) ')' arglist
array_declarator: array_declarator_base | '(' array_declarator ')'
array_declarator_base:  ( ptr_declarator | simple_declarator | func_ptr_declarator ) subscript_list      
subscript_list: ('[' [NAME | Intnumber] ']')+
ptr_or_ref: '*'+ ['&'] | '&'
arglist: '(' [parameters] ')'
parameters: declaration (',' declaration)*
typename: 'int' | 'float' | 'char' | 'void'

This grammar can be used to accurately parse following expressions:

int x;
int *x;
int (x);
int (*x);
int ((*x));
int ((*x)());
int ((*x)())[];
int ((*x)(int y))[];
int ((*x)(int y, char z[4]))[];
int ((*x)(int y, char z[4]))[][1];
char **foo [][8];
char (**foo [][8]);
char (**foo [][8])();
char ((**foo [][8])());
char *((**foo [][8])());
char (*(**foo [][8])());
char (*(**foo [][8])())[];
char *(*(**foo [][8])())[];

There is still some overhead because we have pairings like simple_declarator and simple_declarator_base. These distinctions have the only purpose of putting the basic form into parentheses.

A parse tree display for

int (*x)();

looks like this in EasyExtend:

  declaration  -- S`116994 -- 258
    typename  -- S`117008 -- 272
      NAME  -- T`116737 -- 1     L`1
    declarator  -- S`116995 -- 259
      func_ptr_declarator  -- S`117000 -- 264
        func_ptr_declarator_base  -- S`117001 -- 265
          LPAR  -- T`116743 -- 7     L`1
          ptr_declarator  -- S`116998 -- 262
            ptr_declarator_base  -- S`116999 -- 263
              ptr_or_ref  -- S`117005 -- 269
                STAR  -- T`116752 -- 16     L`1
              declarator  -- S`116995 -- 259
                simple_declarator  -- S`116996 -- 260
                  simple_declarator_base  -- S`116997 -- 261
                    NAME  -- T`116737 -- 1     L`1
          RPAR  -- T`116744 -- 8     L`1
          arglist  -- S`117006 -- 270
            LPAR  -- T`116743 -- 7     L`1
            RPAR  -- T`116744 -- 8     L`1
  SEMI  -- T`116749 -- 13     L`1

Still much clutter but now it is easy to identify the semantically relevant entities and search for them. In EasyExtend we use simple functions like find_node and find_all which keep a CST node and a node id parameter and perform a depth-first search.