Contextual Keywords

Keywords and Contextual Keywords

A language keyword is a reserved word that cannot be used as an identifier for variables, constants, classes etc. Often it is an element of the language syntax, and keywords simplify the syntactic structure. On many occasions a keyword can be thought of as a “semantic marker”, i.e. it denotes a location where something particularly meaningful is happening. The `while` keyword points to a while-loop, the `class` keyword to a class definition etc. Other keywords like `as` or `in` link expressions together and have a fixed location within a statement.

A contextual keyword is a kind of hybrid. It acts much like a keyword but is not reserved. The word `as` was such a contextual keyword in Python until it became a proper keyword in version 2.6. The C# language defines many contextual keywords besides the regular ones. MSDN defines a contextual keyword as:

A contextual keyword is used to provide a specific meaning in the code, but it is not a reserved word [in C#].

Contextual keywords in C# are used in the grammar description of the language just like regular keywords. For example, `add_accessor_declaration` is defined by the following C# rule:

add_accessor_declaration: [attributes] "add" block

Keywords in EasyExtend

In EasyExtend keywords are implicitly defined by their use in a grammar file. The “add” string in the `add_accessor_declaration` rule automatically becomes a keyword. Technically a keyword is just an ordinary name and the tokenizer produces a `NAME` token for it. It’s easy to verify this: when we inspect the token stream of a function definition `def foo(): pass` we’ll notice that `def` and `pass` are mapped onto the `NAME` token:

 >>> def foo(): pass
 Line  | Columns | Token Value      | Token Name    | Token Id  |
 1     | 0-3     | 'def'            | NAME          | 1 -- 1    |
 1     | 4-7     | 'foo'            | NAME          | 1 -- 1    |
 1     | 7-8     | '('              | LPAR          | 7 -- 7    |
 1     | 8-9     | ')'              | RPAR          | 8 -- 8    |
 1     | 9-10    | ':'              | COLON         | 11 -- 11  |
 1     | 11-15   | 'pass'           | NAME          | 1 -- 1    |
 1     | 15-15   | '\n'             | NEWLINE       | 4 -- 4    |
 2     | 0-0     | ''               | ENDMARKER     | 0 -- 0    |
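
The same observation can be reproduced with Python’s standard `tokenize` module. This is not the EasyExtend tokenizer, but it also reports keywords as plain `NAME` tokens. A minimal sketch, assuming Python 3; the helper name `name_tokens` is made up for illustration:

```python
import io
import token
import tokenize

def name_tokens(source):
    """Return the string values of all NAME tokens found in source."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == token.NAME]

# 'def' and 'pass' show up next to the identifier 'foo'
print(name_tokens("def foo(): pass\n"))   # ['def', 'foo', 'pass']
```

The tokenizer alone cannot tell a keyword from an identifier; that distinction has to be made later, by the parser.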

In the parse tables keywords are preserved, and we will find a state description like `('pass', 1, 273)` which refers explicitly to the `pass` keyword. It is clearly distinguished from token ids of type `NAME`, which are used otherwise (e.g. for `foo`). Now the `pass` token in the token stream has precisely the following structure: `[1, 'pass', 1, (11, 15)]`. Usually the token id is all the parser needs to know, but when the parser encounters a `NAME` token whose token value is a keyword, the keyword itself is turned into the token type. We can summarize this using the following function:

def compute_token_type(tok):
    tok_type  = tok[0]      # the standard token type
    tok_value = tok[1]
    if tok_type == token.NAME:
        if tok_value in keywords:
            return tok_value    # tok_value is a keyword and used as a token type
    return tok_type
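
To try the function outside of EasyExtend, one can supply the two names it depends on. The sketch below assumes that Python’s own reserved words (`keyword.kwlist`) may stand in for the keyword set that EasyExtend derives from a grammar file; the token lists mirror the structure shown above.

```python
import keyword
import token

# Assumption: Python's reserved words stand in for the keyword set
# that EasyExtend would collect from a grammar file.
keywords = set(keyword.kwlist)

def compute_token_type(tok):
    tok_type, tok_value = tok[0], tok[1]
    if tok_type == token.NAME and tok_value in keywords:
        return tok_value      # the keyword itself serves as the token type
    return tok_type

pass_tok = [token.NAME, 'pass', 1, (11, 15)]
foo_tok  = [token.NAME, 'foo', 1, (4, 7)]

print(compute_token_type(pass_tok))                # pass
print(compute_token_type(foo_tok) == token.NAME)   # True
```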

Contextual Keywords in EasyExtend

Unlike keywords, which can simply be determined from the grammar, contextual keywords have to be made explicit elsewhere. When I considered contextual keywords in EasyExtend I refused to create special infrastructure for them and used a simple trick instead. Suppose one defines the following token in a `Token.ext` file:

    CONTEXT_KWD: 'add' | 'remove'

This is just an ordinary token definition. When the token module is generated from `Token` + `Token.ext` we’ll find the following settings:

    CONTEXT_KWD = 119896


    token_map = {
        'add|remove': 119896,

The generated module obviously provides sufficient information to determine the contextual keywords from the `CONTEXT_KWD` token definition. We exploit this in the next function definition:

def generate_contextual_kwds(parse_token):
    try:
        ctx_kwd = parse_token.CONTEXT_KWD
        t_map   = swap_dict(parse_token.token_map)      # swap_dict turns dict keys
                                                        # into values and values into keys
        contextual_keywords = set( t_map[ctx_kwd].split("|") )
        assert contextual_keywords <= keywords          # every contextual keyword is also a keyword
        return contextual_keywords
    except AttributeError:                              # no CONTEXT_KWD token has been defined
        return set()
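
The lookup can be exercised without a generated token module. In the following sketch the `SimpleNamespace` object is a hypothetical stand-in for that module, `swap_dict` is written out according to the comment above, and the `keywords` set is an assumption chosen so that the assertion holds.

```python
from types import SimpleNamespace

def swap_dict(d):
    """Turn dict keys into values and values into keys."""
    return {value: key for key, value in d.items()}

# Hypothetical stand-in for the generated token module.
parse_token = SimpleNamespace(
    CONTEXT_KWD=119896,
    token_map={'add|remove': 119896},
)

# Assumption: 'add' and 'remove' already occur as strings in the grammar.
keywords = {'def', 'pass', 'add', 'remove'}

def generate_contextual_kwds(parse_token):
    try:
        ctx_kwd = parse_token.CONTEXT_KWD
    except AttributeError:
        return set()          # no CONTEXT_KWD token defined
    t_map = swap_dict(parse_token.token_map)
    contextual_keywords = set(t_map[ctx_kwd].split("|"))
    assert contextual_keywords <= keywords
    return contextual_keywords

print(sorted(generate_contextual_kwds(parse_token)))   # ['add', 'remove']
print(generate_contextual_kwds(SimpleNamespace()))     # set()
```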

Now we have to understand the semantics of contextual keywords. The following code illustrates the behavior of the parser in the presence of contextual keywords.

def compute_next_selection(token_stream, parse_table):
    tok = token_stream.current()
    tok_type = compute_token_type(tok)
    selection = parse_table.selectable_ids()   # provides all token ids and nonterminal ids which are
                                               # admissible at this state
    for s in selection:
        if s == tok_type or tok_type in first_set(s):
            return s
    # here we start to deal with contextual keywords
    if tok_type in contextual_keywords:
        # replace the token type of the contextual keyword by NAME
        tok_type = token.NAME
        for s in selection:
            if s == tok_type or tok_type in first_set(s):
                return s
    raise ParserError

When a keyword is needed in a particular context (`tok_type` is accepted in the first for-loop), either `tok_type` itself is returned or the node id of a non-terminal that derives `tok_type` (`tok_type` is in the `first_set` of that node id). If this fails and the token is a contextual keyword, its token type is replaced by `NAME` and examined in the same way. So if a particular context requires a contextual keyword, the keyword is provided; otherwise it is checked whether the context just requires a `NAME`, and the `NAME` is provided instead. The practical implication is that a contextual keyword can be used as an identifier of a variable.
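
The fallback can be tried out with a toy setup. The sketch below is a simplification, not EasyExtend’s parser: the selection is passed in as a plain list instead of being read from a parse table, the first sets and the non-terminal name `expr` are invented for illustration, and `SyntaxError` takes the place of `ParserError`.

```python
NAME = 1   # token id of NAME, as in the token stream shown earlier

keywords = {'add', 'remove', 'pass'}
contextual_keywords = {'add', 'remove'}

# Hypothetical first sets for two non-terminal ids.
FIRST = {
    'add_accessor_declaration': {'add'},
    'expr': {NAME},
}

def first_set(s):
    return FIRST.get(s, set())

def compute_token_type(tok):
    tok_type, tok_value = tok[0], tok[1]
    if tok_type == NAME and tok_value in keywords:
        return tok_value
    return tok_type

def compute_next_selection(tok, selection):
    tok_type = compute_token_type(tok)
    for s in selection:
        if s == tok_type or tok_type in first_set(s):
            return s
    # fall back: treat a contextual keyword as an ordinary NAME
    if tok_type in contextual_keywords:
        tok_type = NAME
        for s in selection:
            if s == tok_type or tok_type in first_set(s):
                return s
    raise SyntaxError("unexpected token %r" % (tok[1],))

add_tok = [NAME, 'add', 1, (0, 3)]

# the context expects the keyword: 'add' is matched as a keyword
print(compute_next_selection(add_tok, ['add_accessor_declaration']))  # add_accessor_declaration
# the context expects a name: 'add' is accepted as an identifier
print(compute_next_selection(add_tok, ['expr']))                      # expr
```

A regular keyword such as `pass` gets no such second chance: with the selection `['expr']` the function raises an error, because `pass` is not in `contextual_keywords`.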

