Contextual Keywords
Posted in EasyExtend, Grammars, Python, csharp on March 22nd, 2009 by kay
Keywords and Contextual Keywords
A language keyword is a reserved word that cannot be used as an identifier for variables, constants, classes etc. Often it is an element of the language syntax, and keywords simplify the syntactical structure. A keyword can often be thought of as a “semantic marker”, i.e. it denotes a location where something particularly meaningful is happening. The while keyword points to a while-loop, the class keyword to a class definition etc. Other keywords like as or in link expressions together and have a fixed location within a statement.
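As a small illustration of "reserved", the following sketch uses Python's standard `keyword` module and the built-in `compile` to check whether a word may serve as a variable name. The helper name `assigns_cleanly` is made up for this example, and the words tested are just samples.

```python
import keyword

def assigns_cleanly(name):
    """Return True if `<name> = 1` compiles, i.e. `name` works as an identifier."""
    try:
        compile(name + " = 1", "<example>", "exec")
        return True
    except SyntaxError:
        # Reserved words such as `while` or `class` end up here.
        return False

for word in ("while", "class", "add"):
    print(word, keyword.iskeyword(word), assigns_cleanly(word))
```

In current Python versions `while` and `class` are rejected as assignment targets, whereas `add` is an ordinary name.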
A contextual keyword is a sort of hybrid. It acts much like a keyword but is not reserved. The word as was such a contextual keyword in Python until it became a proper keyword in version 2.6. The C# language defines lots of contextual keywords besides the regular ones. MSDN defines a contextual keyword as:
A contextual keyword is used to provide a specific meaning in the code, but it is not a reserved word [in C#].
Contextual keywords in C# are used in the grammar description of the language just like regular keywords. For example, add_accessor_declaration is defined by the following C# rule:
add_accessor_declaration: [attributes] "add" block
Keywords in EasyExtend
In EasyExtend keywords are implicitly defined by their use in a grammar file. The “add” string in add_accessor_declaration automatically becomes a keyword. Technically a keyword is just an ordinary name, and the tokenizer produces a NAME token for it. It’s easy to verify this: when we inspect the token stream of the function definition def foo(): pass we’ll notice that def and pass are mapped onto the NAME token:
>>> def foo(): pass
[token>
----------------------------------------------------------------------.
 Line  | Columns | Token Value         | Token Name    | Token Id     |
-------+---------+---------------------+---------------+--------------+
 1     | 0-3     | 'def'               | NAME          | 1 -- 1       |
 1     | 4-7     | 'foo'               | NAME          | 1 -- 1       |
 1     | 7-8     | '('                 | LPAR          | 7 -- 7       |
 1     | 8-9     | ')'                 | RPAR          | 8 -- 8       |
 1     | 9-10    | ':'                 | COLON         | 11 -- 11     |
 1     | 11-15   | 'pass'              | NAME          | 1 -- 1       |
 1     | 15-15   | '\n'                | NEWLINE       | 4 -- 4       |
 2     | 0-0     | ''                  | ENDMARKER     | 0 -- 0       |
----------------------------------------------------------------------'
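The same effect can be observed with the tokenizer in Python's standard library, which also classifies keywords as NAME. This is a sketch using the stdlib `tokenize` module, not EasyExtend's own tokenizer; the helper name `name_tokens` is chosen only for this example.

```python
import io
import token
import tokenize

def name_tokens(source):
    """Return the string values of all tokens classified as NAME."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == token.NAME]

# Keywords and the identifier all come back as NAME tokens.
print(name_tokens("def foo(): pass\n"))  # ['def', 'foo', 'pass']
```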
In the parse tables keywords are preserved, and we will find a state description like ('pass', 1, 273) which explicitly refers to the pass keyword. It is clearly distinguished from token ids of type NAME, which are used otherwise (e.g. for foo). Now the pass token in the token stream has precisely the following structure: [1, 'pass', 1, (11, 15)]. Usually the token id is all that the parser needs to know, but when the parser encounters a NAME token whose value is a keyword, the keyword itself is turned into the token type. We can summarize this using the following function:
import token

# `keywords` is the set of keyword strings determined from the grammar

def compute_token_type(tok):
    tok_type = tok[0]     # the standard token type
    tok_value = tok[1]
    if tok_type == token.NAME:
        if tok_value in keywords:
            return tok_value   # tok_value is a keyword and used as a token type
    return tok_type
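To see the function at work, here is a self-contained sketch. The `keywords` set below is a hand-written stand-in for whatever EasyExtend derives from the grammar file, and the token lists follow the four-element structure shown above.

```python
import token

# Assumption: a tiny stand-in for the keyword set derived from the grammar.
keywords = {"def", "pass", "add"}

def compute_token_type(tok):
    tok_type, tok_value = tok[0], tok[1]
    if tok_type == token.NAME and tok_value in keywords:
        return tok_value      # the keyword string acts as the token type
    return tok_type           # otherwise the plain token id is used

pass_tok = [1, 'pass', 1, (11, 15)]   # a keyword
foo_tok = [1, 'foo', 1, (4, 7)]       # an ordinary identifier

print(compute_token_type(pass_tok))   # 'pass'
print(compute_token_type(foo_tok))    # the id of NAME
```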
Contextual Keywords in EasyExtend
Unlike keywords, which can simply be determined from the grammar, contextual keywords cannot be inferred this way. They have to be made explicit elsewhere. When I considered contextual keywords in EasyExtend I refused to create special infrastructure for them and used a simple trick instead. Suppose one defines the following token in a Token.ext file:
CONTEXT_KWD: 'add' | 'remove'
This is just an ordinary token definition. When parse_token.py is generated from Token+Token.ext we’ll find the following settings:
CONTEXT_KWD = 119896
and
token_map = {
    ...
    'add|remove': 119896,
    ...
}
parse_token.py obviously provides sufficient information to determine the contextual keywords from the CONTEXT_KWD token definition. We exploit this in the next function definition:
def generate_contextual_kwds(parse_token):
    try:
        ctx_kwd = parse_token.CONTEXT_KWD
        # swap_dict turns dict keys into values and values into keys
        t_map = swap_dict(parse_token.token_map)
        contextual_keywords = set( t_map[ctx_kwd].split("|") )
        assert contextual_keywords <= keywords
        return contextual_keywords
    except AttributeError:
        return set()
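The function can be tried out without a generated module. In the sketch below, `swap_dict` is written out under the assumption that it simply inverts a dict with unique values, and a `SimpleNamespace` stands in for the generated parse_token module; the `keywords` set and the extra `'('` entry are made up for the example.

```python
from types import SimpleNamespace

def swap_dict(d):
    """Invert a dict: keys become values and values become keys."""
    return {value: key for key, value in d.items()}

# Hypothetical stand-ins for the generated module and the grammar keywords.
parse_token = SimpleNamespace(
    CONTEXT_KWD=119896,
    token_map={"add|remove": 119896, "(": 7},
)
keywords = {"def", "pass", "add", "remove"}

def generate_contextual_kwds(parse_token):
    try:
        ctx_kwd = parse_token.CONTEXT_KWD
        t_map = swap_dict(parse_token.token_map)
        contextual_keywords = set(t_map[ctx_kwd].split("|"))
        assert contextual_keywords <= keywords
        return contextual_keywords
    except AttributeError:
        # No CONTEXT_KWD token defined: no contextual keywords.
        return set()

print(sorted(generate_contextual_kwds(parse_token)))  # ['add', 'remove']
```

A module without a CONTEXT_KWD definition yields the empty set, so langlets that do not use the trick are unaffected.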
Now we have to understand the semantics of contextual keywords. The following code illustrates the behavior of the parser in the presence of contextual keywords.
import token

def compute_next_selection(token_stream, parse_table):
    tok = token_stream.current()
    tok_type = compute_token_type(tok)
    # provides all token ids and nonterminal ids which are
    # admissible at this state
    selection = parse_table.selectable_ids()
    for s in selection:
        if s == tok_type or tok_type in first_set(s):
            return s
    # here we start to deal with contextual keywords
    if tok_type in contextual_keywords:
        # replace the token type of the contextual keyword by NAME
        tok_type = token.NAME
        for s in selection:
            if s == tok_type or tok_type in first_set(s):
                return s
    raise ParserError
When a keyword is needed in the particular context (tok_type was accepted in the first for-loop), either tok_type itself is returned or the node id of a nonterminal that derives tok_type (tok_type is in the first_set of that node id). If this fails and the token is a contextual keyword, the parser is still permitted to examine the NAME token type in the same way instead. So if a particular context requires a contextual keyword, the keyword is provided; otherwise the parser checks whether the context just requires a NAME, and the NAME is provided instead. The practical implication is that a contextual keyword can still be used as an identifier, for example as a variable name.
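The fallback can be exercised with a toy setup. Everything below is a simplified sketch: the function takes the token and the admissible ids directly instead of a token stream and a parse table, and the two nonterminals with their first sets are invented for the example.

```python
NAME = 1  # token id of NAME, as in the token stream shown earlier

keywords = {"def", "pass", "add", "remove"}
contextual_keywords = {"add", "remove"}

# Hypothetical first sets of two toy nonterminals.
FIRST_SETS = {
    "add_accessor_declaration": {"add"},
    "expr_stmt": {NAME},
}

class ParserError(Exception):
    pass

def first_set(s):
    return FIRST_SETS.get(s, set())

def compute_token_type(tok):
    if tok[0] == NAME and tok[1] in keywords:
        return tok[1]
    return tok[0]

def compute_next_selection(tok, selection):
    tok_type = compute_token_type(tok)
    for s in selection:
        if s == tok_type or tok_type in first_set(s):
            return s
    if tok_type in contextual_keywords:
        tok_type = NAME  # retry, treating the contextual keyword as a NAME
        for s in selection:
            if s == tok_type or tok_type in first_set(s):
                return s
    raise ParserError("unexpected token %r" % (tok[1],))

add_tok = [NAME, "add", 1, (0, 3)]
pass_tok = [NAME, "pass", 1, (0, 4)]

# Where an accessor is admissible, 'add' is taken as a keyword ...
print(compute_next_selection(add_tok, ["add_accessor_declaration"]))
# ... and where only a NAME is admissible, 'add' passes as an identifier.
print(compute_next_selection(add_tok, ["expr_stmt"]))
```

A regular keyword such as pass gets no second chance: in the `expr_stmt` context it raises ParserError.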