Keywords and Contextual Keywords
A language keyword is a reserved word that cannot be used as an identifier for variables, constants, classes, etc. Usually it is an element of the language syntax, and keywords simplify the syntactical structure. In many cases a keyword can be thought of as a “semantic marker”, i.e. it denotes a location where something particularly meaningful is happening. The `while` keyword points to a while-loop, the `class` keyword to a class definition, etc. Other keywords like `as` or `in` link expressions together and have a fixed location within a statement.
A contextual keyword is a kind of hybrid. It acts much like a keyword but is not reserved. The word `as` was such a contextual keyword in Python until it became a proper keyword in version 2.6 (version 2.5 only issued a warning when it was used as an identifier). The C# language defines many contextual keywords besides the regular ones. MSDN defines a contextual keyword as:
A contextual keyword is used to provide a specific meaning in the code, but it is not a reserved word [in C#].
Contextual keywords appear in the grammar description of C# just like regular keywords. For example, `add_accessor_declaration` is defined by the following C# rule:
add_accessor_declaration: [attributes] "add" block
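The difference between a reserved and a non-reserved word can also be observed in current Python. The following is a minimal sketch, assuming a Python 3 interpreter; the helper `is_usable_as_identifier` is a hypothetical name introduced only for this illustration. It tries to compile an assignment to a word and reports whether the word is accepted as an identifier.

```python
import keyword


def is_usable_as_identifier(word):
    """Return True if `word = 1` compiles, i.e. the word is not reserved."""
    try:
        compile(f"{word} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True


# `while` and `as` are reserved; `match` is not reserved in any Python 3
# release, although newer versions give it a special meaning in context.
for word in ("while", "as", "match"):
    print(word, keyword.iskeyword(word), is_usable_as_identifier(word))
```

A word such as `match` behaves like the contextual keywords described above: it can start a statement with special meaning where the grammar allows it, and it is an ordinary name everywhere else.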
Keywords in EasyExtend
In EasyExtend keywords are implicitly defined by their use in a grammar file. The “add” string in the `add_accessor_declaration` rule automatically becomes a keyword. Technically a keyword is just an ordinary name and the tokenizer produces a `NAME` token for it. This is easy to verify: when we inspect the token stream of the function definition `def foo(): pass` we’ll notice that `def` and `pass` are mapped onto the `NAME` token:
>>> def foo(): pass
[token>
----------------------------------------------------------------.
 Line  | Columns | Token Value      | Token Name    | Token Id  |
-------+---------+------------------+---------------+-----------+
 1     | 0-3     | 'def'            | NAME          | 1 -- 1    |
 1     | 4-7     | 'foo'            | NAME          | 1 -- 1    |
 1     | 7-8     | '('              | LPAR          | 7 -- 7    |
 1     | 8-9     | ')'              | RPAR          | 8 -- 8    |
 1     | 9-10    | ':'              | COLON         | 11 -- 11  |
 1     | 11-15   | 'pass'           | NAME          | 1 -- 1    |
 1     | 15-15   | '\n'             | NEWLINE       | 4 -- 4    |
 2     | 0-0     | ''               | ENDMARKER     | 0 -- 0    |
----------------------------------------------------------------'
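The same observation can be reproduced without EasyExtend. The sketch below uses the `tokenize` module of the Python 3 standard library, which is not the EasyExtend tokenizer and reports slightly different token names for operators; the helper `name_tokens` is a hypothetical name for this illustration. What matters is that `def` and `pass` come out as `NAME` tokens there as well.

```python
import io
import token
import tokenize


def name_tokens(source):
    """Return (token string, token name) pairs for the tokens of `source`."""
    readline = io.StringIO(source).readline
    return [(tok.string, token.tok_name[tok.type])
            for tok in tokenize.generate_tokens(readline)]


pairs = name_tokens("def foo(): pass\n")
for value, name in pairs:
    print(f"{value!r:10} {name}")
```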
In the parse tables keywords are preserved, and we will find a state description like `('pass', 1, 273)` which explicitly refers to the `pass` keyword. It is clearly distinguished from token ids of type `NAME`, which are used otherwise (e.g. for `foo`). Now the `pass` token in the token stream has precisely the following structure: `[1, 'pass', 1, (11, 15)]`. Usually the token id is all that the parser needs to know, but when the parser encounters a `NAME` token whose value is a keyword, the keyword itself is used as the token type. We can summarize this using the following function:
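# `token` is the module that defines NAME; `keywords` is the set of
# keyword strings collected from the grammar.
def compute_token_type(tok):
    tok_type = tok[0]     # the standard token type
    tok_value = tok[1]
    if tok_type == token.NAME:
        if tok_value in keywords:
            return tok_value   # tok_value is a keyword and used as a token type
    return tok_type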
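To see the function at work, here is a small self-contained sketch. It assumes the standard library `token` module as a stand-in for EasyExtend's token definitions and a hand-written `keywords` set; in EasyExtend that set would be derived from the grammar. The token lists follow the `[token id, value, line, (start, end)]` layout shown above.

```python
import token

# Assumption: a tiny hand-written keyword set for illustration only.
keywords = {"def", "pass", "while", "class"}


def compute_token_type(tok):
    tok_type, tok_value = tok[0], tok[1]
    if tok_type == token.NAME and tok_value in keywords:
        return tok_value      # the keyword string serves as the token type
    return tok_type           # otherwise the numeric token id is kept


pass_tok = [token.NAME, "pass", 1, (11, 15)]
foo_tok = [token.NAME, "foo", 1, (4, 7)]
colon_tok = [token.COLON, ":", 1, (9, 10)]

for tok in (pass_tok, foo_tok, colon_tok):
    print(tok[1], "->", compute_token_type(tok))
```

Only the `pass` token changes its type; `foo` stays a plain `NAME` and the colon keeps its own token id.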
Contextual Keywords in EasyExtend
Keywords can simply be determined from the grammar, but this isn’t possible with contextual keywords. They have to be made explicit elsewhere. When I considered contextual keywords in EasyExtend I refused to create special infrastructure for them and used a simple trick instead. Suppose one defines the following token in a `Token.ext` file:
CONTEXT_KWD: 'add' | 'remove'
This is just an ordinary token definition. When `parse_token.py` is generated from `Token` + `Token.ext` we’ll find the following settings:
CONTEXT_KWD = 119896
and
token_map = { ... 'add|remove': 119896, ... }
The `parse_token.py` module obviously provides sufficient information to determine the contextual keywords from the `CONTEXT_KWD` token definition. We exploit this in the next function definition:
def generate_contextual_kwds(parse_token):
    try:
        ctx_kwd = parse_token.CONTEXT_KWD
        t_map = swap_dict(parse_token.token_map)   # swap_dict turns dict keys
                                                   # into values and values into keys
        contextual_keywords = set( t_map[ctx_kwd].split("|") )
        assert contextual_keywords <= keywords
        return contextual_keywords
    except AttributeError:
        return set()
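The function can be tried out with a stand-in for the generated module. The following is a sketch under stated assumptions: `fake_parse_token` is a hypothetical object that only carries the two settings shown above, `swap_dict` is written out the way its comment describes it, and `keywords` is passed in as a parameter instead of being a global.

```python
from types import SimpleNamespace


def swap_dict(d):
    """Turn dict keys into values and values into keys."""
    return {value: key for key, value in d.items()}


def generate_contextual_kwds(parse_token, keywords):
    try:
        ctx_kwd = parse_token.CONTEXT_KWD
    except AttributeError:
        return set()          # no CONTEXT_KWD token was defined
    t_map = swap_dict(parse_token.token_map)
    contextual_keywords = set(t_map[ctx_kwd].split("|"))
    assert contextual_keywords <= keywords
    return contextual_keywords


# Hypothetical stand-in for the generated parse_token.py module.
fake_parse_token = SimpleNamespace(
    CONTEXT_KWD=119896,
    token_map={"add|remove": 119896},
)

result = generate_contextual_kwds(fake_parse_token, {"add", "remove", "pass"})
print(sorted(result))
```

A module without a `CONTEXT_KWD` setting simply yields an empty set, so langlets that do not use contextual keywords are unaffected.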
Now we have to understand the semantics of contextual keywords. The following code illustrates the behavior of the parser in the presence of contextual keywords.
def compute_next_selection(token_stream, parse_table):
    tok = token_stream.current()
    tok_type = compute_token_type(tok)
    # provides all token ids and nonterminal ids which are
    # admissible at this state
    selection = parse_table.selectable_ids()
    for s in selection:
        if s == tok_type or tok_type in first_set(s):
            return s
    # here we start to deal with contextual keywords
    if tok_type in contextual_keywords:
        # replace the token type of the contextual keyword by NAME
        tok_type = token.NAME
        for s in selection:
            if s == tok_type or tok_type in first_set(s):
                return s
    raise ParserError
When a keyword is needed in the particular context (`tok_type` is accepted in the first for-loop), either `tok_type` itself is returned or the node id of a non-terminal that derives it (`tok_type` is in the `first_set` of the node id). If this fails and the token is a contextual keyword, the parser retries with the `NAME` token type and examines it in the same way. So if a particular context requires a contextual keyword, the keyword is provided; otherwise the parser checks whether the context accepts a `NAME` and provides the `NAME` instead. The practical implication is that a contextual keyword can be used as an identifier, e.g. as a variable name.
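The two-step lookup can be exercised with a toy setup. Everything in the sketch below is an assumption made for illustration: the node ids `ADD_ACCESSOR` and `NAME_EXPR`, the `FIRST_SETS` table, the small keyword sets, and the simplified signature that takes a token and a list of selectable ids directly instead of a token stream and a parse table.

```python
import token

NAME = token.NAME

# Assumptions for illustration only: tiny keyword sets and made-up node ids.
keywords = {"def", "pass", "add", "remove"}
contextual_keywords = {"add", "remove"}

ADD_ACCESSOR = 1001   # hypothetical non-terminal that starts with 'add'
NAME_EXPR = 1002      # hypothetical non-terminal that starts with a NAME

FIRST_SETS = {ADD_ACCESSOR: {"add"}, NAME_EXPR: {NAME}}


class ParserError(Exception):
    pass


def first_set(node_id):
    return FIRST_SETS.get(node_id, set())


def compute_token_type(tok):
    tok_type, tok_value = tok[0], tok[1]
    if tok_type == NAME and tok_value in keywords:
        return tok_value
    return tok_type


def compute_next_selection(tok, selection):
    tok_type = compute_token_type(tok)
    # first attempt: treat the token as the keyword it spells
    for s in selection:
        if s == tok_type or tok_type in first_set(s):
            return s
    # second attempt: a contextual keyword may fall back to NAME
    if tok_type in contextual_keywords:
        for s in selection:
            if s == NAME or NAME in first_set(s):
                return s
    raise ParserError("unexpected token %r" % (tok[1],))


add_tok = [NAME, "add", 1, (0, 3)]
pass_tok = [NAME, "pass", 1, (0, 4)]

print(compute_next_selection(add_tok, [ADD_ACCESSOR]))  # used as a keyword
print(compute_next_selection(add_tok, [NAME_EXPR]))     # used as a name
```

In this setup the same `add` token is accepted both where the keyword is expected and where only a name is allowed, whereas a regular keyword such as `pass` raises `ParserError` in the second context.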