T3Table
=======

Introduction
------------

A *T3Table* is a composite data type which is a crossover between a C-struct, a formal grammar and a spreadsheet. As we progress we can add even more dimensions to its description. Since a T3Table looks like a data type which swallowed a whole framework, a complex assemblage, it might be surprising that it comes out quite lean. This is because a T3Table is special purpose and only some of the aspects of the mentioned data structures are reflected in its making.

T3Tables are designed to handle structured binary data like TLVs, ATRs, TCP headers and so on. A correctly defined T3Table can represent any of those structures and act as a *context sensitive* parser on flat binary data.

What makes T3Tables special is that the *parse trees* are also T3Tables. Parsing becomes an act of self-reproduction: a T3Table clones itself with special data. Unparsing a T3Table yields another Hex number. This makes T3Tables ideally suited to control variation on data and the production of test data.

About this document
~~~~~~~~~~~~~~~~~~~

* In the 1st section we give a detailed description of the T3Table, its methods and operators.
* In the 2nd section we add the T3Row and T3Binding classes to the picture.
* In the 3rd section two subclasses of T3Table are introduced, which are T3Bitmap and T3Set.
* In the final section we take a closer look at the design of T3Tables.

The T3Table class
-----------------

Methods
~~~~~~~

.. function :: T3Table()

   A T3Table is created without arguments.

.. _add:

.. function :: T3Table.add(pattern = 0, **field)

   The ``add`` method is used to add one row to the T3Table. The data passed to ``add`` must suffice to create a T3Row object. Variants are:

   1. ``t.add(R)`` with ``type(R) = T3Row``. This is possibly the most obvious construction. It is usually more convenient, though, to create a T3Row less explicitly.

   2. ``t.add(P, s = V)``. This passes a pattern ``P`` and a key-value pair ``s = V``.
      The name ``s`` becomes the name of the row and we can access the row ``s`` using the notation ``t.s``. Both ``P`` and ``V`` can be of a variety of types:

      1) **Types of row-value V**

         a) T3Binding

            T3Bindings will be studied in greater detail below. A T3Binding is *not* a row-value but a *ValueBinding*. A ValueBinding is a callable that produces a row-value once a function or operator tries to access the row-value.

         b) T3Table

            T3Tables can be row-values. This allows for nested T3Tables.

         c) NoneType

            None is a special value and is considered :ref:`below <special_assignments>`.

         d) Other types

            The :meth:`T3Table._coerce` method is applied to the input data.

      2) **Types of pattern P**

         a) T3PatternObject

            Pattern objects of this kind are defined in the module ``t3.pattern``.

         b) T3Table

            A T3Table implements the ``t3.pattern.T3Matcher`` interface and can therefore be a pattern! This can be interesting for subclasses of T3Table such as T3Bitmap.

         c) int

            An integer ``k`` is turned into ``T3PatternWildcard(k)``. If ``data`` implements ``__getitem__`` this matches ``data[:k]``.

         d) str, unicode

            Objects of type str or unicode are parsed into values of type ``T3PatternObject``.

         e) T3Number

            With ``k = int(P)`` the argument is used like an ``int`` type.

         f) Callable

            A callable must have the signature ``(T3Table, T3Number) -> T`` where T is one of the types a) - e).

         Pattern matching is examined more closely under :meth:`T3Table.match`.

   3. ``t.add(x = V)``. In this form the pattern is omitted and its default value 0 is used. The 0 value is wrapped into the pattern ``T3PatternWildcard(0)``, which matches 0 digits of input data when ``t.match(data)`` is applied. The effect of the 0-pattern is the following: when ``t.match(data)`` is called, which creates a clone of ``t``, say ``u``, then ``u.x = None``.

   .. note ::

      There must be at least one T3Row in a T3Table that matches. Otherwise ``t.match(data)`` raises a ``MatchingFailure`` exception.

.. function :: T3Table.match(data)

   The ``match`` function creates a new T3Table using input ``data`` and the patterns defined for the T3Rows of the table.

   **Example** ::

      >>> t = T3Table().add(1, s = 0).add(1, t = 0)
      >>> m = t.match("89 56")
      >>> t2 = m.value
      >>> t2
      t2:
          s: 89
          t: 56

   The inner working of ``t.match(data)`` can be illustrated by the following simplified algorithm ::

      def match(t, data):
          # start with an empty match whose unmatched rest is the whole input
          m = T3Match(T3Number.NULL, data)
          # the parse result is a copy of the table definition
          table = copy(t)
          for row in table._rows:
              # each row consumes a piece of what the previous row left over
              m = row.match(m.rest)
              if not m:
                  raise MatchingFailure(m)
          # attach the filled-in copy to the final match object
          m.value = table
          return m

   Each row matches a piece of the data object, produces a T3Match return value and continues with the unmatched rest. If a T3Row fails to match, the computation is cancelled and a ``MatchingFailure`` exception is raised. Otherwise a copy of the input T3Table storing the values of the match is attached to the T3Match object, which is returned.

.. _coerce:

.. function :: T3Table._coerce(rowvalue)

   A T3Table class embodies a fixed default row value type. Types such as integers or strings might then be converted into that type on row value assignment or on other occasions. Known default row value types are

   * Hex -- T3Table
   * Bin -- T3Bitmap

   Override ``_coerce`` in subclasses when needed.

.. function :: T3Table.find(rowname)

   This function is used to find a row with a given name. It is a breadth-first search, i.e. it looks for a row name on a given axis and recurses into a sub-T3Table on failure. The function returns the row-value of the row found, None otherwise.

   **Example** ::

      >>> t1 = T3Table().add(s = 1).add(r = 1)
      >>> t2 = T3Table().add(t = t1).add(r = 2)
      >>> t2
      t2:
          t:
              s: 01
              r: 01
          r: 02
      >>> t2.find("r")
      02
      >>> t2.find("s")
      01

.. function :: T3Table.get_value()

   ``t.get_value()`` concatenates the values of the T3Rows of ``t``, except for those which are None, and returns that concatenation. Often ``get_value()`` is used implicitly and one writes ``Hex(t)`` instead.

   **Example** ::

      >>> t = T3Table().add(s = 1).add(r = 2)
      >>> t
      t:
          s: 01
          r: 02
      >>> Hex(t)
      01 02

.. _operators:

Operators
~~~~~~~~~

The operators used on T3Tables are

.. |ref1| replace:: :ref:`(1) <operator_note_1>`
.. |ref2| replace:: :ref:`(2) <operator_note_2>`
.. |ref3| replace:: :ref:`(3) <operator_note_3>`
.. |ref4| replace:: :ref:`(4) <operator_note_4>`
.. |ref5| replace:: :ref:`(5) <operator_note_5>`

+--------------------+------------------------------------------+--------+
| Operation          | Result                                   | Notes  |
+====================+==========================================+========+
| ``t[s]``           | T3Row *s* of *t*                         | |ref1| |
+--------------------+------------------------------------------+--------+
| ``t.s``            | value of T3Row *s* of *t*                | |ref2| |
+--------------------+------------------------------------------+--------+
| ``t.s = v``        | substitute value of T3Row *s*            | |ref3| |
|                    | of *t* with new value *v*                |        |
+--------------------+------------------------------------------+--------+
| ``s in t``         | True if *s* is a T3Row name in *t*,      |        |
|                    | False otherwise                          |        |
+--------------------+------------------------------------------+--------+
| ``t1 // t2``       | new T3Table which is the concatenation   |        |
|                    | of *t1* and *t2*                         |        |
+--------------------+------------------------------------------+--------+
| ``t << data``      | parses *data* using the definition of *t*| |ref4| |
+--------------------+------------------------------------------+--------+
| ``len(t)``         | number of rows of *t*                    |        |
+--------------------+------------------------------------------+--------+
| ``copy(t)``        | a copy of *t*. The rows of *t*           |        |
|                    | are also copied                          |        |
+--------------------+------------------------------------------+--------+
| ``iter(t)``        | an iterator over the rows of *t*         |        |
+--------------------+------------------------------------------+--------+
| ``t(x = a, ...)``  | a copy of *t* with row value             | |ref5| |
|                    | substitutions *copy(t).x = a*            |        |
+--------------------+------------------------------------------+--------+

**Notes** :

.. _operator_note_1:

1) a) For convenience a ``T3Row`` implements a subset of the ``list`` type protocol, in particular the methods

      * ``__len__``
      * ``__getitem__``
      * ``__iter__``

      This means that a single ``T3Row`` can be treated as a 1-element list of ``T3Rows``. So ``t[s][0]`` or ``for row in t[s]: ...`` can be applied even if ``t[s]`` is of type ``T3Row``.

      **Example** ::

         >>> t = T3Table().add(s = 0).add(s = 1).add(r = 2)
         >>> for row in t["r"]:
         ...     print row
         ...
         >>> for row in t["s"]:
         ...     print row
         ...

   b) An ``AttributeError`` is raised if ``s`` is not a valid name of a row in ``t``.

.. _operator_note_2:

2) a) If ``s`` is the name of a T3Row of ``t`` then ``t.s`` is the value ``t[s].get_value()`` if ``t[s]`` is a T3Row. If otherwise ``t[s]`` is a list of T3Rows, the list ``[r.get_value() for r in t[s]]`` is returned.

      .. note ::

         Use the comprehension ``[r.get_value() for r in t[s]]`` if you are in doubt about the cardinality of the T3Row with name *s*. This way you can avoid dealing with variants. Admittedly I haven't found an elegant solution to this API puzzle. Row/value access is optimized for a 1-1 relationship between rows and names, which is also quite the norm.

      **Example** ::

         >>> t = T3Table().add(s = 0).add(s = 1).add(r = 2)
         >>> t.r
         02
         >>> t.s
         [00, 01]

   b) An ``AttributeError`` is raised if ``s`` is not a valid name of a row in ``t``.

.. _operator_note_3:

3) a) :ref:`None assignments <special_assignments>` are discussed below.

   b) Unpacking assignments are required for multiple rows with the same name ::

         >>> t = T3Table().add(s = 0).add(s = 1).add(r = 2)
         >>> t.s
         [00, 01]
         >>> t.s = [7, 8]
         >>> t.s
         [07, 08]

      Assigning a wrong number of arguments results in a ``ValueError`` ::

         >>> t.s = [0]
         Traceback (most recent call last):
           File "<stdin>", line 1, in <module>
         ValueError: need more than 1 value to unpack. 2 expected

.. _operator_note_4:

4) The operator ``t << data`` always returns either a new T3Table as a "parse tree" or raises a ``MatchingFailure`` exception with a ``T3MatchFail`` object as the exception value.

.. _operator_note_5:

5) On the surface ``copy(t)`` is redundant and can be replaced with ``t()`` which also produces a copy of ``t``. But ``t()`` is actually very different underneath. The reason for this is that a T3Table ``t2`` can be the value of a T3Row of another T3Table ``t1`` and we'd like to create a copy of ``t1`` with some row values modified in ``t2`` ::

      t1.t2(x = a, y = b)  # this should create a copy of t1 with two changes in t2

   Each T3Table has a parent and for ``t1`` and ``t2`` the following assertions hold ::

      assert t2._parent is t1
      assert t1._parent is None

   So when ``t2(x = a, y = b)`` is evaluated what is actually copied is the *root* of the tree in which ``t2`` is a node ::

      def copy_root(self):
          # walk up the parent chain; only the topmost table is copied
          if self._parent is None:
              return copy(self)
          else:
              return self._parent.copy_root()

   Copying the root of ``t2`` will also copy ``t2``. Finally the changes to ``copy(t2)`` in ``x`` and ``y`` are performed just as expected.

.. _special_assignments:

Special row value assignments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The optionality of a T3Table row can be controlled by setting the row value to None. The idea is that None makes a row "invisible": it doesn't contribute anything to the value of the T3Table and the row isn't displayed. It also takes no part in the parsing process ::

   >>> t = T3Table().add(1, u = 0).add(1, v = 1).add(1, w = 2)
   >>> t
   t:
       u: 00
       v: 01
       w: 02

Now set a row value to None ::

   >>> t.v = None
   >>> t
   t:
       u: 00
       w: 02
   >>> Hex(t)
   00 02
   >>> t << '03 04 05'
   <__main__.T3Table object at 0x0377F8D0>
       u: 03
       w: 04

This notwithstanding the row is still present ::

   >>> t["v"]

The rules become somewhat more complex when data binding is involved, which is handled in the next section.
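
The None rule can be illustrated with a small stand-alone sketch. It does not use the T3 library: ``MiniTable`` and its methods are hypothetical names invented for this illustration, every row is assumed to be exactly one byte wide, and the code is written for Python 3. It only mirrors the three observations made above: a None row adds nothing to the table value, it is skipped while parsing, and it is still present in the table.

.. code-block:: python

   class MiniTable:
       """Toy stand-in for a table of named one-byte rows (not the T3 API)."""

       def __init__(self, **rows):
           # dicts keep insertion order, so rows stay in definition order
           self.rows = dict(rows)

       def get_value(self):
           # rows whose value is None contribute nothing to the table value
           return " ".join("%02X" % v for v in self.rows.values() if v is not None)

       def parse(self, data):
           # rows set to None are skipped; every other row consumes one byte
           stream = iter(bytes.fromhex(data))
           parsed = {}
           for name, v in self.rows.items():
               parsed[name] = None if v is None else next(stream)
           return MiniTable(**parsed)


   t = MiniTable(u=0, v=1, w=2)
   t.rows["v"] = None                       # make row v optional
   print(t.get_value())                     # 00 02
   print(t.parse("03 04 05").get_value())   # 03 04
   print("v" in t.rows)                     # True, the row is still present

As in the T3Table example, the third input byte is simply left over because only two rows take part in the match.
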

Data Binding and T3Tables
-------------------------

For collection data types such as lists, arrays, dicts or tuples in Python or another high level language like Java, causally dependent data such as the ``size`` of such a collection are not perceived as an integral part of the object representation. They may well be stored with the object, but then they are *implementation details* of the particular language, not part of the language definition.

This is very different from lower level languages like C or Pascal where e.g. strings are *composite data*: a C string carries a trailing zero which marks the string end, while a Pascal string has a leading byte which stores its length. The programmer is responsible for the integrity of the data structure and trades comfort for control.

T3Tables are used to represent composite data built in the spirit of Pascal or C. Unlike in those languages, the integrity is maintained directly *within* the T3Table using data binding.

**Example** ::

   Array = T3Table()
   Array.add(1, Length = binding.table("Data", len))
   Array.add("*", Data = '00')

With this definition we get ::

   >>> Array
   Array:
       $Length: 01
       Data: 00
   >>> Array.Data = '01 02 03'
   >>> Array
   Array:
       $Length: 03
       Data: 01 02 03

The expression ``binding.table("Data", len)`` works as follows: whenever the row ``Length`` is accessed, the value of ``Array.Data`` is passed to ``len`` and the result is assigned to the value of ``Length`` ::

   Length.value = len(Array.Data)

We could express the relationship more concisely by introducing a new operator that Python lacks ::

   Length.value <- len(Array.Data)

This would imply that ``Length.value`` is updated whenever ``Array.Data`` changes.

Representation
~~~~~~~~~~~~~~

Rows with a value binding are prefixed with a '$' sigil as shown above.
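
The access-time evaluation of a value binding and the '$' sigil can be mimicked in a few lines of plain Python 3. This is only a sketch under stated assumptions and not the way the T3 library is implemented: ``Bound``, ``MiniTable``, ``value`` and ``show`` are hypothetical names invented for the illustration, and row data is modelled as a plain list of integers.

.. code-block:: python

   class Bound:
       """Hypothetical value binding: computes a value from another row on access."""

       def __init__(self, source, func):
           self.source = source   # name of the row the binding reads
           self.func = func       # function applied to that row's value


   class MiniTable:
       """Toy table whose rows hold either plain values or bindings."""

       def __init__(self):
           self.rows = {}

       def add(self, **field):
           self.rows.update(field)
           return self

       def value(self, name):
           v = self.rows[name]
           # a binding is resolved at access time, so it never goes stale
           if isinstance(v, Bound):
               return v.func(self.value(v.source))
           return v

       def show(self):
           # bound rows are displayed with a leading '$' sigil
           lines = []
           for name, v in self.rows.items():
               sigil = "$" if isinstance(v, Bound) else ""
               lines.append("%s%s: %s" % (sigil, name, self.value(name)))
           return "\n".join(lines)


   array = MiniTable().add(Length=Bound("Data", len)).add(Data=[0x00])
   print(array.show())
   array.rows["Data"] = [0x01, 0x02, 0x03]
   print(array.value("Length"))   # 3, recomputed from the new Data

Because the binding stores only the *name* of the source row, the table itself can be handed over as late as the moment the value is fetched.
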

Data binding and cloning
~~~~~~~~~~~~~~~~~~~~~~~~

This behaviour doesn't get lost when Array is cloned ::

   >>> NewArray = Array(Data = '0F 02')
   >>> NewArray
   NewArray:
       $Length: 02
       Data: 0F 02

Data binding and parsing
~~~~~~~~~~~~~~~~~~~~~~~~

When data is parsed into a new T3Table, value bindings are ignored ::

   >>> P = Array << '02 00'
   >>> P
   P:
       $Length: '02'
       Data: '00'

This is useful when you want to check the parsed data ::

   >>> assert P.Length == len(P.Data), "FAIL: Length must be '%s'. '%s' found instead."%(len(P.Data), int(P.Length))
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   AssertionError: FAIL: Length must be '1'. '2' found instead.

Also cloning won't affect the parsed result ::

   >>> Hex(P) == Hex(P())
   True

It is also possible to set the value of a data bound row directly ::

   >>> P.Length = '03'
   >>> P
   P:
       $Length: '03'
       Data: '00'

Recomputation applies once a data element other than the data bound row is updated ::

   >>> P.Data += 0     # a change which updates P without changing Data
   >>> P
   P:
       $Length: '01'   # the row value is re-computed
       Data: '00'

.. caution ::

   When you try to update multiple rows in a copy expression such as ``P(Data = '00', Length = '03')`` and one of those rows is value bound, such as ``Length``, a warning will be issued. This is because a dict is passed to ``P`` and the update order remains undetermined. If ``Length`` is updated before ``Data`` the result might differ from ``Data`` being updated before ``Length``. My advice: **never** update value bound rows when you copy a T3Table.

Binding to tables
~~~~~~~~~~~~~~~~~

A T3Table can be considered as a kind of *scope* of its own for data bindings. We have already seen how to bind a function to a row of a table ::

   Array.add(1, Length = binding.table("Data", len))

The table isn't passed to the binding when the binding is assigned; this happens only when the value of ``Length`` gets fetched.
The ``Length`` row holds a reference to its containing T3Table and this table is assigned to the binding when ``Length.get_value()`` is applied.

Passing a row name is not the only possibility to bind to table data. Other options are expressed through **match codes**. The code strings are listed below

+--------------------+------------------------------------------+-----------------+
| Code               | Result                                   | Notes           |
+====================+==========================================+=================+
| "name"             | Row name of this table                   |                 |
+--------------------+------------------------------------------+-----------------+
| ``"*"``            | This table                               |                 |
+--------------------+------------------------------------------+-----------------+
| ``".*"``           | T3Table built from all rows succeeding   |                 |
|                    | this row                                 |                 |
+--------------------+------------------------------------------+-----------------+
| ``"*."``           | T3Table built from all rows preceding    |                 |
|                    | this row                                 |                 |
+--------------------+------------------------------------------+-----------------+
| ``"*/"``           | Parent table of this table               | may be None     |
+--------------------+------------------------------------------+-----------------+
| ``"*/"`` .. ``"/"``| n-th grandparent table of this table     | may be None     |
+--------------------+------------------------------------------+-----------------+

.. function :: binding.table(match_code, callback)

   The ``match_code`` is described in the table above. The ``callback`` is a function of a single argument that returns a single value.

Dynamically scoped variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The data binding mechanism supports bindings to arbitrary objects, not just to the table which contains the binding. This binding type is provided by the function ``binding.dynamic()``

.. function :: binding.dynamic(name [, callback])

   ``name`` refers to the name of a variable which gets fetched. The ``callback`` parameter is an optional function of a single variable which returns a single value.

   The name ``dynamic`` was chosen because ``binding.dynamic(...)`` is a *dynamically scoped* variable. If a binding ``binding.dynamic("X")`` is defined, the binding doesn't refer to an ``X`` at the location of the *definition* of the binding, but to an ``X`` at the location of the *evaluation*, i.e. the call, of the binding ::

      >>> K = 42
      >>> B = binding.dynamic("K")
      >>> def foo(binding):
      ...     K = -42
      ...     return binding.get_value()
      ...
      >>> B.get_value()   # binding to K evaluated
      42
      >>> foo(B)          # binding to another K, defined inside foo
      -42
      >>> B.get_value()   # binding to the original K
      42

   Compare this to the lexically scoped binding of variables in the Python interpreter ::

      >>> K = 42
      >>> def static():
      ...     return K
      ...
      >>> def dynamic():
      ...     return binding.dynamic("K").get_value()
      ...
      >>> def foo(binding):
      ...     K = -42
      ...     return binding()
      ...
      >>> static()
      42
      >>> dynamic()
      42
      >>> foo(static)
      42
      >>> foo(dynamic)
      -42

T3Table subclasses
------------------

The classes introduced in this section are used to refine T3Tables ( T3Bitmap ), provide an evaluation context for assertions about parsing results ( T3TableContext ) or they enhance pattern matching ( T3Set, T3Repeater ).

T3Set
~~~~~

A T3Table matches the patterns defined in its T3Rows in sequential order. If the sequential order ``A B C ...`` of objects doesn't matter and a permutation ``C A B ...`` of them is equally valid, we would need a pattern of the form ``(A | B | C | ...)+`` for a successful match. T3Sets are built around the idea of ignoring the sequential order, but unlike the rule ``(A | B | C | ...)+`` a T3Set accepts only permutations of ``{A, B, C, ...}``, with no repetition of any of its elements: once e.g. ``B`` has been matched, the matching process continues with ``{A, B, C, ...} - {B}``. Matching is allowed to terminate before all sub-patterns have matched.

Adding a row to a T3Set ``S`` takes the following form ::

   S.add(prefix, key = value)

For example ::

   Tlv = T3Table().add(1, Tag = '00').add(1, Len = ...)
   ...
   S.add(0x89, T_89 = Tlv )
   S.add(0xA6, T_A6 = Tlv )
   ...

The prefix has a different meaning than it had in a T3Table, where a ``0x89`` :ref:`integer pattern <add>` was the length of data to match. In a T3Set it is actually the ``value`` of the ``key = value`` pair which does the match, e.g. the ``Tlv`` value. The ``prefix`` acts as a selector of the matching value object. If the ``Tlv`` table was used to match data unspecifically it would match *any* TLV whatsoever.

T3Repeater
~~~~~~~~~~

Unlike the other classes mentioned in this section the T3Repeater is not a subclass of T3Table. It merely holds a T3Table object as a member variable. While the T3Set enhances pattern matching through the introduction of alternative row matches of the form ``A | B``, the T3Repeater can be perceived as the Kleene star ``A*`` or one of its delimited variants.

T3Bitmap
~~~~~~~~

T3TableContext
~~~~~~~~~~~~~~