{"id":762,"date":"2009-06-12T10:44:36","date_gmt":"2009-06-12T09:44:36","guid":{"rendered":"http:\/\/fiber-space.de\/wordpress\/?p=762"},"modified":"2009-06-22T21:17:49","modified_gmt":"2009-06-22T20:17:49","slug":"linear-bounds-for-backtracking-parsers-with-memoization","status":"publish","type":"post","link":"http:\/\/fiber-space.de\/wordpress\/2009\/06\/12\/linear-bounds-for-backtracking-parsers-with-memoization\/","title":{"rendered":"Linear bounds for backtracking parsers with memoization"},"content":{"rendered":"<p>On the <a href=\"http:\/\/groups.google.com\/group\/comp.compilers\/browse_frm\/thread\/c56ae885433b5de0\/147f531fc4768255#147f531fc4768255\">comp.compilers<\/a> mailing list I asked for a confirmation of the O(n) complexity claim for <a href=\"http:\/\/pdos.csail.mit.edu\/~baford\/packrat\/\">packrat parsers<\/a>. A packrat  parser is a particular top down recursive descendant parser which applies backtracking on failure of consuming a token by application of a parsing expression. Now a backtracking parser usually exhibits O(2<sup>n<\/sup>) complexity and the essential claim is that <a href=\"http:\/\/en.wikipedia.org\/wiki\/Memoization\">memoization<\/a> brings it down to O(n) on the cost of expanding storage. The work of Bryan Ford about packrat parsers refers to an original article written by <a href=\"http:\/\/www2.computer.org\/portal\/web\/csdl\/doi\/10.1109\/SWAT.1970.18\">Birman &amp; Ullmann<\/a> that gave a proof. The cited article costs 19$ at IEEE which I felt I would spent when I failed to give a proof by myself. Fortunately the proof is elementary and not complicated. So I could save my money.<\/p>\n<p>Here we go.<\/p>\n<p>Each non-terminal `N` of a grammar G has a finite first set `{T1, .., Tn}` which are terminal symbols that can be recursively determined by examining the start of the rule `N`.  So if `N` has the shape<\/p>\n<p>`N \u2192 R1 &#8230; | R2 &#8230;`<\/p>\n<p>then<\/p>\n<p>`FirstSet(N)` = FirstSet(R1) \u22c3 FirstSet(R2) &#8230;<\/p>\n<p>with `FirstSet(`R<sub>i<\/sub> ) = { R<sub>i<\/sub> } if R<sub>i <\/sub>is a terminal symbol.<\/p>\n<p>Next we consider the preimages of terminals with regard to FirstSets. We call them  <em>ancestors<\/em> and for a terminal T the set of ancestors is defined by:<\/p>\n<p>`Ancestors(T) =  {T} \u222a {N \u2208 NonTerminals(G)| T \u2208 FirstSet(N)}`<\/p>\n<p>Each terminal T has at least one ancestor ( itself ) and at most `K = #NT(G)+1` where `#NT(G)` is the number of non-terminals of the grammar G.<\/p>\n<p>Suppose now we feed a token stream T<sub>1<\/sub> T<sub>2<\/sub> &#8230; T<sub>n<\/sub> into a top down parser. For each T<sub>i<\/sub> in the stream we can apply all of the rules in `Ancestors(`T<sub>i<\/sub>`)` and store the results in a set<\/p>\n<p>`Trees(i) = {(i, Tree(R))| R \u2208 Ancestors(`T<sub>i<\/sub>`) }`<\/p>\n<p>Since for all i the inequality`#Ancestors(`T<sub>i<\/sub>`)` \u2264 K  holds the set<\/p>\n<p>`Trees =`<big>\u222a<\/big> <sub>i\u2208{1, &#8230;, n}<\/sub>`Trees(i)`<\/p>\n<p>has at most `K*n` elements.  If a new tree needs to be computed the sub-trees may be looked up in `Trees`. If one is missing that one is computed as well etc. In any case no more than `K*n` computations need to be performed to determine all elements of `Trees`. Since K was a constant that depended only on the grammar we can conclude that the asymptotic complexity of computing `Trees` is O(n). Since `Trees` contains a solution to our problem it can be computed in O(n) as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>On the comp.compilers mailing list I asked for a confirmation of the O(n) complexity claim for packrat parsers. A packrat parser is a particular top down recursive descendant parser which applies backtracking on failure of consuming a token by application &hellip; <a href=\"http:\/\/fiber-space.de\/wordpress\/2009\/06\/12\/linear-bounds-for-backtracking-parsers-with-memoization\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[23],"tags":[],"_links":{"self":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/762"}],"collection":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/comments?post=762"}],"version-history":[{"count":21,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/762\/revisions"}],"predecessor-version":[{"id":886,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/posts\/762\/revisions\/886"}],"wp:attachment":[{"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/media?parent=762"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/categories?post=762"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/fiber-space.de\/wordpress\/wp-json\/wp\/v2\/tags?post=762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}