<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Trails in a Langscape</title>
	<atom:link href="http://fiber-space.de/wordpress/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://fiber-space.de/wordpress</link>
	<description>Projects and projections</description>
	<pubDate>Sat, 07 Aug 2010 02:34:45 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Open source saturation</title>
		<link>http://fiber-space.de/wordpress/?p=1592</link>
		<comments>http://fiber-space.de/wordpress/?p=1592#comments</comments>
		<pubDate>Sat, 07 Aug 2010 02:34:32 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[General]]></category>

		<category><![CDATA[Programming Culture]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1592</guid>
		<description><![CDATA[Reading the following post of Jimmy Schementi who explained his exit at Microsoft with the loss of MS&#8217;s interest in IronRuby I start to wonder if this isn&#8217;t a sign of the times? Open source projects get started by a small team of employees and killed when they don&#8217;t attract a community which brings them [...]]]></description>
			<content:encoded><![CDATA[<p>Reading the following post of <a href="http://blog.jimmy.schementi.com/2010/08/start-spreading-news-future-of-jimmy.html">Jimmy Schementi</a> who explained his exit at Microsoft with the loss of MS&#8217;s interest in IronRuby I start to wonder if this isn&#8217;t a sign of the times? Open source projects get started by a small team of employees and killed when they don&#8217;t attract a community which brings them forth what rarely ever happens because everyone in OSS is already busy and either engaged with a major project, a brand which has been established a few years ago like (C)Python, Rails, Linux or Django  or doing solo acts as in my own case. Same with <a href="http://googleblog.blogspot.com/2010/08/update-on-google-wave.html">Google Wave</a> which was promising but the only wave it produced was a Tsunami of initial attention in the wikiredditblogosphere. Everyone expected Google would bring it forth just like any other commodity. I guess the same would happen to their Go language which was started by a superstar team of veteran programmers and would immediately go away if Google discontinues investment.</p>

<p>There are very few brands which are both new and do well like Clojure and Scala which seem to follow Pythons BDFL model and they are - unsurprisingly? - programming languages. Are there other examples of OSS projects that peaked in the last 2-3 years and established a community of regular committers who are not interns of a single company or do we see an almost inevitable saturation?</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1592</wfw:commentRss>
		</item>
		<item>
		<title>Langscape</title>
		<link>http://fiber-space.de/wordpress/?p=1571</link>
		<comments>http://fiber-space.de/wordpress/?p=1571#comments</comments>
		<pubDate>Fri, 16 Jul 2010 16:15:47 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1571</guid>
		<description><![CDATA[Trails in a Langscape

Welcome to Trails in a Langscape which is the new title of this blog. It is a minor change since URLs are not affected and the character of the blog will also remain the same. Langscape is the successor project of EasyExtend and is publically hosted at Google Code.

Since I created this [...]]]></description>
			<content:encoded><![CDATA[<h3>Trails in a Langscape</h3>

<p>Welcome to <strong>Trails in a Langscape</strong> which is the new title of this blog. It is a minor change since URLs are not affected and the character of the blog will also remain the same. <strong>Langscape</strong> is the successor project of EasyExtend and is publically <a href="http://code.google.com/p/langscape/">hosted</a> at Google Code.</p>

<p>Since I created this Wordpress blog instance I slowly worked on a new EasyExtend release. I published lots of related technical ideas but never released any code. Now the code <a href="http://code.google.com/p/langscape/source/checkout">is out</a>. It lives in an Hg repository, it is filed under a new label and hopefully a first packaged Langscape 0.1 release will follow soon. There is no project documentation at the moment and I still think about its organization. Another open issue is packaging and distribution, but I have no idea what is up-to-date in this area, how Langscape is possibly used, if anyone will ever create langlets or just use the growing toolbox applicable to Python, including syntactically guarded search and replace.</p>

<h3>Europython 2010</h3>

<p>Of course the time of publication is not arbitrarily chosen. I attend to <a href="http://www.europython.eu/">Europython 2010</a> next week in Birmingham and have a talk about EasyExtend/Langscape at Wednesday <a href="http://www.europython.eu/talks/timetable/">late in the afternoon</a> before we leave for Conference dinner. I hope many of you go to Europython as well and I&#8217;ll find a few of my casual readers in the audience. If the talk is so good as the fun I had preparing my slides, you&#8217;ll enjoy it as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1571</wfw:commentRss>
		</item>
		<item>
		<title>Ready Mades - African football culture</title>
		<link>http://fiber-space.de/wordpress/?p=1553</link>
		<comments>http://fiber-space.de/wordpress/?p=1553#comments</comments>
		<pubDate>Sat, 12 Jun 2010 10:42:26 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1553</guid>
		<description><![CDATA[It took just two football games yesterday evening to create a worldwide recognizable sound icon which does now specify &#8220;African football culture&#8221; - a loud, monotonous sounding plastic horn, called the Vuvuzela. Since many spectators, in particular in Europe, complained about the hornets nest kind of noise over the stadium, it is now a symbol [...]]]></description>
			<content:encoded><![CDATA[<p>It took just two football games yesterday evening to create a worldwide recognizable sound icon which does now specify &#8220;African football culture&#8221; - a loud, monotonous sounding plastic horn, called the Vuvuzela. Since many spectators, in particular in Europe, complained about the hornets nest kind of noise over the stadium, it is now a symbol of African difference and tradition even though its mass production goes back for only 4 years.</p>

<p>Overnight the Vuvuzela has become the ready-made of a particular &#8220;culture&#8221;. It doesn&#8217;t matter much if there has been a real tradition i.e. one of singing and drumming because this was just a localized variant of what was already common on other continents. The sound-image of the Vuvuzela makes a difference and this difference can be fed back into a powerful symbolic machinery which retroactively creates identity and difference.</p>

<p>I&#8217;m very curious how everyone is going to deal with it over the next couple of weeks. Nothing against football but I find this far more thrilling than the games themselves.</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1553</wfw:commentRss>
		</item>
		<item>
		<title>Token Tracers</title>
		<link>http://fiber-space.de/wordpress/?p=1537</link>
		<comments>http://fiber-space.de/wordpress/?p=1537#comments</comments>
		<pubDate>Thu, 03 Jun 2010 20:56:23 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[EasyExtend]]></category>

		<category><![CDATA[Parsing]]></category>

		<category><![CDATA[TBP]]></category>

		<category><![CDATA[Trail]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1537</guid>
		<description><![CDATA[When I started programming EasyExtend in 2006 one of the major problems was the correct grammar -&#62; NFA translation. I used big grammars and testing for correctness required lots of source code. The first heuristics I used was ugly and complex and it took about 2 or so years to find a neat trick which [...]]]></description>
			<content:encoded><![CDATA[<p>When I started programming EasyExtend in 2006 one of the major problems was the correct grammar -&gt; NFA translation. I used big grammars and testing for correctness required lots of source code. The first heuristics I used was ugly and complex and it took about 2 or so years to find a neat trick which finally lead to replace it completely. The basic problem of systematic phrase or expression generation for testing purpose persisted though - until last week when I implemented a <strong>TokenTracer</strong>.</p>

<h3>Tracers</h3>

<p>A typical production rule in the Trail parser generator is translated into a single NFA which might look as in the following example</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"> <span style="color: #ff4500;">1005</span>: <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;funcdef: [decorators] 'def' NAME parameters ':' suite&quot;</span>,
        <span style="color: black;">&#40;</span><span style="color: #ff4500;">1005</span>, <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>,
        <span style="color: black;">&#123;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1006</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">11</span>, <span style="color: #ff4500;">5</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1043</span>, <span style="color: #ff4500;">6</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #483d8b;">'def'</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">1004</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'def'</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">1005</span>, <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'def'</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #ff4500;">1004</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">1006</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">11</span>, <span style="color: #ff4500;">5</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">1043</span>, <span style="color: #ff4500;">6</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #483d8b;">'-'</span>, <span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span><span style="color: black;">&#125;</span><span style="color: black;">&#93;</span>,</pre></div></div>


<p>It is not created for readability but it is nevertheless easy to decode. The <span style="font-family: Courier New,Courier,monospace;">funcdef</span> grammar rule is assigned a numerical value, a rule identifier - here <span style="font-family: Courier New,Courier,monospace;">1005</span>. Asscociated with the rule identifier is a 3-list consisting of</p>

<ol>
    <li>The rule in plain text</li>
    <li>The start state of a finite automaton (1005, 0, 1005)</li>
    <li>A finite automaton encoded as a dictionary of transitions.</li>
</ol>

<p>Starting with <span style="font-family: Courier New,Courier,monospace;">(1005, 0, 1005)</span> one can step through the automaton. The <em>follow states</em> are  <span style="font-family: Courier New,Courier,monospace;">[('def', 2, 1005), (1004, 1, 1005)]</span>. The first one obviously represents the <span style="font-family: Courier New,Courier,monospace;">def</span> keyword whereas the second is a representation of the <span style="font-family: Courier New,Courier,monospace;">decorators</span> non-terminal which has the rule identifier <span style="font-family: Courier New,Courier,monospace;">1004</span>. When you select the <span style="font-family: Courier New,Courier,monospace;">(1004, 1, 1005)</span> state there is a single follow state, which is again the state of the <span style="font-family: Courier New,Courier,monospace;">def</span> keyword otherwise you get the follow state <span style="font-family: Courier New,Courier,monospace;">(1, 3, 2005)</span> of  <span style="font-family: Courier New,Courier,monospace;">(&#8217;def&#8217;, 2, 1005)</span>. The state <span style="font-family: Courier New,Courier,monospace;">(None, &#8216;-&#8217;, 1005)</span> doesn&#8217;t have a follow state and it is the only one.</p>

<p>You can now define a function that keeps track of this stepping process through a rule. This function is called a <em>Tracer</em>.</p>

<p>A Tracer acts as follows:</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #306f30;">&gt;&gt;&gt;</span> tracer = Tracer<span style="color: black;">&#40;</span>rules<span style="color: black;">&#41;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1005</span><span style="color: black;">&#41;</span>   <span style="color: #808080; font-style: italic;"># selects automaton 1005 and returns the rule ids of the</span>
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'def'</span>, <span style="color: #ff4500;">1004</span><span style="color: black;">&#93;</span>             <span style="color: #808080; font-style: italic;"># possible follow states</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'def'</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1006</span><span style="color: black;">&#93;</span>
...</pre></div></div>


<p>It is possible that a Tracer has to keep track of multiple traces at once. For example the <span style="font-family: Courier New,Courier,monospace;">exprlist</span>rule</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"> <span style="color: #ff4500;">1069</span>: <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;exprlist: expr (',' expr)* [',']&quot;</span>,
        <span style="color: black;">&#40;</span><span style="color: #ff4500;">1069</span>, <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>,
        <span style="color: black;">&#123;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1053</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #483d8b;">'-'</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">1053</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #483d8b;">'-'</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">1053</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #483d8b;">'-'</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>,
         <span style="color: black;">&#40;</span><span style="color: #ff4500;">1069</span>, <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1053</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span><span style="color: black;">&#125;</span><span style="color: black;">&#93;</span>,</pre></div></div>


<p>defines transitions of the kind</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: black;">&#40;</span><span style="color: #ff4500;">1053</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>: <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #483d8b;">'-'</span>, <span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span></pre></div></div>


<p>with two rules of rule id <span style="font-family: Courier New,Courier,monospace;">12</span> in the follow set. When <span style="font-family: Courier New,Courier,monospace;">12</span> is selected in the Tracer all follow sets of all rules with rule id = 12 are unified:</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1069</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1053</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1053</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">12</span>, <span style="color: #008000;">None</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">12</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1053</span>, <span style="color: #008000;">None</span><span style="color: black;">&#93;</span>
...</pre></div></div>


<h3>TokenTracers</h3>

<p>This kind of tracing functionality is central to EasyExtends implementation of <em>Trace Based Parsing </em>(TBP). For single grammar rules TBP coincides with &#8220;Thompson NFA&#8221; style parsing discussed at length by Russ Cox or more recently by Carl Friedrich Bolz who gave a Python implementation.</p>

<p>We want to consider now a different sort of tracer which is more complicated to create than those for single grammar rules. Those tracers have to meet the following requirement:</p>

<p><strong>The list of rule id&#8217;s returned from tracer.select() shall contain only None or rule id&#8217;s of terminal symbols.</strong></p>

<p>The rule id&#8217;s of terminals are exactly the  token types. The <span style="font-family: Courier New,Courier,monospace;">select</span> function of a <em>TokenTracer</em> returns a list of token types and gets fed with a single token type. In the following example we step through the token stream of a simple function</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #000066;font-weight:bold;">def</span> foo<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #000066;font-weight:bold;">print</span> <span style="color: #ff4500;">42</span></pre></div></div>


<p>Here we go</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #306f30;">&gt;&gt;&gt;</span> tracer = TokenTracer<span style="color: black;">&#40;</span>rules<span style="color: black;">&#41;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1001</span><span style="color: black;">&#41;</span>  <span style="color: #808080; font-style: italic;"># a single select using a top level non-terminal</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">7</span>, ... , <span style="color: #483d8b;">'assert'</span>, <span style="color: #483d8b;">'break'</span>, <span style="color: #483d8b;">'class'</span>, <span style="color: #483d8b;">'continue'</span>, <span style="color: #483d8b;">'def'</span>, ...<span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'def'</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># foo</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">7</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">7</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># (</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">7</span>, <span style="color: #ff4500;">8</span>, <span style="color: #ff4500;">16</span>, <span style="color: #ff4500;">36</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">8</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># )</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">11</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">11</span><span style="color: black;">&#41;</span>    <span style="color: #808080; font-style: italic;"># :</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">7</span>, ... , <span style="color: #483d8b;">'assert'</span>, <span style="color: #483d8b;">'break'</span>, <span style="color: #483d8b;">'class'</span>, <span style="color: #483d8b;">'continue'</span>, <span style="color: #483d8b;">'def'</span>, ...<span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># \n</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># INDENT</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">7</span>, ... , <span style="color: #483d8b;">'assert'</span>, <span style="color: #483d8b;">'break'</span>, <span style="color: #483d8b;">'class'</span>, <span style="color: #483d8b;">'continue'</span>, <span style="color: #483d8b;">'def'</span>, ...<span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'print'</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">7</span>, <span style="color: #ff4500;">9</span>, <span style="color: #ff4500;">13</span>, <span style="color: #ff4500;">13</span>, <span style="color: #ff4500;">14</span>, <span style="color: #ff4500;">15</span>, <span style="color: #ff4500;">25</span>, <span style="color: #ff4500;">26</span>, <span style="color: #ff4500;">32</span>, <span style="color: #ff4500;">35</span>, <span style="color: #483d8b;">'lambda'</span>, <span style="color: #483d8b;">'not'</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># 42</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">7</span>, <span style="color: #ff4500;">9</span>, <span style="color: #ff4500;">12</span>, ..., <span style="color: #ff4500;">36</span>, <span style="color: #ff4500;">48</span>, <span style="color: #483d8b;">'&lt;&gt;'</span>, <span style="color: #483d8b;">'and'</span>, <span style="color: #483d8b;">'if'</span>, <span style="color: #483d8b;">'in'</span>, <span style="color: #483d8b;">'is'</span>, <span style="color: #483d8b;">'is'</span>, <span style="color: #483d8b;">'not'</span>, <span style="color: #483d8b;">'or'</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># \n</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">6</span>, <span style="color: #ff4500;">7</span>, ... , <span style="color: #483d8b;">'try'</span>, <span style="color: #483d8b;">'while'</span>, <span style="color: #483d8b;">'with'</span>, <span style="color: #483d8b;">'yield'</span><span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">6</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># DEDENT</span>
<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">7</span>, ... , <span style="color: #483d8b;">'assert'</span>, <span style="color: #483d8b;">'break'</span>, <span style="color: #483d8b;">'class'</span>, <span style="color: #483d8b;">'continue'</span>, <span style="color: #483d8b;">'def'</span>, ...<span style="color: black;">&#93;</span>
<span style="color: #306f30;">&gt;&gt;&gt;</span> tracer.<span style="">select</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>     <span style="color: #808080; font-style: italic;"># ENDMARKER</span></pre></div></div>


<h3>Application 1 - error detection</h3>

<p>Using a TokenTracer it is dead simple to localize a syntax error which is - in the context free case - always an unexpected token. In principle Trail could delegate error recovery entirely to a TokenTracer.</p>

<h3>Application 2 - autocorrection</h3>

<p>A <em>constant token </em>is a token with a constant token string e.g. &#8216;;&#8217; or &#8216;:&#8217;. Closely related are token like INDENT where the token string can be derived from context and a prescribed indentation. In sharp contrast are token like NAME, NUMBER and STRING where the token string is not language but user determined. In the select() sequence above we find constant token lists of length = 1 like [11] or [7]. If one of those token is omitted it can be inserted without guessing.</p>

<h3>Application 3 - expression generation</h3>

<p>The most intriguing aspect of TokenTracers is that each random token sequence which is constrained by a TokenTracer is syntactically correct. This can be used to create expression generators: first write a grammar G to describe the language syntax, then you derive a TokenTracer(G). Finally an expression generator <span style="font-family: Courier New,Courier,monospace;">ExprGen(TokenTracer(G))</span> is created which is used to build random token sequences being compliant with G by means of the TokenTracer. Those token-sequences can either be turned into valid parse trees and get compiled or un-tokenized into source code.</p>

<p>A valuation function <span style="font-family: Courier New,Courier,monospace;">fitness(expr)</span> -&gt; <span style="font-family: Courier New,Courier,monospace;">float</span> on expressions motivates the use of genetic programming for breeding expressions of a certain kind. For example I&#8217;m strongly  interested in compact grammars which create big NFA expansions in Trail. It is not easy to see how those can be built by hand. Using GP one could set an arbitrary threshold like n = 1000 for the number of states in a single expanded NFA and tries to minimize the size of a grammar, where the size is measured in the number of tokens used for a grammar description in some meta-grammar ( e.g. EBNF ).</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1537</wfw:commentRss>
		</item>
		<item>
		<title>The tangled hierarchy of nonsense - computing science vision</title>
		<link>http://fiber-space.de/wordpress/?p=1524</link>
		<comments>http://fiber-space.de/wordpress/?p=1524#comments</comments>
		<pubDate>Mon, 26 Apr 2010 16:33:54 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1524</guid>
		<description><![CDATA[
    English - meaningless, because it doesn&#8217;t have a formal semantics
    Haskell - somewhat meaningless but less than English
    ML - meaningful because it has a formal semantics, which means nothing though without a human reader who thinks in meaningless terms of the English language

]]></description>
			<content:encoded><![CDATA[<ul>
    <li>English - meaningless, because it doesn&#8217;t have a formal semantics</li>
    <li>Haskell - somewhat meaningless but less than English</li>
    <li>ML - meaningful because it has a formal semantics, which means nothing though without a human reader who thinks in meaningless terms of the English language</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1524</wfw:commentRss>
		</item>
		<item>
		<title>Shaky Python future</title>
		<link>http://fiber-space.de/wordpress/?p=1513</link>
		<comments>http://fiber-space.de/wordpress/?p=1513#comments</comments>
		<pubDate>Sat, 24 Apr 2010 10:47:09 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1513</guid>
		<description><![CDATA[Mark Pilgrim says:

Anyway, I&#8217;m really proud of how well DiP3 [Dive into Python 3, ks] came out. The only problem is that no one is using Python 3. I took a gamble last year that large libraries would port to Python 3 while I was writing. That didn&#8217;t happen. I think it&#8217;s pretty clear by [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.reddit.com/r/programming/comments/bv9gt/dive_into_python_must_die/c0oqask">Mark Pilgrim</a> says:</p>

<blockquote>Anyway, I&#8217;m really proud of how well DiP3 [Dive into Python 3, ks] came out. The only problem is that no one is using Python 3. I took a gamble last year that large libraries would port to Python 3 while I was writing. That didn&#8217;t happen. I think it&#8217;s pretty clear by now that that&#8217;s not going to happen anytime soon. Everyone who gambled on the glorious non-backward-compatible future got burned. Given my experience with HTML, you&#8217;d think I&#8217;d learn. Ah well.</blockquote>

<p>So what are realist expectations? Python 2 as the future of a research language called Python 3?</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1513</wfw:commentRss>
		</item>
		<item>
		<title>Rebuilding a ship on the ocean</title>
		<link>http://fiber-space.de/wordpress/?p=1505</link>
		<comments>http://fiber-space.de/wordpress/?p=1505#comments</comments>
		<pubDate>Sat, 03 Apr 2010 17:56:55 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[Programming Culture]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1505</guid>
		<description><![CDATA[The contemporary hero? The one who changes a running system without causing havoc.

Usually you start with the energy of a reformer but end up as a connoisseur of delicate causalities and dependencies. Jaron Lanier writes about such &#8220;lock ins&#8221; in his latest book, which I will review here later. Reinventing the wheel might be a [...]]]></description>
			<content:encoded><![CDATA[<p>The contemporary hero? The one who changes a running system without causing havoc.</p>

<p>Usually you start with the energy of a reformer but end up as a connoisseur of delicate causalities and dependencies. Jaron Lanier writes about such &#8220;lock ins&#8221; in his latest book, which I will review here later. Reinventing the wheel might be a simple strategy to detract from dependencies. They have not had time yet to spread and strangle innovation.  But if the (re-)invention becomes successful, it gets involved in the tangle, spreads its own dependencies and adds to the overall complexity. We observe an excess of relatedness with lots of unknowns.</p>

<p>Far from being &#8220;principled conservatives&#8221; we use to be frustrated reformers for whom the modernist call for &#8220;embracing change&#8221; becomes a moral/emotional supplement i.e. the  right attitude without much of a noticeable effect.</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1505</wfw:commentRss>
		</item>
		<item>
		<title>Inheritance and the C preprocessor</title>
		<link>http://fiber-space.de/wordpress/?p=1449</link>
		<comments>http://fiber-space.de/wordpress/?p=1449#comments</comments>
		<pubDate>Wed, 24 Mar 2010 02:30:34 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[Algorithms]]></category>

		<category><![CDATA[C]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1449</guid>
		<description><![CDATA[Defining n-ary trees using the C preprocessor

In this article I introduce a compile time C technique used to define inheritance. Instead of giving a lengthy motivation I&#8217;ll jump directly to the algorithm and discuss it later. I hope lovers of C and its preprocessor find it useful. #defines first!


#define TOP 0
#define SET_CHILD(n,parent) ( parent==TOP ? [...]]]></description>
			<content:encoded><![CDATA[<h3>Defining n-ary trees using the C preprocessor</h3>

<p>In this article I introduce a compile time C technique used to define inheritance. Instead of giving a lengthy motivation I&#8217;ll jump directly to the algorithm and discuss it later. I hope lovers of C and its preprocessor find it useful. #defines first!</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">#define TOP 0
#define SET_CHILD(n,parent) ( parent==TOP ? n: \
                            ( parent&lt;(1&lt;&lt;4) ? (n&lt;&lt;4) + parent : \
                            ( parent&lt;(1&lt;&lt;8) ? (n&lt;&lt;8) + parent : (n&lt;&lt;12)+parent)))
&nbsp;
#define IS_SUBNODE(child, parent) ((child &amp; parent) == parent)
&nbsp;
#define SELECT(X, a, best) ( a &gt; best &amp;&amp; IS_SUBNODE(X, a)? a : best)
&nbsp;
#define SELECT_FROM_5(X, a, b, c, d, e) SELECT(X, a, \
                                        SELECT(X, b, \
                                        SELECT(X, c, \
                                        SELECT(X, d, \
                                        SELECT(X, e, 0)))))
&nbsp;
#define SELECT_FROM_4(X, a, b, c, d) SELECT_FROM_5(X, a, b, c, d, 0)
#define SELECT_FROM_3(X, a, b, c)    SELECT_FROM_5(X, a, b, c, 0, 0)
#define SELECT_FROM_2(X, a, b)       SELECT_FROM_5(X, a, b, 0, 0, 0)</pre></div></div>


<p>The <span style="font-family: Courier New,Courier,monospace;">SET_CHILD</span> macro is used to define up to 15 child nodes of a given root for a n-ary tree of depth 5 with a single root node, named <span style="font-family: Courier New,Courier,monospace;">TOP</span>. This is encoded within a single number of type <span style="font-family: Courier New,Courier,monospace;">word</span> which is adequate for most embedded compilers. For 32 or 64 bit processors one can either support more child nodes or a deeper tree.</p>

<p><span style="font-family: Courier New,Courier,monospace;">SET_CHILD</span> is assigning a name to n-th child of a given <span style="font-family: Courier New,Courier,monospace;">parent</span>. One starts with <span style="font-family: Courier New,Courier,monospace;">TOP</span> as the parent of all nodes and recurses down:</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">#define A SET_CHILD(1, TOP)
#define B SET_CHILD(2, TOP)
...
#define A1 SET_CHILD(1, A)
#define A2 SET_CHILD(2, A)
...
#define B1 SET_CHILD(1, B)
#define B2 SET_CHILD(2, B)
...
#define A11 SET_CHILD(1, A1)
#define A12 SET_CHILD(2, A1)
...
#define A21 SET_CHILD(1, A2)
#define A22 SET_CHILD(2, A2)
...</pre></div></div>


<p>By construction no more than 15 child nodes for a given parent are permitted. If more are used, macros like <span style="font-family: Courier New,Courier,monospace;">IS_CHILD</span> will fail to work correctly.</p>

<p>Once a tree is created with the appropriate nodes, one can use <span style="font-family: Courier New,Courier,monospace;">IS_CHILD</span> to check for child/parent relationships. The tree is constructed s.t. <span style="font-family: Courier New,Courier,monospace;">IS_CHILD(A, B)</span> returns 1 iff <span style="font-family: Courier New,Courier,monospace;">A</span> is a direct child of <span style="font-family: Courier New,Courier,monospace;">B</span> or a grandchild of <span style="font-family: Courier New,Courier,monospace;">B</span> etc. otherwise 0. So <span style="font-family: Courier New,Courier,monospace;">IS_CHILD(A22, A)</span> evaluates to 1 just like <span style="font-family: Courier New,Courier,monospace;">IS_CHILD(A22, A2)</span> or <span style="font-family: Courier New,Courier,monospace;">IS_CHILD(A22, TOP)</span> but <span style="font-family: Courier New,Courier,monospace;">IS_CHILD(A22, A1)</span> is 0.</p>

<p>The C preprocessor doesn&#8217;t support overloading and the flavors I checked didn&#8217;t support varargs wich wouldn&#8217;t probably be much helpful in this case either. So I defined a group of 5 <span style="font-family: Courier New,Courier,monospace;">SELECT_FROM_xx</span> macros being distinguished only be the number of arguments. The number 5 isn&#8217;t magic and one can extend the range of <span style="font-family: Courier New,Courier,monospace;">SELECT_FROM_xx</span> macros by need.</p>

<p>How is <span style="font-family: Courier New,Courier,monospace;">SELECT_FROM_xx</span> used? The first argument <span style="font-family: Courier New,Courier,monospace;">X</span> is an arbitary node of the tree. If one of the susequent nodes <span style="font-family: Courier New,Courier,monospace;">a</span>, <span style="font-family: Courier New,Courier,monospace;">b</span>, &#8230; <span style="font-family: Courier New,Courier,monospace;">c</span> is identical with <span style="font-family: Courier New,Courier,monospace;">X</span>, <span style="font-family: Courier New,Courier,monospace;">X</span> will be the value of <span style="font-family: Courier New,Courier,monospace;">SELECT_FROM_xx(X, a, b, &#8230;, c)</span>. Otherwise the <em>most-direct-parent</em> of <span style="font-family: Courier New,Courier,monospace;">X</span> among the nodes <span style="font-family: Courier New,Courier,monospace;">a</span>, &#8230;<span style="font-family: Courier New,Courier,monospace;">c</span> will be returned. If none of them is a parent of <span style="font-family: Courier New,Courier,monospace;">X</span> the return value is <span style="font-family: Courier New,Courier,monospace;">TOP</span>.</p>

<p><strong>Example:</strong></p>

<p>If we set</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">#define X A22</pre></div></div>


<p>then we get</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">SELECT_FROM_2(X, A, B)        // = A
SELECT_FROM_3(X, A, B, A1)    // = A
SELECT_FROM_3(X, A, B, A2)    // = A2
SELECT_FROM_3(X, A2, B, A)    // = A2
SELECT_FROM_3(X, A2, A, A22)  // = A2
SELECT_FROM_2(X, A1, B)       // = TOP</pre></div></div>


<h3>Inheritance</h3>

<p>With the definitions above we can influence conditional compilation:</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">#if SELECT_FROM_3(X,A2,A,B) == A2
        const int a = 0;
#else if SELECT_FROM_3(X,A2,A,B) == A
        const int a = 1;
#else if SELECT_FROM_3(X,A2,A,B) == B
        const int a = 2;
#else
        const int a = -1;
#endif</pre></div></div>


<p>The virtue of the construction lies in its robustness. Suppose X is <span style="font-family: Courier New,Courier,monospace;">A22</span> then the first branch is selected but this remains true also if we build a &#8220;subclass&#8221; <span style="font-family: Courier New,Courier,monospace;">A22k , k = 1, &#8230;, 9, A, &#8230;, F</span> of <span style="font-family: Courier New,Courier,monospace;">A22</span> and assign e.g.</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">#define X A225</pre></div></div>


<p>So if we use conditional compilation for a given System <span style="font-family: Courier New,Courier,monospace;">S</span> and create a subsystem <span style="font-family: Courier New,Courier,monospace;">T</span> of <span style="font-family: Courier New,Courier,monospace;">S</span>  e.g. a new version of <span style="font-family: Courier New,Courier,monospace;">S</span>, we have to adapt our C code only in places where <span style="font-family: Courier New,Courier,monospace;">T</span> differs explicitely from <span style="font-family: Courier New,Courier,monospace;">S</span>. This robustness is also the major virtue of using inheritance / polymorphism in OOP. It has led to disrespect of using case-statements in OOP since those do not exploit polymorphism and cause in turn less robust code. We see that case- or if-else statements can be confined with the very same idea and robustness even on the level of the C preprocessor. The additional charme of using the C preprocessor is that child/parent relationships are computed at compile time and do not cause any runtime performance penalty.</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1449</wfw:commentRss>
		</item>
		<item>
		<title>Restricted backmatching</title>
		<link>http://fiber-space.de/wordpress/?p=1441</link>
		<comments>http://fiber-space.de/wordpress/?p=1441#comments</comments>
		<pubDate>Sat, 13 Mar 2010 08:44:37 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[TBP]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1441</guid>
		<description><![CDATA[In practice we often encounter situations when our preferred approach to problem solving breaks down. Just look at the recent Google implementation of  a regexp engine RE2, created by Russ Cox, the guy who has written a revival paper for Thompson NFAs  a while ago with a few follow-ups which build on those ideas.  Now [...]]]></description>
			<content:encoded><![CDATA[<p>In practice we often encounter situations when our preferred approach to problem solving breaks down. Just look at the recent Google implementation of  a regexp engine<a href="http://code.google.com/p/re2/"> RE2</a>, created by Russ Cox, the guy who has written a <a href="http://swtch.com/~rsc/regexp/regexp1.html">revival paper</a> for Thompson NFAs  a while ago with a few follow-ups which build on those ideas.  Now once again backmatching is greyed from the feature matrix which means: no implementation. The project page intro states:</p>

<blockquote>The one significant exception is that RE2 drops support for backreferences and generalized zero-width assertions, because they cannot be implemented efficiently.</blockquote>

<p>When one takes a closer look at backmatching one starts to wonder what the restrictions to backmatching in trace based parsing ( TBP ) approaches to pattern matching really are?  One can ask the question also with a more positive intent: what cases of backmatching are compliant with TBP? We&#8217;ll find there are quite a lot. Some are obviously falling apart like exotic applications of regexps for solving NP-complete problems like 3-SAT - but could it be in the end that only esoteric applications of backmatching are excluded from TBP? The reader might be the final judge.</p>

<h3>General backmatching</h3>

<p>When we think about backmatching in regular expressions we might have expressions like this</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"> <span style="color: #ff0000;">&quot;... (P) ... <span style="color: #006699; font-weight: bold;">\1</span>&quot;</span></pre></div></div>


<p>in mind where (P) defines a simple pattern and \1 refers to the <em>value</em> of the matched pattern. So there is a functional relationship between (P) and \1.</p>

<p>Actually this perspective is a little simplistic in the general case. Consider the following simple regexp:</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>ab<span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>b<span style="color: #339933;">*</span>\<span style="color: #0000dd;">1</span></pre></div></div>


<p>Here the match of  <span style="font-family: Courier New,Courier,monospace;">([ab]*)</span> depends on what \1 will match but it is also highly ambiguous. If we match the following string</p>

<p>s = &#8220;bb&#8221;</p>

<p>the first b can be matched by <span style="font-family: Courier New,Courier,monospace;">([ab]*)</span> and the last &#8220;b&#8221; by <span style="font-family: Courier New,Courier,monospace;">\1</span> but the whole string can also be matched by <span style="font-family: Courier New,Courier,monospace;">b*</span>.</p>

<p>Here is another more complicated example</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>ab<span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>b<span style="color: #339933;">*</span>\<span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span><span style="color: #009900;">&#91;</span>ab<span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span>\1b<span style="color: #339933;">*</span></pre></div></div>


<p>It stems from Thomas Lord with whom I discussed this topic on LtU and who corrected my initial naive assumptions about backmatching. Not only depends the match of <span style="font-family: Courier New,Courier,monospace;">([ab]*)</span> on the match of \1 but also on the match of \2 which depends on the match of \1 as well. Of course \1 depends on both of the matches of <span style="font-family: Courier New,Courier,monospace;">([ab]*)</span> and <span style="font-family: Courier New,Courier,monospace;">(([ab]*)b*\2)</span>. It&#8217;s all tangled.</p>

<p>General backmatching like the one above can be used to solve NP-complete problems which exploits these tanglements and find resolutions. See <a href="http://perl.plover.com/NPC/NPC-3COL.html">this</a> article for more examples. With exponential time backtracking algorithms built into regexp engines one gets solutions for free or by magic. In some sense this is cool but if you&#8217;d write a requirements document for a regexp engine would you demand that it shall solve 3-SAT problems? If you say &#8220;yes&#8221;, show me the real world application which uses this power.</p>

<h3>Functional backmatching</h3>

<p>If we restrict backmatching to simple functional relations between (P) and \1 we can still express a broad range of practically relevant use cases. Here we give an approach to formalize those restrictions which can be checked by a regexp compiler.</p>

<p>In an expression</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    ... <span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> ... \<span style="color: #0000dd;">1</span></pre></div></div>


<p>the back-reference \1 can be <em>separated</em> from P when following holds:</p>

<ol>
<li><p>P doesn&#8217;t contain back-references which means it is self contained.</p></li>
<li><p>It is possible to write the expression in the way</p></li>
</ol>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    ... <span style="color: #202020;">L</span><span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span>R ... \<span style="color: #0000dd;">1</span></pre></div></div>


<p>where L and R are left and right <em>delimiters</em> of P which means P has no characters in common with L and R. L can be empty when (P) is at the start of an expression.</p>

<p>The first condition can be checked syntactically. The second condition can be expressed using the following two equations on sets</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color:#800080;">2.1</span> LAST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>L<span style="color: #009900;">&#41;</span>  <span style="color: #339933;">/</span>\ FIRST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
<span style="color:#800080;">2.2</span> LAST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span>  <span style="color: #339933;">/</span>\ FIRST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>R<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span></pre></div></div>


<p>If additionally following condition is true</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color:#800080;">2.3</span> FIRST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span>\ LAST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span></pre></div></div>


<p>R might be empty and an expression</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    ... <span style="color: #202020;">L</span><span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span>\<span style="color: #0000dd;">1</span> ...</pre></div></div>


<p>is permitted.</p>

<h3>End Cycles</h3>

<p>The current conditions are still to restrictive. For example regexp <span style="font-family: Courier New,Courier,monospace;">(a)\1</span> violates condition (2.3) but shall be permitted. What we really want to exclude is that \1 is adjecent to what I call a non empty <em>endcycle</em>.</p>

<p>An endcycle of P has the following definition:</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">END<span style="color: #339933;">-</span>CYCLE<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> FOLLOW<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span> LAST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span></pre></div></div>


<p>Take for example the regexp<span style="font-family: Courier New,Courier,monospace;">P = (a*|b|c)</span>. Here <span style="font-family: Courier New,Courier,monospace;">LAST-SET(P) = {a, b, c}</span> and  <span style="font-family: Courier New,Courier,monospace;">FOLLOW-SET({a,b,c}) = {a}</span> which means that <span style="font-family: Courier New,Courier,monospace;">a</span> is in the endcycle of <span style="font-family: Courier New,Courier,monospace;">P</span>.</p>

<p>With endcycles in mind we can weaken the conditions of (2) considerably:</p>

<p>If P has no endcycle i.e.</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    END<span style="color: #339933;">-</span>CYCLE<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span></pre></div></div>


<p>we permit</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    ... <span style="color: #202020;">L</span><span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span>\<span style="color: #0000dd;">1</span> ...</pre></div></div>


<p>if the following holds:</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    END<span style="color: #339933;">-</span>CYCLE<span style="color: #009900;">&#40;</span>L<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span>\ FIRST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span></pre></div></div>


<p>If on the other hand</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    END<span style="color: #339933;">-</span>CYCLE<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">!=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span></pre></div></div>


<p>we permit</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    ... <span style="color: #202020;">L</span><span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span>R ... \<span style="color: #0000dd;">1</span> ...</pre></div></div>


<p>if the following holds:</p>


<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    END<span style="color: #339933;">-</span>CYCLE<span style="color: #009900;">&#40;</span>L<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span>\ FIRST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
    END<span style="color: #339933;">-</span>CYCLE<span style="color: #009900;">&#40;</span>P<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span>\ FIRST<span style="color: #339933;">-</span>SET<span style="color: #009900;">&#40;</span>R<span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span></pre></div></div>


<h3>Final  Remarks</h3>

<p>No matter how the conditions are defined it has to be granted that matching (P) is terminated before backmatching. If this isn&#8217;t checked statically during regexp compilation one can still defer checks until runtime. Much like any other dynamic check it is less clear what will happen to an expression but there isn&#8217;t much mental overhead and the implementation is kept simpler.</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1441</wfw:commentRss>
		</item>
		<item>
		<title>reverb - a revival</title>
		<link>http://fiber-space.de/wordpress/?p=1454</link>
		<comments>http://fiber-space.de/wordpress/?p=1454#comments</comments>
		<pubDate>Sun, 28 Feb 2010 09:16:18 +0000</pubDate>
		<dc:creator>kay</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://fiber-space.de/wordpress/?p=1454</guid>
		<description><![CDATA[Sometimes software is given up by people and you realize it only a few years later.  Large packages or libraries will inevitably be flagged as legacy and die but tiny modules might have a chance to survive and find a maintainer. I have done the latter now for reverb.py.
]]></description>
			<content:encoded><![CDATA[<p>Sometimes software is given up by people and you realize it only a few years later.  Large packages or libraries will inevitably be flagged as legacy and die but tiny modules might have a chance to survive and find a maintainer. I have done the latter now for <a href="http://www.fiber-space.de/reverb2/reverb-doc/index.html">reverb.py</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://fiber-space.de/wordpress/?feed=rss2&amp;p=1454</wfw:commentRss>
		</item>
	</channel>
</rss>
