Is parsing Perl really impossible?

I just read this article about the apparent inability to parse Perl 5.

There is an underlying assumption that a parser has to resolve all ambiguities and derive a single parse tree from an expression ( giving a unique interpretation ). But instead of demanding the value of an expression in order to parse it the parser might derive multiple parse trees and insert them as child nodes into a new parse tree representing a conditional expression. So the parser can just inflate the parse tree structure and do its job.

Whether or not the conditional expression can be computed at runtime doesn’t have to bother the compiler developer. It is just another function call and no more dubious then an arbitrary function call in a conditional expression

if(foo())
    printf("foo() is true\n");
else
    printf("foo() is false\n");

So it makes sense to fix requirements to a parser before giving proofs of their impossibility.

  1. PPI can be used for parsing Perl.

  2. The article is about parsing for static analysis. If I’m understanding your suggestion correctly, you can’t have a function left for later evaluation if you’re trying to figure out right now what that expression is going to do.

  3. fijal says:

    Perl grammar is undecidable, that’s why parsing perl is impossible. I don’t remember details right now but if you read perl’s own parser, you’ll find a couple of places where it tries to guess what this ambigous statement means.

    cheers, fijal

  4. kay says:

    Well, it is not too uncommon that a compiler produces multiple target code variants. Think about type parameters and type specialization. JIT compilers do this even without static analysis but type feedback.

    It is just unusual that a parser keeps an expression E and produces two parse trees T1(E) and T2(E) i.e. accepts an ambiguity instead of eliminating it. However this is not too bad if a third tree T(T1(E), T2(E)) is created by the parser that represents a context switch between T1(E) and T2(E) depending on an expression F whose value V(F) will be determined at runtime ( the whatever expression in the article ). As long as one can reflect on the relevant property of V(F) which decides the switch one doesn’t even have to adapt the compiler.

    I’m actually a bit surprised that Perl, which considers context sensitivity as a feature, not a design wart ( unlike e.g. Dennis Ritchie who admitted in an interview that context sensitivity in C was a design flaw ), has a relatively conventional attitude towards parsing.

  5. kay says:

    Fijal, undecidability of parsing is exactly what the cited article discusses. It is a bit weak though on clarifying the premises. Apparently it attempts to avoid ambiguities on all costs but sometimes it might be wise to apply a work around. One trades a simpler parser for a little more generated code and runtime decision making.

  1. There are no trackbacks for this post yet.

Leave a Reply