revusky
Actually, as a bit of a follow-up on this topic, it turned out that I didn't really need this feature. I thought I did, but I did things another way. It's actually kind of instructive. Rust has these struct literals, I guess they're called, that look like:
foo {
field1 : value,
etc....
}
So a big problem is the potential ambiguity, since the BNF for an IfExpression looks like if Expression Block ... sort of thing, so you can see the potential for ambiguity in the above. Well, the basic rule is that if you write:
if foo {...}
then, regardless of whether the foo {...} could be parsed as a struct literal, that interpretation is disallowed. The following { is taken to be the start of the block that is part of the IfExpression. Now, in terms of implementing this, various possibilities occur to one, most of which don't quite work. (At a certain point, you end up feeling like you're trying to fit a square peg into a round hole!)
The most promising approach is something like this. You define a sort of outer ConditionalExpression dummy production that looks like:
ConditionalExpression#void : Expression;
and you use that dummy production in the appropriate spots, so you have something like:
IfExpression : "if" ConditionalExpression Block ["else" Block] ;
The fact that you enter the Expression production from the dummy ConditionalExpression allows you to disallow the struct literal construct based on that. So, at the top of StructExpression, you could have:
`ENSURE ~\...\ConditionalExpression`
That doesn't quite work (though it mostly does, maybe). The problem is that the struct literal could be "protected" by parentheses (or brackets or whatever). So you could write:
if foo(bar {...}) {doSomething()}
In the above, the real deal is that bar {...} should be parsed as a struct literal because it is "protected" by the parentheses. (Java and other C-derived languages don't have this problem because you have to put parentheses around the condition in an if statement... And Python requires a colon after the condition.) But then you have the problem that it could be something like:
if foo && !bar {...}
then we don't want bar {...} to be parsed as a struct expression. But again, if it was in parentheses (or part of an array expression and thus "protected" by square brackets) then it would be okay. The above is ambiguous, though, so we are supposed to parse it as the condition ending after bar, with the block starting right after that. Well, it should be parsed as if we had written:
if (foo && !bar) {...}
That is actually how you would have to write it in Java, say. Well, I guess I'll end the suspense. The solution (it really seems to work!) can be seen here:
Actually, I'll paste in the key code since there is a tendency for the line numbers to slip around. Here is the key predicate method written in Java:
INJECT PARSER_CLASS : {
    private boolean isInConditionButNotParenthesized() {
        // Walk the call stack from the innermost caller outward.
        ListIterator<NonTerminalCall> it = stackIteratorBackward();
        while (it.hasNext()) {
            // NB. These strings are interned so we can just
            // use identity comparison.
            String caller = it.next().productionName;
            // Reached the condition with nothing protecting us.
            if (caller == "ConditionalExpression") return true;
            // A delimited construct sits between us and any condition.
            if (
                caller == "ParentheticalExpression"
                || caller == "CallParams"
                || caller == "ArrayExpression"
                || caller == "BaseExpressionPostfix"
                || caller == "ClosureExpression"
                || caller == "BlockExpression"
            ) {
                return false;
            }
        }
        // Not inside a condition at all.
        return false;
    }
}
What this code does is walk up the call stack, and if it finds ConditionalExpression, it knows it shouldn't enter the StructExpression production. But there's a niggle: if we are in ParentheticalExpression or CallParams, say, and we hit that before we hit ConditionalExpression, it means that the intervening delimiters (whether parentheses or brackets or braces) are "protecting" us from the ambiguity, so yes, we can parse a StructExpression. If not, we have to interpret the block as being part of the enclosing IfExpression (or `while` or `for` or whatever it happens to be). This predicate, which isn't really very long, is key to making this work.
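To make the behaviour concrete, here is a minimal standalone sketch of the same walk, with the call stack modelled as a plain list of production names (outermost first). The class `ConditionGuardDemo`, the `naiveInCondition` helper and the `UnaryExpression` name are assumptions made up for illustration; they are not part of the grammar or the generated parser.

```java
import java.util.List;
import java.util.Set;

// Toy model of the predicate: the call stack is a plain list of production
// names, outermost first. Illustrative only, not the generated parser.
final class ConditionGuardDemo {
    // Productions whose delimiters "protect" a struct literal.
    static final Set<String> PROTECTORS = Set.of(
        "ParentheticalExpression", "CallParams", "ArrayExpression",
        "BaseExpressionPostfix", "ClosureExpression", "BlockExpression");

    // Naive rule: any ConditionalExpression ancestor counts, however far up.
    static boolean naiveInCondition(List<String> callStack) {
        return callStack.contains("ConditionalExpression");
    }

    // Same logic as the predicate above: walk from the innermost caller
    // outward and stop at whichever comes first, condition or protector.
    static boolean isInConditionButNotParenthesized(List<String> callStack) {
        for (int i = callStack.size() - 1; i >= 0; i--) {
            String caller = callStack.get(i);
            if (caller.equals("ConditionalExpression")) return true;
            if (PROTECTORS.contains(caller)) return false;
        }
        return false;
    }

    public static void main(String[] args) {
        // if foo {...}
        List<String> bare = List.of("IfExpression", "ConditionalExpression", "Expression");
        // if foo && !bar {...}   ("UnaryExpression" is a hypothetical name)
        List<String> negated = List.of(
            "IfExpression", "ConditionalExpression", "Expression", "UnaryExpression");
        // if foo(bar {...}) {...}
        List<String> inCall = List.of(
            "IfExpression", "ConditionalExpression", "Expression", "CallParams", "Expression");

        System.out.println(isInConditionButNotParenthesized(bare));    // true
        System.out.println(isInConditionButNotParenthesized(negated)); // true
        System.out.println(isInConditionButNotParenthesized(inCall));  // false
        System.out.println(naiveInCondition(inCall));                  // true
    }
}
```

The last line shows why a plain "is there a ConditionalExpression anywhere above me" check is too strict: it would refuse the struct literal inside the call parameters, even though the parentheses make it unambiguous.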
But, you see, the funny thing is that I have no way of expressing this using the contextual predicates (as currently implemented). I would need a much more powerful sort of XPath-ish sort of thingy. And, actually, it was easy enough to hand-code this method that I wonder whether having some more powerful mini path-matching language would ever be worth it. I don't think there is usually any need to write something much more complicated than the above method.
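For what it's worth, about the most general thing such a predicate seems to need is "which of these productions is the nearest one above me?". Here is a rough sketch of that as a small helper, again over a toy list of production names rather than the real call stack. `StackQuery` and `nearestAncestor` are hypothetical names of my own, not anything the tool provides.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class StackQuery {
    // Walks from the innermost caller outward and returns the first
    // production name that appears in `names`, if there is one.
    static Optional<String> nearestAncestor(List<String> callStack, Set<String> names) {
        for (int i = callStack.size() - 1; i >= 0; i--) {
            String caller = callStack.get(i);
            if (names.contains(caller)) return Optional.of(caller);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> interesting = Set.of(
            "ConditionalExpression", "ParentheticalExpression", "CallParams", "ArrayExpression");
        // if foo(bar {...}) {...}
        List<String> stack = List.of(
            "IfExpression", "ConditionalExpression", "Expression", "CallParams", "Expression");
        System.out.println(nearestAncestor(stack, interesting)); // Optional[CallParams]
    }
}
```

With something like this, the struct literal check reduces to asking whether the nearest interesting ancestor is ConditionalExpression, which is hardly shorter than the hand-coded loop.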
The funny thing about all this is that there is some sort of Rust spec that explains the rules about these sorts of things. But it's very hard to understand. Finally, I just decided to hack away until I could get it to parse all these things. I had it working before, but only via the hack of having a full unbounded lookahead on the struct expressions, you know:
StructExpression =>||
But that is very problematic for a variety of reasons. Just for starters, it's not really totally correct, though it did work. I got it to parse everything in my 10,000 file test suite. But it was not quite correct...
I think that what I implemented above is effectively the same as the Rust spec (what there is of one) describes. But I arrived at it pragmatically, gradually narrowing in on what I had to implement to get it to parse this stuff -- without unbounded lookahead! Or another way of putting it is that I didn't really understand the rules until, by a process of trial and error, I converged on the above solution. I'd do it one way and have half a dozen parsing failures. Then I would redo it a bit differently and those files that were failing before were parsing, but then I'd have another half a dozen files that were successfully parsing before but now failing!
Well, anyway, in closing, I think that the Rust grammar is pretty accurate at the moment.
Actually, I was intending to write an in-depth blog article on this whole topic of using the ability to walk the call stack using this as an example. And I will, I think. This message is just a sort of first pass draft.