A problem (revealed by) ASSERT in lookahead

adMartem

It took me a while to track this one down, and even longer to reduce it to a test case. The actual situation in the grammar was deeply nested with many red herrings along the way, and the inability to debug anything in Java with line numbers above 64K further complicated the effort, but, finally, here it is.

A : "(" X ")" ASSERT ~("+") =>|| | "(" Y ")" "+" ;

X : Y ;

Y : ( "Y" | "y" ) =>|| ;

In the A rule, the intent is to not match "(y)+", of course. What seems to happen is that the check$Y lookahead in the scan$ for the first A expansion is invoked with scanToEnd false and a nesting level of 1. That causes it to set the remaining lookahead to 0 before returning. When the top-level scan is returned to, the 0 remaining lookahead causes it to prematurely return a successful result before the ASSERT predicate is even checked. To slightly misuse the trope, "hilarity ensues" (at least in my parser).

revusky

adMartem

Hmm, well, what you're reporting is pretty messed up and I definitely have to look into this. Now, there are a couple of problems at the current juncture -- perhaps most fundamentally, that it's not even specified anywhere exactly how these things are supposed to work! It's defined by implementation at the moment really -- which sometimes is not so terrible, but this is all complex enough that it really needs to be properly specified. So that should change at some point soon. Or alternatively, one could say that I have a more or less clear notion of how this should work, but that's just in my own head...

But what you're reporting... there seem to be a couple of different bugs. Some of this may well have been introduced recently, since I'm in the middle of rewriting all this code because it's just not written very clearly.

Now, in terms of:

 A : "(" X ")" ASSERT ~("+") =>|| | "(" Y ")" "+" ;

It looks pretty clear that the only up-to-here marker that should have any effect is the top-level one, right after the ASSERT. The up-to-here marker that we hit drilling down into X should really be superseded by the one at the top-level of the expansion itself. I think that's common-sensical. So, I mean generally, if you have:

Foobar : Foo =>|| Bar ;

and that is used in some choice construct somewhere, like:

Foobar Baz Bat
 |
SomethingElse

Here is my mental model of this (that needs to be documented AND implemented properly) which is that the way we're going to decide between the first and second option is that we scan into Foobar which, in turn means that we scan into Foo, and then if we successfully scan past Foo, we hit the up-to-here and we go for option 1. And otherwise, we pass onto SomethingElse, option 2. Right? (By the way, note that we're not going to pay any attention to any up-to-here inside of Foo, which is what the nestingLevel thingy is about.)

BUT... here is how I think it should work (though apparently it does not, or not always...) which is that it only respects the up-to-here in Foobar if that is the first subexpansion in the sequence. At some point, I had to seriously grapple with this and I decided that in the above, if there is an up-to-here in Baz, we don't pay attention to that. So the first nonterminal in the sequence is kind of treated specially. (All of this has to do with some serious (at least I think serious) thought about the principle of least surprise etc.)

But, if we had something specified in the enclosing sequence itself, that should override anything inside the nonterminal Foobar. So, if we had:

Foobar Baz =>|| Bat 
 |
SomethingElse

then the up-to-here at the higher-level (between Baz and Bat) should override anything inside Foobar. And that's the same thing as your example, where the up-to-here after the assertion should cause any up-to-here we ran into before that to be disregarded, because the higher-level (or more outer-level) expansion specifies the up-to-here point and has priority. That's how it should work according to my own mental model (which, at some point soon, should stop being just my own mental model and start being something that is set out somewhere!) And similarly, if you just have a SCAN that should also override anything inside a nonterminal.

So, similarly:

   SCAN ~("foo") => Foobar Baz Bat
   |
  SomethingElse

the up-to-here in Foobar should be invalidated because we've got an explicit syntactic lookahead in the enclosing expansion sequence that has (or should have) priority. (Again, this is my mental model of how things should work. I think the above does at least work!)

But you see that's what this scanToEnd stuff is about, which is the conditions under which we actually are going to "respect" the up-to-here marker. If scanToEnd is true, it means we are scanning to the end of the nonterminal (in this case Foobar) and ignoring any up-to-here marker that we encounter because we have a more outer level up-to-here or scanahead that is taking priority. That's what scanToEnd is about, or what it's supposed to do. And also the nestingLevel is about the fact that, at some point, I decided (and it's not properly specified anywhere) that we're only going to pay attention to nested (in a nonterminal) up-to-here markers if they are just one level deep. That might be arbitrary and hard to justify on some pure theoretical grounds, but it just seems like it gets too hairy otherwise. I want to be able to write:

  Foo | Bar | Baz

very cleanly and have the up-to-here markers inside of those non-terminals respected but we're not going any deeper. Like if Foo starts with another non-terminal Bat, we're ignoring any up-to-here inside Bat. Otherwise, I think it kind of opens a Pandora's box in terms of enabling people to write tricky, obfuscated code. I think so... And I've thought about it and I don't think the practical case for it is very strong. So if we had that choice above of Foo|Bar|Baz and we have:

 Foo : X Y =>|| "z" ;

Then the choice above will start by scanning for X Y inside the Foo but if we just had:

 Foo : X Y Z ;

it will NOT (and I think I've decided under any circumstance) respect any up-to-here or scanahead inside of X because... well... even if it's maybe not so justifiable theoretically, it's just giving people means to write very tricky things that... And regardless, any up-to-here inside of Y will also be irrelevant (doubly so) because it doesn't even start the sequence. (Of course, any up-to-here inside of Y could be relevant somewhere else, as long as we start the choice with Y and we're only one level deep.) But in this case, any up-to-here inside of Y would be irrelevant for multiple reasons actually. Y is not the start of the sequence, we have an up-to-here at the higher level superseding anything in Y anyway, and we're two levels deep, if we are coming here from the Foo|Bar|Baz expansion.

So I think it could be stated something like this: The nested up-to-here inside a nonterminal of a sequence should only be relevant if:

  1. It's not superseded by an up-to-here (or explicit scanahead) in a more outer-level expansion.
  2. The nonterminal is at the very start of the enclosing sequence.
  3. We're only 1 level of nonterminal nesting in.

The example you give that is broken is really pretty serious because based on any of the above, the up-to-here inside of Y should be disregarded. So we should hit the ASSERT and then stop. (I mean, assuming the assertion succeeds. If the assertion fails, we're out of there regardless....)

So, look, where I'm at now on all this is that I am becoming quite aware that there are some serious problems here and I've decided that I am basically going to rewrite all this. And it's not a big deal because the key code that determines all this (both Java and on the template level) is not very big. And actually, I decided I was going to rewrite this stuff not too long ago, because I added these features over time and sort of kludged things in somehow... To be honest, I don't know whether what you are running into now was quite this broken before or this is more a consequence of my rewrite being unfinished. Actually, I'm working on this now in a fork here: https://github.com/revusky/javacc21 that is currently 11 commits ahead of the upstream repository. So if you want to work against the absolutely latest version of the code, you can use that.

That's all I can say about this for now, I guess. As for when I'll have this fixed, I'm kind of engrossed in another problem that is even outside of hacking code, but probably it won't take too long.

One kinda funny aspect of this is that I reread what I just wrote and it occurs to me how much more complex JavaCC21/Congo is than the legacy JavaCC tool, with up-to-here and assertions and such. But the main way to get all this straightened out is really for a demanding user (you in this instance) to show up and start making noise about these various cases that don't seem to work quite right.

In that vein, did you ever read this blog article? https://javacc.com/2020/10/28/a-bugs-life/

revusky

adMartem

Okay, well, I think this is working now. It should be working as I described in the last message.

Well, I've come across some other things that were not really working correctly, but I think the issue you raised above is resolved. (Though please check...)

adMartem

revusky
There seems to be a regression (or a correct sanity check that I fail). This grammar illustrates the error message in question.

FooOrBar : Foo | => Bar;
Foo : SCAN 1 "foo";
Bar : SCAN 1 "bar" ;

MaybeFooThenBar : [ Foo =>|| ] Bar ;

In the real grammars, it seems to object whenever there is a numerical scan nested within an up-to-here in a higher level expansion as far as I can tell.

revusky

adMartem
Oops, that's a mistake I made. I added a sanity check that is well-founded, in principle, like it will complain if you write something like:

SCAN 2 Foo Bar =>|| Baz

because the explicit numerical lookahead of 2 and the up-to-here should not be in the same expansion, but the bug was that the check also picks up the numerical lookahead from the initial nonterminal (Foo in this case) as long as Foo starts the expansion. So it should be perfectly permissible to have:

Foo Bar =>|| Baz

and to have:

Foo: SCAN 2 "foo" "bar" "baz";

The effect should just be that the SCAN 2 is ignored because we have an up-to-here in the outer expansion which overrides it. If the up-to-here was not in the calling expansion, i.e. just:

 Foo Bar Baz

then the SCAN 2 in Foo would be used, i.e. we check for "foo" followed by "bar" to enter the expansion.

And, of course, if there is no numerical lookahead or up-to-here anywhere, we just look ahead one token, which is to check for "foo".
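The precedence just described could be sketched in Java roughly as follows. This is only an illustrative model of the three cases, not code from the tool; the class `LookaheadChoice`, the enum `Kind`, and the parameters are all hypothetical names.

```java
// Illustrative model only: these names are hypothetical, not CongoCC internals.
class LookaheadChoice {
    enum Kind { OUTER_UP_TO_HERE, NONTERMINAL_SCAN, DEFAULT_ONE_TOKEN }

    /**
     * @param outerHasUpToHere the calling expansion contains its own =>||
     * @param nestedScanAmount numeric SCAN in the leading nonterminal, or 0 if none
     */
    static Kind decide(boolean outerHasUpToHere, int nestedScanAmount) {
        // Foo Bar =>|| Baz : the outer marker wins and the SCAN 2 in Foo is ignored.
        if (outerHasUpToHere) return Kind.OUTER_UP_TO_HERE;
        // Foo Bar Baz : no outer marker, so the SCAN 2 inside Foo is used.
        if (nestedScanAmount > 0) return Kind.NONTERMINAL_SCAN;
        // No lookahead specified anywhere: check a single token.
        return Kind.DEFAULT_ONE_TOKEN;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 2));   // OUTER_UP_TO_HERE
        System.out.println(decide(false, 2));  // NONTERMINAL_SCAN
        System.out.println(decide(false, 0));  // DEFAULT_ONE_TOKEN
    }
}
```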

This is all the result of thinking somewhat hard about how things should work in terms of common-sense principle of least surprise sorts of considerations. There's that, and just economy of expression, DRY. If you have the lookahead in the Foo production above, you don't need to put LOOKAHEAD(2) in front of every expansion that starts with Foo, it's specified in one place. But anyway, the sanity check was expressed incorrectly and should now be fixed. You can update and try it.

adMartem

revusky
Yep, that fixed it. Thanks. Now, however, I get a host of new warnings of the form "The expansion inside this (...)? construct can be matched by the empty string so it is always matched. This may not be your intention." I understand what it is trying to tell me, but in the case of the grammar the non-terminal at the choice point is like this one:

MnemonicNameReference :
    SCAN {isMnemonicName(getNextToken())}# => CobolWord | FAIL
;

Is it not true that the FAIL alternative should suppress the aforementioned warning (since it cannot choose the non-terminal unless there is a CobolWord present)? And just now, looking at this, it would seem the FAIL is unnecessary and the semantic predicate should also prevent this from ever matching the empty string, right?

adMartem

adMartem
Actually, I tried it without the FAIL and it didn't give the warning. In one case it was an error message (couldn't match the following expansions) and that too went away. So it seems the FAIL is what is causing the problem. I would guess the message appears because the FAIL alternative causes the non-terminal to always be chosen at a decision point even though it may not consume any tokens, and the check does not realize that the FAIL will never be chosen because of implicit lookahead failure in that case.

revusky

adMartem
Well, a FAIL doesn't consume any tokens, but this is a spurious warning, I think. I guess I subtly rewrote certain things and now it's giving this spurious warning in these spots. Well, I have to look at this.

Well, just bear with me. We'll just gradually squash all these little bugs, not to worry.

revusky

adMartem
Well, the problem was that it was warning about issues like:

(Foo)*

when Foo can match empty input so you're going to get into an infinite loop. But in a case like:

Foo : "bar" | FAIL;

that is a spurious warning. If you had (Foo)* and Foo was:

  Foo : "bar" | ["baz"] ;

so it potentially matches empty input, then having it inside a repeating (....)* is problematic, because Foo always succeeds and you get an infinite loop...
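To make the distinction concrete, here is a toy Java model of the "can this expansion match empty input?" question. It is a sketch under assumptions: the class and method names are hypothetical and unrelated to the real implementation, and it assumes that FAIL should count as never matching empty input, since it never succeeds at all.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of grammar expansions; all names are hypothetical, not CongoCC's.
class EmptinessCheck {
    interface Expansion { boolean canMatchEmpty(); }

    // A terminal always consumes exactly one token.
    static Expansion terminal(String image) { return () -> false; }
    // Assumption: FAIL never succeeds, so it never "matches" empty input.
    static Expansion fail() { return () -> false; }
    // [ e ] can always be skipped.
    static Expansion optional(Expansion e) { return () -> true; }
    // A choice can be empty if any alternative can be empty.
    static Expansion choice(Expansion... alts) {
        List<Expansion> list = Arrays.asList(alts);
        return () -> list.stream().anyMatch(Expansion::canMatchEmpty);
    }
    // A sequence can be empty only if every part can be empty.
    static Expansion sequence(Expansion... parts) {
        List<Expansion> list = Arrays.asList(parts);
        return () -> list.stream().allMatch(Expansion::canMatchEmpty);
    }

    public static void main(String[] args) {
        // Foo : "bar" | FAIL ;
        Expansion barOrFail = choice(terminal("bar"), fail());
        // Foo : "bar" | ["baz"] ;
        Expansion barOrOptionalBaz = choice(terminal("bar"), optional(terminal("baz")));
        System.out.println(barOrFail.canMatchEmpty());        // false: (Foo)* is safe
        System.out.println(barOrOptionalBaz.canMatchEmpty()); // true: (Foo)* could loop forever
    }
}
```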

Well, anyway, try it again. I think it's okay now!

adMartem

revusky
I think I still get the warnings and one error on the following:

...
AdvancingPhrase :
  ( <BEFORE> | <AFTER> ) [ <ADVANCING> ]
  ( <PAGE>
  | MnemonicNameReference =>||
  | ( Identifier =>|| | IntegerConstant | NumericFigurativeConstant) [( <LINE> | <LINES> )]
  | <TO> [ <LINE> ] (Identifier =>|| | IntegerConstant) [ [<ON>] <NEXT> <PAGE>] //TODO: implement this
  )
;
...

The (error) message is Error: /Users/jmb/Development/Local_Repositories/p3cobol/src/main/congocc/p3cobol.ccc:7974:5:This expansion can match the empty string.The following 2 expansions can never be matched.
The error goes away if I remove the FAIL, but the rest of the warnings on other expansions in different productions remain.
I double-checked that I was building from the latest pull from Javacc21 [5d66d704].

adMartem

Here's another one (more exciting than the previous one).
I have the following snippet in the grammar:

...
CombinableCondition :
    SimpleCondition =>|| | AbbreviatedRelationCondition =>|| | <LPARENCHAR> ASSERT ~(AbbreviatedCondition <RPARENCHAR> ArithmeticOperator) AbbreviatedCondition <RPARENCHAR> =>||
;
AbbreviatedRelationCondition :
    (   
            RelationalOperator ArithmeticExpression
        |   [ <NOT> ] RelationalOperator ArithmeticExpression
        |   [ <NOT> ] ArithmeticExpression =>||
//      |   ZERO/ZEROS/ZEROES shadowed by ArithmeticExpression()
        |   SignCondition =>||
    )
;

RelationalOperator :
(
    [ <IS> ] [ <NOT> ]  
        ( 
            SCAN 3 =>   <GREATER> [ <THAN> ] <OR> <EQUAL> [ <TO> ]
        |               <MORETHANOREQUAL>
        |   SCAN 3 =>   <LESS> [ <THAN> ] <OR> <EQUAL> [ <TO> ]
        |               <LESSTHANOREQUAL>
        |               <GREATER> [ <THAN> ]
        |               <MORETHANCHAR>
        |               <LESS> [ <THAN> ]
        |               <LESSTHANCHAR>
        |               (<EQUAL>|<EQUALS>) [ <TO> ]
        |               <EQUALCHAR>[ <TO> ]
        |               <NOTEQUALCHAR>
        |   SCAN {allowJas()}# => <JAS_NE>
        |   SCAN {allowJas()}# => <JAS_EQ>
   )
) =>||
;
...

The input string looks like this : NOT 10 AND 9 AND = 10 ... at the point that CombinableCondition is entered.
What happens is that the AbbreviatedRelationCondition up-to-here scan works fine and passes over the first two choices and succeeds on the third (correct) choice. Then when the AbbreviatedRelationCondition is entered it correctly skips the first choice but (incorrectly) selects the second one based on the first set rather than scanning.

I will try and reduce this to a test case if you need it, but I thought I would let you know right away with this fragment in hopes it is sufficient.

adMartem

adMartem
The funky first two choices are due to the (legal) syntax of "NOT NOT EQUAL 10" in the context of this production. Ugh!

adMartem

adMartem I'm beginning to think this is my (brain's) problem, perhaps masked in earlier CongoCC versions. Is it reasonable to assume that the lookahead will succeed at the same point as the selected non-terminal, or is it the case that I should have resolved the problem with an up-to-here in the 2nd choice of AbbreviatedRelationCondition:
... | [ <NOT> ] RelationalOperator =>|| ArithmeticExpression? I.e., my up-to-here scan was at too high a level.
... ( a little later) ...
Now I'm sure I was wrong-headed when I assumed the behavior I originally described. Short of memoization of the scan to make expansion choices always consistent with lookahead I don't see how it could be implemented the way I assumed. So now the mystery is how it ever worked that way (which it did). I'll have to go back and see what was generated before.

adMartem

adMartem
This is typical of the remaining warnings:

...
WriteCatena :
  RecordName [ <FROM> (Identifier =>|| | Literal) ]
  [ AdvancingPhrase ]
  [ [ At =>|| ] ( <END_OF_PAGE> | <EOP> ) =>|| StatementList ]
  [ <NOT> [ At =>|| ] ( <END_OF_PAGE> | <EOP> ) =>|| StatementList ]
  [ <_INVALID> [ <KEY> ] StatementList ]
  [ <NOT> <_INVALID> [ <KEY> ] StatementList ]
  [ <END_WRITE> ]
;
...
At :
    SCAN {isContextSensitiveWord("at")}# => CobolWord | FAIL
;
...

Warning: /Users/jmb/Development/Local_Repositories/p3cobol/src/main/congocc/p3cobol.ccc:7964:5:The expansion inside this (...)? construct can be matched by the empty string so it is always matched. This may not be your intention.
Warning: /Users/jmb/Development/Local_Repositories/p3cobol/src/main/congocc/p3cobol.ccc:7965:11:The expansion inside this (...)? construct can be matched by the empty string so it is always matched. This may not be your intention.
Both occurred at the "At" non-terminal.
When I remove the FAIL the error and warnings all go away.
I guess I probably don't need the up-to-here on the At reference since without the FAIL the SCAN will still be allowed. When I did these I was under the impression that I had to create a choice point in order to add the predicate.

revusky

adMartem

Well, I think there is still a bug in the logic for that warning. I have to look at this more closely. When the final choice in a choice construct is FAIL, then...

Well, not to worry... we'll get this stuff right. In any case, that it's only a warning means that you can disregard it. But the logic of this needs to be fine-tuned.

I do have to say that it is great to have somebody really using all these things in praxis. (Besides the project internally, that is...) Because that really is about the only way to get all this stuff working right.

Well, one aspect of this (that you surely realize) is that the language for expressing the grammar (meta-language to be pretentious...) in Congo/JavaCC21 is really vastly more powerful and expressive than what there is in the original JavaCC. So it is much harder to get everything absolutely right and probably, as a practical matter, the only way to do it is to have noisy, demanding end-users. (Like you.)

revusky

adMartem

I think this is fixed. It was a subtle bug in the sanity check. There is this general problem that the sanity check stuff is meant to catch buggy code, but if the sanity check itself is buggy... I guess that's also a paradox of unit tests and all that. Sure, it's a good idea, you can catch regressions and so on, but if the test itself is buggy....

Though it's maybe a tangent... I myself don't believe in unit tests that much, because I tend to find that if a system is sufficiently complex, the bugs tend to manifest themselves in the conjunction of more than one feature. So unit testing each feature individually can give one a false sense of confidence. And, in any case, I would put more stock in full functional tests than unit tests. We have at least 4 pretty major functional tests of the system, which are the Java, Python, CSharp grammars, and the rebuild/retest of the tool itself, which is written in itself!

Of course, you're hitting these bugs because you are using combinations of things that are not used in the aforementioned functional tests.

revusky

Well, I think the problem you're running into (or maybe it's just one of them) is that I changed (thinking I could get away with it) the way it works as regards using any scanahead specified in a non-terminal.

The way it was before, if you wrote:

   A B C
   |
   D E F

and let's say that B contains an up-to-here, that would be used as long as the preceding expansions were potentially empty, i.e. consumed no tokens. Potentially. So, A could be:

 A: ["foo" | "bar" | "baz"];

which is *potentially* empty, or if the first expansion in the choice above was:

   [A] B C

which amounts to the same thing...

The way it's implemented now, the elements before the nonterminal (say B in this case) must consume no tokens. Since [A] is potentially non-empty, any up-to-here in B is ignored. But, in principle, you can still have:

     ASSERT {condition1()} {doSomething()} B C 
     |
      ....

And it would use the up-to-here in B, because the elements preceding B do not consume any tokens. (Granted, the code block that is second in the sequence could explicitly call consumeToken() but that's getting entirely too tricky. We do just assume that a Java code block does not consume any input.)
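The difference between the old and the new condition on whatever precedes the nonterminal could be modeled like this. It is a hedged sketch only: `PrecedingElements`, `Emptiness`, `oldRule`, and `newRule` are made-up names for illustration, and the real logic lives in the tool's own classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the elements that precede a nonterminal in a sequence.
class PrecedingElements {
    enum Emptiness { ALWAYS_EMPTY, POSSIBLY_EMPTY, NEVER_EMPTY }

    // Old behavior: it is enough that everything before the nonterminal MAY consume no tokens.
    static boolean oldRule(List<Emptiness> preceding) {
        return preceding.stream().noneMatch(e -> e == Emptiness.NEVER_EMPTY);
    }

    // Current behavior: everything before the nonterminal MUST consume no tokens.
    static boolean newRule(List<Emptiness> preceding) {
        return preceding.stream().allMatch(e -> e == Emptiness.ALWAYS_EMPTY);
    }

    public static void main(String[] args) {
        // [A] B C : the optional A may or may not consume tokens.
        List<Emptiness> optionalA = Arrays.asList(Emptiness.POSSIBLY_EMPTY);
        // ASSERT {condition1()} {doSomething()} B C : neither element consumes tokens.
        List<Emptiness> assertAndCode = Arrays.asList(Emptiness.ALWAYS_EMPTY, Emptiness.ALWAYS_EMPTY);

        System.out.println(oldRule(optionalA) + " " + newRule(optionalA));         // true false
        System.out.println(oldRule(assertAndCode) + " " + newRule(assertAndCode)); // true true
    }
}
```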

But anyway, the way it was expressed before was that the things preceding it potentially consumed no input. And I surely was thinking about this at some point. I was probably thinking in terms of constructs like:

    Modifiers TypeDefinition

where Modifiers (public, private, static etc.) is potentially empty so maybe the up-to-here is in the TypeDefinition. So there may well be a use-case for this (though none of my internal use was using this).

But finally (very recently) I decided that this was possibly a bit too tricky (not so much to implement as to just document!) and figured that I could get away with changing this so that the nonterminal has to be the first non-empty sub-expansion in the sequence. I knew this was changing behavior but considered it unlikely that it would affect anybody and also I figured that I could get away with doing this now.

And if you really want to get dirty with the details a bit, this is where this is implemented: https://github.com/javacc21/javacc21/blob/master/src/java/com/javacc/core/NonTerminal.java#L67

So, the current "spec" is that the up-to-here (or SCAN) in a NonTerminal is used if:

 1. The NonTerminal in question is the first non-empty sub-expansion in the sequence
 2. There is no up-to-here (or SCAN) in the enclosing sequence that would have priority.
 3. We're not more than 1 nesting level deep in terms of calling non-terminals or sub-expansions

It could be worth noting that points 1 and 2 are determined at build-time, while point 3 is at run-time, when the parser is actually being run. (Worth noting if you want to develop a conceptual model of how the thing actually works...)
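As a rough way to picture the split between build-time and run-time checks, the three points could be written as two small predicates. This is a sketch with hypothetical names (`NestedUpToHere`, `eligibleAtBuildTime`, `respectedAtRunTime`), not the code at the link above.

```java
// Hypothetical sketch of the three-part rule; not taken from the actual source.
class NestedUpToHere {
    // Points 1 and 2: decidable when the parser is generated.
    static boolean eligibleAtBuildTime(boolean firstNonEmptyInSequence,
                                       boolean enclosingHasScanOrUpToHere) {
        return firstNonEmptyInSequence && !enclosingHasScanOrUpToHere;
    }

    // Point 3: checked while the generated parser is actually scanning ahead.
    static boolean respectedAtRunTime(boolean eligible, int nestingLevel) {
        return eligible && nestingLevel <= 1;
    }

    public static void main(String[] args) {
        // Foo | Bar | Baz with Foo : X Y =>|| "z" ; the marker is in Foo itself, one level in.
        boolean fooEligible = eligibleAtBuildTime(true, false);
        System.out.println(respectedAtRunTime(fooEligible, 1)); // true
        // The same kind of marker reached two nonterminals deep is ignored.
        System.out.println(respectedAtRunTime(fooEligible, 2)); // false
        // A : "(" X ")" ASSERT ~("+") =>|| ... ; X is not first and A has its own =>||.
        System.out.println(respectedAtRunTime(eligibleAtBuildTime(false, true), 1)); // false
    }
}
```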

Anyway, the question now is basically:

Could you live with the above semantics?

adMartem

revusky
I think I see what you are saying. Saying it a little differently,
an up-to-here is (recursively) effectively "hoisted" to an enclosing sequence containing its non-terminal if and only if:

The NonTerminal in question is the first non-empty sub-expansion in the enclosing sequence
There is no up-to-here (or SCAN) in the enclosing sequence that would have priority.

Additionally, when processing the grammar with actual input:

The parser is no more than 1 nesting level deep in terms of accepting non-terminals or sub-expansions.

Is that correct? And, if so, can I also assume that any explicit up-to-here in an expansion is always applied at that level of lookahead/acceptance? I.e., the previous rules only apply to "hoisted" up-to-here action, not explicit up-to-here notation.

I can live with that.

The metaphysical problem I have with up-to-here is coming up with a way to think about it while writing productions. But that's my problem, I guess.

Finally, I would assume from this that the correct way to refactor the snippet I gave would be:

CombinableCondition :
    SimpleCondition =>|| | AbbreviatedRelationCondition | <LPARENCHAR>  AbbreviatedCondition <RPARENCHAR> ASSERT ~(ArithmeticOperator) =>||
;
...
AbbreviatedRelationCondition :
    (   
            RelationalOperator ArithmeticExpression
        |   [ <NOT> ] RelationalOperator =>|| ArithmeticExpression
        |   [ <NOT> ] ArithmeticExpression =>||
//      |   ZERO/ZEROS/ZEROES shadowed by ArithmeticExpression()
        |   SignCondition =>||
    )
;
...

i.e., no up-to-here in CombinableCondition (unnecessary), up-to-here on 2nd choice in AbbreviatedRelationCondition (necessary even though RelationalOperator has up-to-here). 1st choice RelationalOperator needs no up-to-here, as it is hoisted to this sequence.

Also, am I correct in assuming that lookahead is independent of acceptance in that the sequence that is checked in lookahead is not guaranteed to be the sequence that is accepted after the choice is taken?

adMartem

revusky
Thanks for your kind words. I know exactly how you feel. I'm glad I tripped over Javacc21 and your humorous narratives. 😃

And now for something completely different...

FNul : [F0] [F1] [F2] [F3] [F4];
Fs : F0 | F1 | F2 | F3 | F4 | FNul;
FsAlt1 : => ( F0 | F1 | F2 | F3 | F4 );
FsAlt2 : ( F0 | F1 | F2 | F3 | F4 ) =>||;
FsAlt3 : F0 =>|| | F1 =>|| | F2 =>|| | F3 =>|| | F4 =>||;

F0 : "one" | "two" | "three" | "four" | FAIL;  
F1 : "one" | "two" | "three" | "four" | => FAIL ASSERT ~("five") | "five";   
F1alt : "one" | "two" | "three" | "four" | => ASSERT ~("five") FAIL | "five"; 
F2 : "eeny" | FAIL | "meany" | "miny" | "moe"; 
F3 : "eeny" | SCAN {false} => FAIL | "meany" | "miny" | "moe";
F4 : "eeny" | SCAN {false}# => FAIL | "meany" | "miny" | "moe";

revusky

adMartem Saying it a little differently,
an up-to-here is (recursively) effectively "hoisted" to an enclosing sequence containing its non-terminal

Well, yeah, if that's more comprehensible to you, given the way your brain works... (everybody is wired a bit differently, I suppose...) Though, actually, looking at what you wrote, I don't quite see the "recursively" part. We're actually not recursing, we're just going one level deep and that's it. Though, reading further, it seems that you understand that perfectly well.

And, as for:

adMartem And, if so, can I also assume that any explicit up-to-here in an expansion is always applied at that level of lookahead/acceptance. I.e., the previous rules only apply to "hoisted" up-to-here action, not explicit up-to-here notation.

Well, yes, this is the way it should work (If I understand what you're saying...) And that's how it will work, but there are currently some issues that need to be addressed, and I guess I'll have to explain that separately.

But, anyway, the specification that is outlined (and I think now is basically implemented correctly) as regards up-to-here in non-terminals, that's not absolutely written in stone yet, I guess. There is a set of things that could be open for discussion, but hopefully, we'll consider it resolved once the Congo rebranding transition is done.

adMartem I can live with that.

Well, I think it's a reasonable, pragmatic approach. Basically, a SCAN or up-to-here only applies in the expansion where it appears and the first nonterminal in a sequence is an exception, and even then, only one nesting level deep.

Well, there are also a few little details wrt parentheses solely used for grouping. If we have:

   (Bar Baz)#BarBaz Bat 
   |
   SomethingElse

then we will respect an up-to-here in Bar. The parentheses around Bar Baz exist for grouping and affect tree-building, for example, but when it comes to up-to-here, it's the same as if it was just: Bar Baz Bat.

Well, I'll write it up, I guess.

adMartem

revusky
Yes, I tend to agree regarding unit tests. Thanks for fixing the spurious warnings and errors. I had them all over my grammar, even after the previous improvement/fix. The reason (they were so plentiful) is that I have several places that use the pattern NonTerminal : SCAN {someCondition}# => CobolWord; which, of course, is not at a decision point, so, after the change was made to more strictly enforce the #1 rule, these stopped working. My solution was to turn them into choices like: NonTerminal : SCAN {someCondition}# => CobolWord | FAIL; This caused lots of the warnings and, in some cases, errors to occur. My little test sample was something I had done to try and see the effect of the interaction of ASSERT, FAIL, and semantic predicates for another purpose, but I noticed it turned into pure warnings and errors when I happened to run it along with some other tests.

Interestingly (or maybe not), in order to get rid of the hard errors I had when the false detection appeared to preclude subsequent choices, I looked at the code and it seemed like the problems stemmed from the fact that FAIL was an EmptyExpansion, and, as such, it returned true to isPossiblyEmpty(). I decided to make a one-line addition:
public boolean isPossiblyEmpty() {return false;} to the Failure INJECTion. It got rid of the errors and warnings, and nothing in my grammar seemed to be broken. Just out of curiosity, is there ever a reason for it to return true?

adMartem

revusky
As I recall, I used "recursively" because I think I noted that if you have:

A : B | "e" "c";
B : D;
D : "e" "f" =>||;

the up-to-here is effectively applied in the first choice of the A production resulting in acceptance of the input "e c" via the second choice. But maybe that has changed since I thought I noticed it. In any case, "recursive" is probably not the way to describe it. It was just what was in my head. I assume

A : B | "e" "c";
B : D "g";
D : "e" "f" =>||;

would fail (to accept "e c") as it would choose B (seeing "e") and then fail to find "f".