adMartem
Hmm, well, what you're reporting is pretty messed up and I definitely have to look into this. Now, there are a couple of problems at the current juncture -- perhaps most fundamentally, that it's not even specified anywhere exactly how these things are supposed to work! It's defined by implementation at the moment really -- which sometimes is not so terrible, but this is all complex enough that it really needs to be properly specified. So that should change at some point soon. Or alternatively, one could say that I have a more or less clear notion of how this should work, but that's just in my own head...
But what you're reporting... there seem to be a couple of different bugs. I'm not sure that some of this wasn't introduced recently, since I'm in the middle of rewriting all this code, which just isn't written very clearly.
Now, in terms of:
A : "(" X ")" ASSERT ~("+") =>|| | "(" Y ")" "+" ;
It looks pretty clear that the only up-to-here marker that should have any effect is the top-level one, right after the ASSERT. The up-to-here marker that we hit drilling down into X should really be superseded by the one at the top-level of the expansion itself. I think that's common-sensical. So, I mean generally, if you have:
Foobar : Foo =>|| Bar ;
and that is used in some choice construct somewhere, like:
Foobar Baz Bat
|
SomethingElse
Here is my mental model of this (which needs to be documented AND implemented properly): the way we decide between the first and second option is that we scan into Foobar, which in turn means that we scan into Foo, and then if we successfully scan past Foo, we hit the up-to-here and we go for option 1. Otherwise, we pass on to SomethingElse, option 2. Right? (By the way, note that we're not going to pay any attention to any up-to-here inside of Foo, which is what the nestingLevel thingy is about.)
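To make that mental model concrete, here is a toy sketch in Java. To be clear, this is not the actual JavaCC21 code: the class UpToHereScanSketch, the Production type and the ignoreMarker flag are all made up for illustration, and Foo and Bar are flattened to single tokens. It only shows the decision itself: scan up to the marker and commit to option 1, or fall through to option 2.

```java
import java.util.Arrays;
import java.util.List;

public class UpToHereScanSketch {

    /** Toy production (hypothetical): a flat token sequence plus the position of its =>|| marker. */
    public static final class Production {
        final List<String> tokens;
        final int upToHere; // number of tokens before the marker, or -1 if there is none

        public Production(int upToHere, String... tokens) {
            this.tokens = Arrays.asList(tokens);
            this.upToHere = upToHere;
        }
    }

    /** Scans input against p, stopping at the marker unless ignoreMarker is set. */
    public static boolean scan(Production p, List<String> input, boolean ignoreMarker) {
        int limit = (!ignoreMarker && p.upToHere >= 0) ? p.upToHere : p.tokens.size();
        if (input.size() < limit) return false;
        for (int i = 0; i < limit; i++) {
            if (!p.tokens.get(i).equals(input.get(i))) return false;
        }
        return true;
    }

    /** Picks option 1 (the one starting with p) if the scan succeeds, otherwise option 2. */
    public static int choose(Production p, List<String> input, boolean ignoreMarker) {
        return scan(p, input, ignoreMarker) ? 1 : 2;
    }

    public static void main(String[] args) {
        // Foobar : Foo =>|| Bar ;  with Foo ~ "foo" and Bar ~ "bar"
        Production foobar = new Production(1, "foo", "bar");
        List<String> input = Arrays.asList("foo", "qux");
        System.out.println(choose(foobar, input, false)); // marker honored: prints 1
        System.out.println(choose(foobar, input, true));  // marker ignored, full scan: prints 2
    }
}
```

With the marker honored, the scan commits as soon as it gets past "foo". With the marker ignored, the same input fails on the second token and the choice falls through to option 2.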
BUT... here is how I think it should work (though apparently it does not, or not always...) which is that it only respects the up-to-here in Foobar if that is the first subexpansion in the sequence. At some point, I had to seriously grapple with this and I decided that in the above, if there is an up-to-here in Baz, we don't pay attention to that. So the first nonterminal in the sequence is kind of treated specially. (All of this has to do with some serious (at least I think serious) thought about the principle of least surprise etc.)
But, if we had something specified in the enclosing sequence itself, that should override anything inside the nonterminal Foobar. So, if we had:
Foobar Baz =>|| Bat
|
SomethingElse
then the up-to-here at the higher-level (between Baz and Bat) should override anything inside Foobar. And that's the same thing as your example, where the up-to-here after the assertion should cause any up-to-here we ran into before that to be disregarded, because the higher-level (or more outer-level) expansion specifies the up-to-here point and has priority. That's how it should work according to my own mental model (which, at some point soon, should stop being just my own mental model and start being something that is set out somewhere!) And similarly, if you just have a SCAN, that should also override anything inside a nonterminal.
So, similarly:
SCAN ~("foo") => Foobar Baz Bat
|
SomethingElse
the up-to-here in Foobar should be invalidated because we've got an explicit syntactic lookahead in the enclosing expansion sequence that has (or should have) priority. (Again, this is my mental model of how things should work. I think the above does at least work!)
But you see that's what this scanToEnd stuff is about, which is the conditions under which we actually are going to "respect" the up-to-here marker. If scanToEnd is true, it means we are scanning to the end of the nonterminal (in this case Foobar) and ignoring any up-to-here marker that we encounter because we have a more outer level up-to-here or scanahead that is taking priority. That's what scanToEnd is about, or what it's supposed to do. And also the nestingLevel is about the fact that, at some point, I decided (and it's not properly specified anywhere) that we're only going to pay attention to nested (in a nonterminal) up-to-here markers if they are just one level deep. That might be arbitrary and hard to justify on some pure theoretical grounds, but it just seems like it gets too hairy otherwise. I want to be able to write:
Foo | Bar | Baz
very cleanly and have the up-to-here markers inside of those non-terminals respected, but we're not going any deeper. Like if Foo starts with another non-terminal Bat, we're ignoring any up-to-here inside Bat. Otherwise, I think it kind of opens a Pandora's box in terms of enabling people to write tricky, obfuscated code. I think so... And I've thought about it and I don't think the practical case for it is very strong. So if we had that choice above of Foo|Bar|Baz and we have:
Foo : X Y =>|| "z" ;
Then the choice above will start by scanning for X Y inside the Foo but if we just had:
Foo : X Y Z ;
it will NOT (and I think I've decided under any circumstance) respect any up-to-here or scanahead inside of X because... well... even if it's maybe not so justifiable theoretically, it's just giving people means to write very tricky things that... And regardless, any up-to-here inside of Y will also be irrelevant (doubly so) because it doesn't even start the sequence. (Of course, any up-to-here inside of Y could be relevant somewhere else, as long as we start the choice with Y and we're only one level deep.) But in this case, any up-to-here inside of Y would be irrelevant for multiple reasons actually. Y is not the start of the sequence, we have an up-to-here at the higher level superseding anything in Y anyway, and we're two levels deep, if we are coming here from the Foo|Bar|Baz expansion.
So I think it could be stated something like this: The nested up-to-here inside a nonterminal of a sequence should only be relevant if:
1. It's not superseded by an up-to-here (or explicit scanahead) in a more outer-level expansion
2. The nonterminal is at the very start of the enclosing sequence
3. We're only 1 level of nonterminal nesting in.
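Those three conditions boil down to a single predicate. Here's a minimal sketch in Java, with the caveat that this is just how the rule as stated might look and not the real generator code: the class UpToHereRules, the method name and its parameters are all hypothetical, and nonterminalDepth only loosely corresponds to what nestingLevel is meant to track (1 meaning the marker sits directly inside the nonterminal named in the choice).

```java
public class UpToHereRules {

    /**
     * Hypothetical predicate for the three conditions:
     * 1. no outer up-to-here or explicit scanahead supersedes the nested marker,
     * 2. the nonterminal starts the enclosing sequence,
     * 3. the marker is exactly one nonterminal level deep.
     */
    public static boolean respectNestedUpToHere(boolean supersededByOuter,
                                                boolean startsEnclosingSequence,
                                                int nonterminalDepth) {
        return !supersededByOuter && startsEnclosingSequence && nonterminalDepth == 1;
    }

    public static void main(String[] args) {
        // Foo | Bar | Baz  with  Foo : X Y =>|| "z" ;  -> marker in Foo counts
        System.out.println(respectNestedUpToHere(false, true, 1));  // true
        // Foobar Baz =>|| Bat  -> the outer marker overrides the one in Foobar
        System.out.println(respectNestedUpToHere(true, true, 1));   // false
        // Foobar Baz Bat  -> a marker inside Baz does not start the sequence
        System.out.println(respectNestedUpToHere(false, false, 1)); // false
        // A marker inside X, reached through Foo, is two levels deep
        System.out.println(respectNestedUpToHere(false, true, 2));  // false
    }
}
```

Each call in main corresponds to one of the grammar snippets discussed above, so it's easy to check the sketch against the intended behavior case by case.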
The example you give that is broken is really pretty serious because based on any of the above, the up-to-here inside of Y should be disregarded. So we should hit the ASSERT and then stop. (I mean, assuming the assertion succeeds. If the assertion fails, we're out of there regardless....)
So, look, where I'm at now on all this is that I'm becoming quite aware that there are some serious problems here, and I've decided that I'm going to rewrite all of it. It's not a big deal because the key code that determines all this (both in Java and at the template level) is not very big. Actually, I decided to rewrite this stuff not too long ago, because I added these features over time and sort of kludged things in somehow... To be honest, I don't know whether what you are running into now was quite this broken before, or whether it's more a consequence of my rewrite being unfinished. I'm working on this now in a fork here: https://github.com/revusky/javacc21 that is currently 11 commits ahead of the upstream repository. So if you want to work against the absolutely latest version of the code, you can use that.
That's all I can say about this for now, I guess. As for when I'll have this fixed, I'm kind of engrossed in another problem that is even outside of hacking code, but probably it won't take too long.
One kinda funny aspect of this is that I reread what I just wrote and it occurs to me how much more complex JavaCC21/Congo is than the legacy JavaCC tool, with up-to-here and assertions and such. But the main way to get all this straightened out is really for a demanding user (you in this instance) to show up and start making noise about these various cases that don't seem to work quite right.
In that vein, did you ever read this blog article? https://javacc.com/2020/10/28/a-bugs-life/