Sorry for the tardy reply, Jon. I see that your recent changes affect this behavior, so I will reverse my temporary reversal of your change and see what happens to my .5%.
Lexical State Swiching, SCAN and odd error messages
revusky Well, I don't think they fix my 0.5%, but it is possible they might have changed the failures. But from my sampling of them they appear to be unchanged. This is the change I can make that makes everything work again.
The manifestation of the errors is somewhat perplexing, as some seem to affect productions that should not be sensitive to lexical state transitions, but they all stem from when the program uses a COBOL declaration that makes ,
the decimal point rather than the default .
. This is effected by a lexical state change on the BNFProduction for decimal numeric literals, and all of the affected COBOL syntax errors are in statements containing decimal numeric literals, so I suspect that is the connection. I've tried to reproduce the behavior with a simple test case, but so far I've not been successful. Maybe the preceding will trigger some thought for you, however.
- Edited
adMartem I think I see the problem. Will provide a fix shortly. I spoke too soon. I thought the problem was that BNFProduction
is a subclass of Expansion
, but didn't override the Expansion
version of startsWithLexicalChange
thereby causing its potential true
value when lexicalState
is non-null to be missed. But that does not appear to fix my parser.
adMartem I think the problem is that in a BNFProduction
with a lexical state specified, the ExpansionSequence
(s) comprising the production's ExpansionChoice
child check only their respective child nodes, and not the parent of the ExpansionChoice
itself, which could, in fact, be the state change of a BNFProduction
. So even if the BNFProduction
correctly reports startsWithLexicalChange it is not considered when the ExpansionChoice
provides its result. I added code to override startsWithLexicalChange
in BNFProduction (which is an Expansion
and it seemed like it should not return false
in the case of a production-level state change), but that didn't work. My hypothesis above would explain that, I think, because the logic in ExpansionChoice
doesn't check its parent (which could be a production). If true, however, I'm not sure putting the check for a BNFProduction
parent in the ExpansionChoice
version of the method is any better than the check that used to be in the ExpansionSequence
. Furthermore, what confuses me about check in ExpansionSequence
is how it used to work at all considering that the parent of the ExpansionSequence
would in this case be an ExpansionChoice
, not a BNFProduction
, wouldn't it? I am perplexed. I guess I will try adding a check in ExpansionChoice
and see if my hypothesis is correct and, if so, the question is if the old check in ExpansionSequence
is better than the rabbit hole I'm digging to "correctly" fix it. I'm inclined to say, "no", the old check should be put back with a comment as to why it is there for this special case.
Just as a status report, this problem still exists for my parser, and I have to apply the small reversion above to make it run correctly. I still haven't been able to come up with a test case that reproduces it.
hylkevds
Hi, I turned my attention back to this issue you raised and I think it really is fixed now. You see, I had applied a fix before but I later realized that it broke some other things. Specifically, the generated Java parser was now failing on a handful (I think five) files in the src.zip. And also there were the issues that John Bradley (adMartem) was reporting. So, as I said in a separate note, I ended up reverting the change that fixed your issue.
The truth is that I never really figured out why my previous fix broke other things, but I finally just identified where the problem must be and just tore up that code snippet and rewrite it a bit differently and it seems to be okay now. Your issue seems to be fixed (again) and the generated Java parser is parsing everything in the JDK src.zip (the latest one, JDK 22 now). I'll be happy if John reports back that everything seems okay now with his issues.
I think that, for the most part, the bug you were hitting had an easy workaround. You see, there is a bit of tricky logic where the code generation tries to get away with just a single-token lookahead vs. generating a predicate method. And the single-token lookahead was spurious in the cases where you had to switch lexical states and the logic on that wasn't quite right. But, as I think you realized, you could always force it to generate a lookahead routine by simply putting in an explicit SCAN or up-to-here in the key spot.
But I think it's fixed now and it should work without any tricks. But, by all means report back on it.
Thanks.
adMartem Just as a status report, this problem still exists for my parser, and I have to apply the small reversion above to make it run correctly. I still haven't been able to come up with a test case that reproduces it.
I just applied (or re-applied) the above-referenced snippet. It looks fine and, to be honest, if that was there before, I don't know why I removed it. Maybe I was experimenting with different things and I removed that and then forgot to put it back in. I'm not really sure. I rewrote a bit of code in NonTerminal.java and, as I explained to hylkevds, it all seems to be working now, but I honestly am not sure why. I finally just deleted a certain key bit and rewrote it and somehow that got the gremlins out, it seems...
I will be quite curious to know what issues you now have with this. (I hope none!) This business of handling lexical state switches correctly is a tad tricky, it seems. Of course, the legacy JavaCC tool simply does not address this problem at all, so....
- Edited
revusky
I can confirm that my issue appears to be fixed by your latest commit. Thanks! I will let you know if there is any contrary evidence when I run the entire COBOL regression tests, but I don't expect any. Just to be clear, this means that the unmodified main tip of CongoCC will be used for the COBOL release for Java 17, 21, and 22 I am currently working on and will be tested on those platforms and on MacOS, Linux, and Windows.
adMartem Just to be clear, this means that the unmodified main tip of CongoCC will be used for the COBOL release for Java 17, 21, and 22 I am currently working on and will be tested on those platforms and on MacOS, Linux, and Windows.
I'm thinking that we should really put up a testimonials page. You (or people generally, really) could contribute some testimonial about how they are using CongoCC and saving all kinds of time (and money) implementing whatever it is... AND... most importantly of all, having fun!!!
I mean, I'm guessing that it's not top secret that you're using this and you wouldn't mind writing some positive note about it. Though not just you... whoever is willing to put in a few words, of course.
Absolutely! I'd be happy to use it.
adMartem
Oh, by the way, do make sure to test against the absolutely latest code. In the last couple of days, I really did some review of the lookahead generation logic and nailed certain bugs. Now, there was a pretty flagrant bug in the lexical state handling, where it could generate a faulty lookahead when the lookahead involved a lexical state switch. That might have been something you were hitting. So, there's that. But, just yesterday, like less than 24 hours ago, I came across another bug in the lookahead generation logic and fixed that.
Actually, I started writing here a description of the bug (now fixed) but then I figured it would be better to write a top-level post about it...
Can you please add tests for the bugs you've found, so that we can test for regressions in the future?
- Edited
revusky
I have tested against the latest version, and it seems to be fine. I assume I can now use ENSURE
to denote what were previously ASSERT
ions annotated with #
and use ENSURE ASSERT
together if necessary (but I don't think I ever need to do that). This is a great change, BTW. I think it is much clearer than the effective (mis)use of ASSERT
before. Thanks.
adMartem
Aha! You are definitely like one of the students sitting in the front row, eager to raise his hand and make a point, not sitting in the back row napping!
So, yes, you saw the new feature before I announced it! I just put up an announcement here: https://parsers.org/announcements/revisiting-assertions-introducing-the-ensure-keyword/