Sorry for the tardy reply, Jon. I see that your recent changes affect this behavior, so I will reverse my temporary reversal of your change and see what happens to my .5%.

revusky Well, I don't think they fix my 0.5%, but it is possible they might have changed the failures. But from my sampling of them they appear to be unchanged. This is the change I can make that makes everything work again.

The manifestation of the errors is somewhat perplexing, as some seem to affect productions that should not be sensitive to lexical state transitions, but they all stem from when the program uses a COBOL declaration that makes , the decimal point rather than the default .. This is effected by a lexical state change on the BNFProduction for decimal numeric literals, and all of the affected COBOL syntax errors are in statements containing decimal numeric literals, so I suspect that is the connection. I've tried to reproduce the behavior with a simple test case, but so far I've not been successful. Maybe the preceding will trigger some thought for you, however.

    adMartem I think I see the problem. Will provide a fix shortly. I spoke too soon. I thought the problem was that BNFProduction is a subclass of Expansion, but didn't override the Expansion version of startsWithLexicalChange thereby causing its potential true value when lexicalState is non-null to be missed. But that does not appear to fix my parser.

      adMartem I think the problem is that in a BNFProduction with a lexical state specified, the ExpansionSequence(s) comprising the production's ExpansionChoice child check only their respective child nodes, and not the parent of the ExpansionChoice itself, which could, in fact, be the state change of a BNFProduction. So even if the BNFProduction correctly reports startsWithLexicalChange it is not considered when the ExpansionChoice provides its result. I added code to override startsWithLexicalChange in BNFProduction (which is an Expansion and it seemed like it should not return false in the case of a production-level state change), but that didn't work. My hypothesis above would explain that, I think, because the logic in ExpansionChoice doesn't check its parent (which could be a production). If true, however, I'm not sure putting the check for a BNFProduction parent in the ExpansionChoice version of the method is any better than the check that used to be in the ExpansionSequence. Furthermore, what confuses me about check in ExpansionSequence is how it used to work at all considering that the parent of the ExpansionSequence would in this case be an ExpansionChoice, not a BNFProduction, wouldn't it? I am perplexed. I guess I will try adding a check in ExpansionChoice and see if my hypothesis is correct and, if so, the question is if the old check in ExpansionSequence is better than the rabbit hole I'm digging to "correctly" fix it. I'm inclined to say, "no", the old check should be put back with a comment as to why it is there for this special case.

      2 months later

      Just as a status report, this problem still exists for my parser, and I have to apply the small reversion above to make it run correctly. I still haven't been able to come up with a test case that reproduces it.

        hylkevds
        Hi, I turned my attention back to this issue you raised and I think it really is fixed now. You see, I had applied a fix before but I later realized that it broke some other things. Specifically, the generated Java parser was now failing on a handful (I think five) files in the src.zip. And also there were the issues that John Bradley (adMartem) was reporting. So, as I said in a separate note, I ended up reverting the change that fixed your issue.

        The truth is that I never really figured out why my previous fix broke other things, but I finally just identified where the problem must be and just tore up that code snippet and rewrite it a bit differently and it seems to be okay now. Your issue seems to be fixed (again) and the generated Java parser is parsing everything in the JDK src.zip (the latest one, JDK 22 now). I'll be happy if John reports back that everything seems okay now with his issues.

        I think that, for the most part, the bug you were hitting had an easy workaround. You see, there is a bit of tricky logic where the code generation tries to get away with just a single-token lookahead vs. generating a predicate method. And the single-token lookahead was spurious in the cases where you had to switch lexical states and the logic on that wasn't quite right. But, as I think you realized, you could always force it to generate a lookahead routine by simply putting in an explicit SCAN or up-to-here in the key spot.

        But I think it's fixed now and it should work without any tricks. But, by all means report back on it.

        Thanks.

        adMartem Just as a status report, this problem still exists for my parser, and I have to apply the small reversion above to make it run correctly. I still haven't been able to come up with a test case that reproduces it.

        I just applied (or re-applied) the above-referenced snippet. It looks fine and, to be honest, if that was there before, I don't know why I removed it. Maybe I was experimenting with different things and I removed that and then forgot to put it back in. I'm not really sure. I rewrote a bit of code in NonTerminal.java and, as I explained to hylkevds, it all seems to be working now, but I honestly am not sure why. I finally just deleted a certain key bit and rewrote it and somehow that got the gremlins out, it seems...

        I will be quite curious to know what issues you now have with this. (I hope none!) This business of handling lexical state switches correctly is a tad tricky, it seems. Of course, the legacy JavaCC tool simply does not address this problem at all, so....

          revusky
          I can confirm that my issue appears to be fixed by your latest commit. Thanks! I will let you know if there is any contrary evidence when I run the entire COBOL regression tests, but I don't expect any. Just to be clear, this means that the unmodified main tip of CongoCC will be used for the COBOL release for Java 17, 21, and 22 I am currently working on and will be tested on those platforms and on MacOS, Linux, and Windows.

            adMartem Just to be clear, this means that the unmodified main tip of CongoCC will be used for the COBOL release for Java 17, 21, and 22 I am currently working on and will be tested on those platforms and on MacOS, Linux, and Windows.

            I'm thinking that we should really put up a testimonials page. You (or people generally, really) could contribute some testimonial about how they are using CongoCC and saving all kinds of time (and money) implementing whatever it is... AND... most importantly of all, having fun!!!

            I mean, I'm guessing that it's not top secret that you're using this and you wouldn't mind writing some positive note about it. Though not just you... whoever is willing to put in a few words, of course.

            adMartem
            Oh, by the way, do make sure to test against the absolutely latest code. In the last couple of days, I really did some review of the lookahead generation logic and nailed certain bugs. Now, there was a pretty flagrant bug in the lexical state handling, where it could generate a faulty lookahead when the lookahead involved a lexical state switch. That might have been something you were hitting. So, there's that. But, just yesterday, like less than 24 hours ago, I came across another bug in the lookahead generation logic and fixed that.

            Actually, I started writing here a description of the bug (now fixed) but then I figured it would be better to write a top-level post about it...

              Can you please add tests for the bugs you've found, so that we can test for regressions in the future?

              6 days later

              revusky
              I have tested against the latest version, and it seems to be fine. I assume I can now use ENSURE to denote what were previously ASSERTions annotated with # and use ENSURE ASSERT together if necessary (but I don't think I ever need to do that). This is a great change, BTW. I think it is much clearer than the effective (mis)use of ASSERT before. Thanks.