adMartem
Well, this all sounds pretty good. Frankly, it's hard for me to get excited about a report that the Congo generated parser is 35% slower, say. I mean the original JavaCC is such a primitive, simplistic tool really. That its generated code might run a bit faster would hardly even be surprising. OTOH, a 50x difference is really pretty unacceptable. And that, actually, is what the speed difference with ANTLR typically is. It really seems that it's something like that. But, anyway, to use the legacy JavaCC because the code it generates runs 30, 40, or 50% faster would just be a terrible trade-off, I'm pretty sure.
Today I was playing around with original JavaCC, just because I wanted to see what code it generated for various cases. And honestly, I had forgotten what a bloody-minded simplistic tool it really is. I think people credit it as being something much more sophisticated than it really is because the whole parser generation space is enveloped in this sort of obfuscated jargon.
I guess what does make the parser generation space kind of challenging is just that the tool is a code generator, so when you hit a bug, you're pretty much always one extra degree of separation away from the problem, as compared to when you just write code directly by hand. A bug manifests itself in the generated code, and to find it you have to trace back to the place in the generator that emitted that code, not just look at the output itself. The problem with the original JavaCC project is that it always eschewed the use of templates. When the code is generated with a series of println calls, trying to find any bug is like... When you're generating from a template, the template still kind of resembles the output. It's still challenging, but you get used to working with the templates. So, you know, you see things like: https://github.com/javacc21/javacc21/blob/master/src/ftl/java/ParserProductions.java.ftl or https://github.com/javacc21/javacc21/blob/master/src/ftl/java/LookaheadRoutines.java.ftl which are really the most nitty-gritty templates that generate the parser/lookahead code.
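To make the println-versus-template point a bit more concrete, here is a toy sketch in plain Java. It is not code from either tool: the class name, the `${name}`/`${body}` placeholders, and the generated `parseFoo`/`consumeToken` names are all made up for illustration, and simple string substitution stands in for a real template engine. Both methods emit the same text; the difference is how much the generator source looks like what it emits.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration only; neither method is taken from JavaCC or Congo.
final class GenerationStyles {
    private static final String NL = System.lineSeparator();

    // Template style: the generator source reads almost like its own output.
    // ${name} and ${body} are made-up placeholders, filled by plain substitution.
    private static final String TEMPLATE = String.join(NL,
        "void parse${name}() {",
        "    ${body}",
        "}",
        "");

    // println style: the shape of the output is scattered across print calls
    // and string concatenation, so a bug in the output is harder to trace back.
    static String viaPrintln(String name, String body) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println("void parse" + name + "() {");
        out.println("    " + body);
        out.println("}");
        out.flush();
        return buffer.toString();
    }

    static String viaTemplate(String name, String body) {
        return TEMPLATE.replace("${name}", name).replace("${body}", body);
    }

    public static void main(String[] args) {
        // Print the template-generated method for a hypothetical production "Foo".
        System.out.print(viaTemplate("Foo", "consumeToken(ID);"));
    }
}
```

With three lines of output the two styles look about equally readable. The gap only really shows once the emitted code has nested conditionals and loops, which is the situation in the lookahead routines linked above.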
But, I mean, just consider how much more clearly you can express certain things when you have things like up-to-here notation and assertions, and then lexical state and token activation/deactivation that actually work in conjunction with lookahead. Oh, and contextual predicates...
Of course, the problem has been that the whole thing is less solid than I thought, because some of these features do interact in screwy ways at the moment, though I've been gradually beating it into shape. I think the last issue you brought up is fixed. In my defense, I would point out that probably if one just restricted oneself to using the features that already existed in legacy JavaCC, the tool is pretty solid. In that original feature set, there are probably fewer bugs in Congo than in the original JavaCC. The bugs we're hitting (and that I am in the process of squashing...) relate to features that simply never existed in the original JavaCC. (And never will!)