Over the last few days, I finally implemented this new feature. It's not the same as the earlier one, though it is related, of course. What that blog article from about four years ago outlines is the ability to turn on/off tokens in whatever lexical state -- or, to get particularly nerdy about it, tokens that are part of the NFA (finite state machine) corresponding to any given lexical state(s).
The feature I am announcing now is not the same thing as that. You can now define contextual tokens that are not part of the tokenization machinery. The main anticipated use is for contextual keywords that are usually just identifiers in the grammar but, in a key spot or two, are interpreted differently -- typically as keywords of some sort. Thus, for example, the contextual keyword `yield` (a stable language feature as part of switch expressions since JDK 14) is a keyword at the start of a yield statement, but everywhere else it is just a regular identifier. Presumably, this was so that existing code could continue to run. `record` is a similar case, as are `sealed` and `permits`. As a simple illustration, consider the code:
```java
int yield = 1;
boolean sealed = true;
Record record = null;
Case case = someCase;
```
The first three lines are fine, but the last one will not compile, because `case` is a reserved word in the Java language and can never be used as a regular identifier, while the other three words -- `yield`, `sealed`, `record` -- are keywords only in a very specific context but otherwise, as here, are just regular identifiers.
CongoCC now has a much more natural solution to this problem, with minimal scaffolding. Here is the current implementation of the `YieldStatement` production:
```
YieldStatement :
    'yield'
    =>|+2
    Expression
    <SEMICOLON>
;
```
There is no need to define any separate `yield` token in the lexical part of the grammar. The way this works is that the string "yield" is matched as an `IDENTIFIER`, but the contextual token `'yield'` is specified here with single quotes. So, when checking whether the token matches, the machinery sees that `yield` is a contextual keyword, and it checks whether the string (matched by the tokenization machinery as an `IDENTIFIER`) matches the contextual keyword `yield`. It does, so it quietly recognizes it as a match for `'yield'`, changing the `type` of the `lastConsumedToken` from `IDENTIFIER` to `yield`. This is an ideal case for using this feature, since `yield` occurs in this one spot in the grammar, and everywhere else the string "yield" can only be an `IDENTIFIER`.
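To make the retyping step concrete, here is a tiny Python simulation of it. This is purely an illustrative sketch, not CongoCC's actual generated code; the `Token` class and `consume_contextual` helper are invented for this example:

```python
from dataclasses import dataclass

@dataclass
class Token:
    type: str
    image: str

def consume_contextual(token, keyword):
    # The tokenizer only ever produced an IDENTIFIER; at the one spot
    # where the grammar says 'yield', the parser checks the image and
    # quietly retypes the token.
    if token.type == "IDENTIFIER" and token.image == keyword:
        token.type = keyword
        return True
    return False

tok = Token("IDENTIFIER", "yield")
consume_contextual(tok, "yield")   # tok.type is now "yield"
```

Everywhere else in the grammar, nothing ever asks for the contextual keyword, so the token keeps its `IDENTIFIER` type.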
Perhaps the easiest way to understand this is by contrast with the (shortcut) defining of a token type using a literal string. If you write `"yield"` instead of `'yield'`, with no corresponding mention in the lexical grammar, the string "yield" is added as a regular expression to be matched. (Note that it is added only in the "DEFAULT" lexical state, which could be a gotcha, but this feature mostly exists to support relatively small grammars that typically have only one lexical state, so you can work up a grammar with minimal scaffolding.)
So, note that:

```
"yield" <IDENTIFIER>
```

would NOT match the input `yield yield`, because we now have a new token type, called `_yield`, that is NOT an identifier, and both occurrences of "yield" in the input are matched as that. But if we have:

```
'yield' <IDENTIFIER>
```

that WILL match the input `yield yield`, because the first `yield` is matched as the `yield` type and the second occurrence of "yield" is an `IDENTIFIER`. Or, to put it another way, the string is matched as the `yield` type only when we specifically mention it. This is a subtle but crucial difference. Another example: in the choice:
```
<IDENTIFIER>
|
'yield'
```
the second alternative is necessarily unreachable: if the coming input were "yield", it would be matched by the first alternative, `IDENTIFIER`, so the second is never reached. So the only feasible way of writing this (at least to behave as you presumably want) is:
```
'yield'
|
<IDENTIFIER>
```
because that is the only way you can ever identify this as the *soft keyword* `yield`. In the opposite order, it will always match `IDENTIFIER`. So, note the different semantics as compared to the double-quoted `"yield"`.
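The ordering rule can be simulated in a few lines of Python. Again, this is just an illustrative sketch of alternatives being tried left to right, not the real CongoCC machinery:

```python
def first_match(token_type, token_image, choices):
    # Try each alternative in order. An unquoted token type matches by
    # type; a single-quoted contextual keyword matches only when an
    # IDENTIFIER carries exactly that image.
    for choice in choices:
        if choice == "IDENTIFIER" and token_type == "IDENTIFIER":
            return "IDENTIFIER"
        if token_type == "IDENTIFIER" and token_image == choice:
            return choice
    return None

# <IDENTIFIER> | 'yield' -- the soft keyword is unreachable:
first_match("IDENTIFIER", "yield", ["IDENTIFIER", "yield"])  # -> "IDENTIFIER"
# 'yield' | <IDENTIFIER> -- now the soft keyword wins:
first_match("IDENTIFIER", "yield", ["yield", "IDENTIFIER"])  # -> "yield"
```

With a hard keyword token, by contrast, the ordering would not matter, because the input "yield" would never have been tokenized as an `IDENTIFIER` in the first place.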
Another little gotcha with these soft keywords is that they only work if there is a more general pattern that matches the string. An interesting case is the pair of new modifiers introduced to support sealed type declarations, `sealed` and `non-sealed`. The first, `sealed`, can be (and now is) replaced with this kind of contextual keyword: we can just write `'sealed'`, and the machinery will match the string as an `IDENTIFIER`, but since `'sealed'` is a contextual keyword, in the right context it will check for the string match, realize that this is not an `IDENTIFIER` (in this precise spot!), and change its type to `sealed`. But this does not work with `non-sealed`, because `non-sealed` is not matched by the `IDENTIFIER` pattern (or any more general one). (It is utterly beyond me why it was not defined as `non_sealed`, since that would surely spare people some headaches. But, okay, it provides some additional challenges, no?) In fact, the input "non-sealed" matches as the sequence of tokens `IDENTIFIER MINUS IDENTIFIER`. So, once the machinery has the `IDENTIFIER` token "non", it is not (via any magical extra lookahead) going to realize that this is the first part of `non-sealed`. Sorry. It just won't work. So again:

**For this kind of soft keyword to work, it must be matched by some existing, more general pattern.**
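You can see the problem with a toy tokenizer in Python. This is a sketch using a simplified ASCII identifier pattern, not the real Java lexer:

```python
import re

# Simplified token patterns: identifiers and a minus sign.
PATTERNS = [
    ("IDENTIFIER", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("MINUS", re.compile(r"-")),
]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in PATTERNS:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"cannot tokenize at position {pos}")
    return tokens

tokenize("sealed")      # [("IDENTIFIER", "sealed")] -- a soft keyword can retype this
tokenize("non-sealed")  # [("IDENTIFIER", "non"), ("MINUS", "-"), ("IDENTIFIER", "sealed")]
```

Since "non-sealed" never exists as a single token, there is nothing for the contextual-keyword check to retype.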
That is an interesting little detail, but the fact remains that, about 98% of the time anybody would want to use this feature for its intended use case, there is some sort of general `IDENTIFIER` pattern that will match the string. (By the way, I intend to put in warnings for these unmatchable cases, but that is unimplemented as yet.) Another detail: if you wanted to use a foreign word, like 'привет' or '你好', as a contextual keyword, that is fine, since these are valid Java identifiers and would be matched as such. But if you are in a language where an identifier is only ASCII alphanumeric, say `(["a"-"z", "A"-"Z", "0"-"9"])+`, then that won't match the aforementioned keywords (in Russian and Chinese respectively), so, again, no dice...
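A quick Python check of that point, using rough regex stand-ins for the two identifier definitions (these are approximations for illustration, not Java's exact identifier rules):

```python
import re

# Rough stand-ins: a Unicode-friendly identifier vs. an ASCII-only one.
unicode_ident = re.compile(r"[^\W\d]\w*")   # starts with any Unicode letter
ascii_ident = re.compile(r"[A-Za-z0-9]+")

unicode_ident.fullmatch("привет")   # matches -- usable as a soft keyword
ascii_ident.fullmatch("привет")     # None -- no more general pattern, no dice
```

The contextual keyword is only ever reached through the general identifier pattern, so if that pattern cannot match the string, the feature simply has nothing to work with.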
But I daresay that in most existing real-world use cases, the contextual keyword does match the identifier definition of the language. In fact, that is precisely why you want to define it as contextual: so that outside of that specific context, it is just matched as an `IDENTIFIER` or whatever more general category!
In the current implementation, the assumption is that the contextual keyword is a valid Java identifier, which is usually the case and is simpler to implement, since the `TokenType` definition just uses the string itself, which must be a valid Java identifier. That said, it is a tad inflexible, and I have already run into a case where it doesn't work! Python 3.10 introduced two soft keywords, `match` and `case`, and with the current (somewhat crude) implementation I could use this disposition for the first one but not the second, because `case` is a reserved word in Java, and the `TokenType` enum can't use it as an element. It would have to get munged into `_case` or `CASE` or something like that. Details, details... So I will have to do something about that. Meanwhile, for about 98% of the cases where you want something like this, the feature, as implemented, works pretty well. I have it going in internal use, and it has already led to quite a significant simplification and improvement in readability where it is used.
Again, the case where you most want to use this is where you have a number of contextual keywords that are used in a single spot (or two or three) and elsewhere are just regular identifiers. Some grammar specifications could have literally hundreds of these things, with your `activeTokenTypes` set holding an equal number of slots that you turn on/off as needed... well, it has long seemed to me that this is not a good solution, and this newer scheme should really be better. So I finally buckled down and implemented it.
Enjoy.