Java 24, TokenType enum

bnaudts

Hi
I've been using CongoCC's Java parser for the last year in my static analyser for Java (e2immu.org).
I basically installed it then, and simply used it; it's great!
Today I wanted to add a Java 24 feature to Java.ccc, 'var' as a type in record patterns.
When I re-generate the parser from a fresh install, I get a Token.java class that has a very different TokenType enum as compared to last year, with constants in lowercase ('record' instead of 'RECORD', etc.):

  REMASSIGN("%="), LSHIFTASSIGN("<<="), RSIGNEDSHIFT(">>"), RUNSIGNEDSHIFT(">>>"),
        RSIGNEDSHIFTASSIGN(">>="), RUNSIGNEDSHIFTASSIGN(">>>="), LAMBDA("->"), open("open", true),
        module("module", true), requires("requires", true), transitive("transitive", true),
        exports("exports", true), to("to", true), opens("opens", true), uses("uses", true),
        provides("provides", true), with("with", true), sealed("sealed", true), permits("permits", true),
        record("record", true), var("var", true), when("when", true), yield("yield", true),

Can someone help me out here? Am I missing some parameter to the code generation?

Thanks
Bart

Not related to this question, but this is the change I wanted to make to Java.ccc:

TypePattern#LocalVariableDeclaration :
  {permissibleModifiers = EnumSet.of(FINAL);}#
  Modifiers 
  (
    'var' <IDENTIFIER>
    |
    Type (<IDENTIFIER>|<UNDERSCORE>)
  )
;

revusky

Hi Bart. Nice to hear from you.

The situation with the CongoCC repository is a bit confusing. Or actually, it was, until an hour ago maybe. What happened is that I got a bit confused and decided to backtrack to an earlier point and work from there. So I created a branch called "polyglot" and have been working on that branch. My intention was to put back in certain changes and "polyglot" would become the default branch. In effect, it has been the main line of development, not "main", for at least some days.

Well, what I just did was that I renamed "polyglot" to "master" and set that as the "default branch". And I renamed what was previously "main" to "slave", mostly as a little joke, but at some point fairly soon, I'm just going to delete that branch. So, I guess what you really need to do is just clone the "master" branch and apply any changes there. But that's the default branch, so there is no need to specify it.

       git clone https://github.com/congo-cc/congo-parser-generator congo

would pick it up.

Now, as for the changes you were pointing out and finding confusing, they have to do with this: https://discuss.congocc.org/d/72-new-feature-contextual-aka-soft-keywords. You see lower-case labels because that is what is generated by default when you use that new feature (which is still NOT in master! Yet!).
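To make the "soft keyword" idea concrete on the Java side: words such as record, var, sealed, permits, when and module only act as keywords in specific syntactic positions, and are ordinary identifiers everywhere else, which is why a parser has to treat them differently from hard keywords. A minimal sketch (the class and method names are made up for illustration and have nothing to do with the generated parser):

```java
// Contextual ("soft") keywords are legal identifiers outside the
// syntactic positions where the language gives them special meaning.
class SoftKeywordDemo {
    static int sum() {
        int record = 1;   // 'record' only acts as a keyword in a record declaration
        int var = 2;      // 'var' is a reserved type name, but still a legal variable name
        int sealed = 3;
        int permits = 4;
        int when = 5;
        int module = 6;
        return record + var + sealed + permits + when + module;
    }
}
```

A hard keyword such as class or while could not be used as a variable name this way.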

As for the newer TypePattern syntax, yes, you are right that this was not (fully) implemented in the Java grammar. I was aware of it, I guess, but I think it was a preview feature at that point, and I hadn't realized that it was now a stable language feature. Also, their explanation of this is pretty hard to read. I went and looked at the relevant point in the JLS and I had to read it several times. In principle, a type pattern is the same as a local variable declaration, except that if it is one that does not use var, the Type has to be a ReferenceType. It can't be a primitive. Also, it must declare exactly one variable in a TypePattern, whereas the local variable declaration that is just a regular statement can declare more than one variable. And, in the type pattern, it cannot have an initializer. That is what I get out of this (but correct me if I'm wrong.)

But... (as best I understand) if you use a local variable declaration with var it seems that the syntax is the same as it is elsewhere. So it can declare exactly one variable but, at least, as far as I can tell, it must have an initializer. I think so, because otherwise, how could you match against it? It has to be able to infer the type, no? But the fact is that I don't have any sample code to test against! I mostly just parse all the code in src.zip as my main-line test, but they're not using this language feature there (yet). BTW, the parser is failing on 3 files in the (early access) JDK 25 src.zip, but those are all module-info.java files. I guess there is a bit of new syntax coming in there, and it might be preview-level stuff.
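On the question of whether var inside a pattern needs an initializer: as far as I can tell from JEP 440 (record patterns, final in Java 21), it does not. The type of the pattern variable is inferred from the matching record component rather than from an initializer expression. A small sketch, with class and method names invented for illustration:

```java
class VarPatternDemo {
    record Point(int x, int y) {}

    static int sumOfCoordinates(Object o) {
        // In a record pattern, 'var x' has no initializer: the type of x
        // is taken from the corresponding record component (int here).
        if (o instanceof Point(var x, var y)) {
            return x + y;
        }
        return -1;
    }

    static int localVar() {
        // As an ordinary statement, 'var' needs an initializer to infer from.
        var z = 40;
        return z + 2;
    }
}
```

So the two uses of var share the "exactly one variable" restriction, but only the statement form carries an initializer.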

So, anyway, if you check out master, it should be working. (But if it's not, by all means, tell us!)

Thanks,

Jon

bnaudts

Hi Jon
Thank you for getting back to me so quickly -- much appreciated.
The current default/master branch indeed solves the uppercase/lowercase issue I had.

However, there may still be some feature mix-up? I understand from your explanation that almost all early access JDK 25 should be parseable (which would be great); however, my tests fail on some Java 24 features: primitives in record patterns, 'var', and unnamed _ in record patterns.
For example,

package a.b;
class X3 {
    interface I { }
    record RI(int j) implements I { }
    void method(I i) {
        if(i instanceof RI(int k)) {
            System.out.println(k);
        }
    }
}

used to be parsed by my one-year-old version, but cannot be parsed by the current master:

Encountered an error at (or somewhere around) input:6:28
org.parsers.java.ParseException: 
Encountered an error at (or somewhere around) input:6:28
	at org.parsers.java.JavaParser.ReferenceType(JavaParser.java:3383)
	at org.parsers.java.JavaParser.TypePattern(JavaParser.java:7092)

Also failing are

package a.b;
class X5 {
    record Point(int x, int y) {}
    enum Color { RED, GREEN, BLUE }
    record ColoredPoint(Point p, Color c) {}
    record Rectangle(ColoredPoint upperLeft, ColoredPoint lowerRight) {}
    static void printXCoordOfUpperLeftPointWithPatterns(Rectangle r) {
        if (r instanceof Rectangle(ColoredPoint(Point(int x, int _), Color c), _)) {
             System.out.println("Upper-left corner: " + x);
        }
    }
}

and

package a.b;
class X6 {
    record Point(int x, int y) {}
    enum Color { RED, GREEN, BLUE }
    record ColoredPoint(Point p, Color c) {}
    record Rectangle(ColoredPoint upperLeft, ColoredPoint lowerRight) {}
    static void printXCoordOfUpperLeftPointWithPatterns(Rectangle r) {
        if (r instanceof Rectangle(ColoredPoint(Point(var x, var y), var c), var lr)) {
             System.out.println("Upper-left corner: " + x);
        }
    }
}

Thanks

Bart

revusky

Hi, Bart.

The current master now allows primitive types in type patterns. It's a very trivial change. Well, syntactically, it's trivial. What they have to do to implement it under the hood is probably not so trivial.

Frankly, I'm puzzled by your statement that these things were parsed a year ago. I suppose that's true, but if so, it was only by mistake! You can see here how trivial the diff is. Maybe instead of it being ReferenceType before, which excludes primitive types, I had it as Type and later changed it to ReferenceType to reflect what the latest stable version of the language said. Actually, that is still what the latest stable version of the language says. Even now, primitive type patterns are a preview feature. In JDK 25, the feature is on its 3rd public preview. Maybe in JDK 26 it will be a stable language feature.
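For reference, the failing case from earlier in the thread reduces to the sketch below. To my understanding (an assumption worth checking against the JLS), this nested form, where the primitive type pattern matches the record component's type exactly, has compiled without --enable-preview since record patterns became final in Java 21 (JEP 440). What the preview JEPs cover is the broader use of primitive types in patterns, such as a top-level `o instanceof int n`. The class and method names here are invented for illustration:

```java
class PrimitiveComponentDemo {
    interface I {}
    record RI(int j) implements I {}

    static int extract(I i) {
        // 'int k' is a type pattern with a primitive type, nested inside a
        // record pattern; it matches the 'int j' component exactly.
        if (i instanceof RI(int k)) {
            return k;
        }
        return -1;
    }
}
```

If that reading is right, a grammar that accepts Type (rather than only ReferenceType) in a nested type pattern is needed for stable Java as well, not only for the preview feature.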

I understand from your explanation that almost all early access JDK 25 should be parseable

Well, that is not what I meant really. I do feel committed (not by any legal contract or anything, just what is in my own mind, LOL) that we should support all stable language features. But, as a practical question, my main test for whether we are supporting the current language (more or less) is just being able to parse everything that is in src.zip and there is no certainty that every stable language feature is used somewhere in there. Well, that's like 15,000 files and any language feature that is not used anywhere in there is probably pretty obscure, but as far as I know, the code in src.zip never uses any preview language features.
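The "parse everything in src.zip" check described above can be sketched roughly as follows. This is not the project's actual test harness: the class name, the `parses` hook and the toy demo are all assumptions for illustration, and in real use you would plug the generated parser in where the predicate is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class SrcZipCheck {
    // Returns the names of all .java entries for which 'parses' returns false.
    // 'parses' is a hypothetical hook: wrap whatever parser you want to test.
    static List<String> failingFiles(Path zip, Predicate<String> parses) throws IOException {
        List<String> failures = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                if (e.isDirectory() || !e.getName().endsWith(".java")) continue;
                try (InputStream in = zf.getInputStream(e)) {
                    String source = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    if (!parses.test(source)) failures.add(e.getName());
                }
            }
        }
        return failures;
    }

    // Self-contained demo: builds a two-entry zip in a temp file and checks it
    // with a toy "parser" that rejects any source containing "@@".
    static List<String> demo() {
        try {
            Path zip = Files.createTempFile("src", ".zip");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry("a/Good.java"));
                out.write("class Good {}".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
                out.putNextEntry(new ZipEntry("a/Bad.java"));
                out.write("class Bad { @@ }".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            List<String> failures = failingFiles(zip, src -> !src.contains("@@"));
            Files.delete(zip);
            return failures;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A check like this only tells you about syntax that actually occurs in the archive, which is exactly the limitation discussed above.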

But actually, I just looked and we're not so bad on the preview features. Primitive types in patterns are still in preview and I just implemented them. So sometimes I do implement preview features before they are stable, particularly if they're easy! For example, JEP 513, the flexible constructor bodies, has been implemented for a good while now, but is just now final in JDK 25.

Out of curiosity, why are you into all these cutting edge language features? I guess it's some kind of code analysis tool, but of course, there is relatively little code in the wild out there that uses any of this stuff.

bnaudts

Hi Jon

Thanks, my tests are green again.
I really appreciate that you're following the latest Java language changes. I'd be out of my depth if I had to update Java.ccc myself.

As for the why...

Five years ago I started to develop a Java static code analyzer focused on modification and immutability: e2immu (Effective and Eventual Immutability, www.e2immu.org. It's open source, but I don't think anyone knows it exists -- I'm not good at self-promotion.)
Two years ago I founded the for-profit company CodeLaser (www.codelaser.io) with an ex-colleague. We're writing tools to help modernize and refactor Java code, and the basis of the refactoring engine is the e2immu analyzer.

Last year I switched e2immu's core parser from JavaParser.org to CongoCC, partly because it'd allow us to add more languages in the future -- most of our stack is language agnostic, and it'd be really cool to add languages such as C#.

We hope to launch our 1.0 version at the Devoxx conference in October, which is why I'm going through an extensive testing phase at the moment -- hence the nitpicking over currently underused Java features.

If all goes well we'll be depending on CongoCC for many years to come. Given that we're promoting "modern Java code" we feel obliged to use the latest and greatest of Java ourselves, which is how I bumped into these record pattern issues while running our own code base through the analyzer.

revusky

bnaudts Thanks, my tests are green again.
I really appreciate that you're following the latest Java language changes. I'd be out of my depth if I had to update Java.ccc myself.

Actually, I am pretty sure you would be able to do it. Maintaining Java.ccc is not a very hard part of this project. Really it isn't. Probably the most challenging work one has to do is on the templates side. In particular, keeping three sets of templates working for the three supported languages... just Java is hard enough, but having two more languages (that I don't know nearly as well) is another matter. And besides that, it's not like there are so many new language features usually. There is a new JDK every 6 months but sometimes there is no new language feature at all. Or typically just one or two things.

Of course, the problem is if you let things slide for a number of years, then catching up could be daunting. But as long as you don't allow yourself to get too far behind, simply keeping up with the current state of the language is not a very big time commitment.

This project has actually existed for a long time, since 2008 (it was called FreeCC originally) but was effectively abandonware from early 2009 to the end of 2019, just about 11 years. So when I picked it up again, there was over a decade of language evolution in Java to catch up on. In particular, the move from JDK 7 to 8 was quite revolutionary, since it introduced lambdas and so on. So I picked it up again at the very end of 2019 and the parser inside did not parse anything past JDK 5, I think. And it was not terribly correct even for JDK 5. There were holes. So I set myself the task of getting it up to JDK 8 level. I figured I'd get it to there initially and turn my attention to some other things and get back to it. The latest JDK was 13 at the time and maybe a month or two later I did get back to it and got the thing going up to the latest stable features in JDK 13. Since then, if there was a new JDK out with a new language feature or two, I'd put in a bit of time (not much really) to get it going again.

As for the why...

Five years ago I started to develop a Java static code analyzer focused on modification and immutability: e2immu (Effective and Eventual Immutability, www.e2immu.org. It's open source, but I don't think anyone knows it exists -- I'm not good at self-promotion.)

Yeah, I understand. I also am apparently not good at promoting my work. And the whole situation is pretty disheartening. I mean, what is the point of having some great thing if nobody knows about it?

I think one thing we should do is have something like a "powered by" page and a "testimonials" page. Someone using this (YOU for example!) could write a "testimonial" saying how great this is, and we could collect these and put them on the testimonials page. And on the "powered by" page, whenever it comes to our attention that somebody is using the tool, we would just add them to the list. And maybe you could (at some appropriate point on your site) state that you are using this and provide a link to our project. So we should do at least such minimal things, no?

Two years ago I founded the for-profit company CodeLaser (www.codelaser.io) with an ex-colleague. We're writing tools to help modernize and refactor Java code, and the basis of the refactoring engine is the e2immu analyzer.

Last year I switched e2immu's core parser from JavaParser.org to CongoCC, partly because it'd allow us to add more languages in the future -- most of our stack is language agnostic, and it'd be really cool to add languages such as C#.

As far as I could ever tell, the main raison d'être of the JavaParser project is to provide a parser for the Java language. That is actually not the case for CongoCC. CongoCC provides grammars/parsers for Java, C# and Python, but it is actually incidental. Since we need those things internally, they are available for anybody else to use, but they are not the main goal of the project. Actually, I very much doubt that I have spent, let's say... 5% of my ongoing time in this project on maintaining the Java grammar. Probably even less, like 2 or 3%, but I'm not keeping track. So this JavaParser project, which is quite well known, famous.... has as its main goal to have a Java parser (it's in the name) and has been around for quite a few years, while I, a single individual, maintain an up-to-date, correct Java parser just sort of incidentally, with just a tiny fraction of my ongoing time commitment to this application space!

I don't know if I say the above just to boast (okay, maybe a bit...) but there is a further point about all of this. I've eyeballed the JavaParser project a bit in the past. One notable aspect of it is that they have worked up some quite extensive Node API inside their project. You know, they build up the tree, and all the nodes of the tree implement the various interfaces or inherit from the abstract classes they have defined and so on... you know how it goes...

What I find noteworthy is that this entire API they have constructed (many many thousands of lines of code!) is (as far as I can tell) exclusively for use with their Java parser. CongoCC also maintains a similar (I suppose it's similar) Node API. Any CongoCC project generates the same Node API (though you can tweak it using INJECT and so on), all this stuff like node.descendantsOfType(MyNode.class) or the Node.Visitor and things like this. But this same Node API is automatically generated for any CongoCC parser -- for the ones that are there out-of-the-box, like Java, Python, and C#. Lua as well.... The day that we (or somebody else hopefully!) implement a parser for TypeScript or PHP or whatever... it comes with the same Node API! It's just automatically there. The entire Node API that you would use with our Java parser is just generated, but with the help of a few injections, you have something that basically makes sense. But if you write a TypeScript grammar, with a set of similar injections, you can generate a Node API that works for that language, so...

In fact, come to think of it, anybody who is already using CongoCC, could pretty easily just re-use the Node API for whatever purposes, like just subclass BaseNode maybe. Well, you see my point surely...
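To give a rough idea of what a language-independent node API buys you, here is a hand-written toy version. Everything in it (SketchNode, MethodDecl, Block, countMethods) is a hypothetical stand-in written for this post, NOT the classes CongoCC generates; only the descendantsOfType idea is borrowed from the description above.

```java
import java.util.ArrayList;
import java.util.List;

class NodeApiDemo {
    // Hypothetical stand-in for a generated base node; not CongoCC's BaseNode.
    static class SketchNode {
        private final List<SketchNode> children = new ArrayList<>();

        SketchNode add(SketchNode child) {
            children.add(child);
            return this;
        }

        // Collects every descendant (at any depth) that is an instance of 'type'.
        <T extends SketchNode> List<T> descendantsOfType(Class<T> type) {
            List<T> result = new ArrayList<>();
            for (SketchNode child : children) {
                if (type.isInstance(child)) result.add(type.cast(child));
                result.addAll(child.descendantsOfType(type));
            }
            return result;
        }
    }

    // Language-specific node types only need to subclass the shared base.
    static class MethodDecl extends SketchNode {}
    static class Block extends SketchNode {}

    static int countMethods() {
        SketchNode root = new SketchNode()
            .add(new Block().add(new MethodDecl()))
            .add(new MethodDecl());
        return root.descendantsOfType(MethodDecl.class).size();
    }
}
```

The traversal code lives once in the base class, so node types for another grammar get it for free, which is the point being made above.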

We hope to launch our 1.0 version at the Devoxx conference in October, which is why I'm going through an extensive testing phase at the moment -- hence the nitpicking over currently underused Java features.

Well, you do right to show up and "nitpick". If there is any other feature you feel could be useful, you can mention it certainly. How else would we know about it? No guarantees, of course...

If all goes well we'll be depending on CongoCC for many years to come. Given that we're promoting "modern Java code" we feel obliged to use the latest and greatest of Java ourselves, which is how I bumped into these record pattern issues while running our own code base through the analyzer.

Yeah, I understand. Eventually, this would come to our attention because one tries to run the parser over whatever codebase and eventually you hit a file or two that use this feature and then... But in the normal course of things that might not happen for a while, like a year maybe...