How does the JSON example work?

opeongo

Hi Jonathan,
Can you explain where the Values object comes from in the JSON example? I see it here at line 74.

Obviously it is collecting the list of values, but where is the Values class defined, and how does the grammar know to insert all of the collected elements into this class?

opeongo

opeongo Replying to my own question, after building the example project with congocc I see that the #Values token on line 74 both declared the class and assigned a new instance to thisProduction.values. Very concise and clever.

And inserting the Value elements into the array was left as an exercise for the reader.

revusky

opeongo Replying to my own question, after building the example project with congocc I see that the #Values token on line 74 both declared the class and assigned a new instance to thisProduction.values. Very concise and clever.

Yes, it is, but the credit for that should go to John Bradley (whose login on this forum is adMartem). He added that feature just over a month ago, so it is very very new, and it's not actually documented anywhere. You'd have to read the code to know that you can do that!

Though, actually, a lot of things aren't documented anywhere, I have to admit. One feature that has been in the code for a good long time, but that I've never really documented, is that you can (quite similarly to this) put a left-hand side on a lookahead (in CongoCC, that's SCAN, not LOOKAHEAD). So you can write:

  result = SCAN Foo Bar => Foobar

So, I mean, you can store the result of the lookahead in a variable (result in this case), which would have to be defined somewhere or other, of course. And, actually, if all you wanted was to know whether the upcoming input matches Foo followed by Bar and store that in a variable, you could do:

    [ result = SCAN Foo Bar => {} ]

In that case, you don't consume any input whether the lookahead succeeds or not. The only purpose of the above would be to set the result variable, which you'd presumably use later somehow or other. Of course, now that I think about it, the above is the exact same as:

   {result = false;}
    [ SCAN Foo Bar => {result = true;} ]
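To make the "no input is consumed" point concrete, here is a minimal plain-Java sketch. It is not CongoCC's generated code; the class LookaheadSketch and its scan method are hypothetical names that only mimic the behaviour described above: peek at the upcoming tokens, report whether they match, and leave the position where it was.

```java
import java.util.List;

final class LookaheadSketch {
    private final List<String> tokens;
    private int pos = 0; // index of the next unconsumed token

    LookaheadSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Hypothetical analogue of "result = SCAN Foo Bar => {}":
    // reports whether the upcoming tokens match, and never advances pos.
    boolean scan(String... expected) {
        if (pos + expected.length > tokens.size()) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (!tokens.get(pos + i).equals(expected[i])) {
                return false;
            }
        }
        return true;
    }

    int position() {
        return pos;
    }

    public static void main(String[] args) {
        LookaheadSketch parser = new LookaheadSketch(List.of("Foo", "Bar", "Baz"));
        boolean result = parser.scan("Foo", "Bar");
        System.out.println(result + " at position " + parser.position());
    }
}
```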

And inserting the Value elements into the array was left as an exercise for the reader.

Yes, well, regardless of the left-hand-side (which is a very new feature) if you annotate an expansion as building a node, then any new nodes created when building that node become child nodes of the node you just built. The way the Array production was originally written was:

Array :
   <OPEN_BRACKET>
   [
       Value  (<COMMA> Value)*!
   ]
   <CLOSE_BRACKET>
;

But do note that the tree-building strategy is very bloody-minded, so the generated parse tree contains all the open/close brackets and commas, and very typically you don't want that. One solution might be:

  INJECT Array:
  {
        public java.util.List<Node> getUsefulChildren() {
              return childrenOfType(Node.class, n->n.getType() != OPEN_BRACKET && n.getType() != CLOSE_BRACKET && n.getType() != COMMA);
        }
  }
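As a rough standalone illustration of what such an injected method does, here is the same filtering idea in plain Java. This is not the generated CongoCC API: the Kind enum and the SimpleNode class are hypothetical stand-ins for the generated token types and Node interface.

```java
import java.util.List;
import java.util.stream.Collectors;

final class UsefulChildrenSketch {
    // Hypothetical stand-ins for the token types of the JSON grammar.
    enum Kind { OPEN_BRACKET, CLOSE_BRACKET, COMMA, VALUE }

    // Hypothetical stand-in for a parse-tree node: a kind plus its source text.
    static final class SimpleNode {
        final Kind kind;
        final String text;

        SimpleNode(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    // Keeps every child that is not a bracket or a comma.
    static List<SimpleNode> usefulChildren(List<SimpleNode> children) {
        return children.stream()
                .filter(n -> n.kind != Kind.OPEN_BRACKET
                          && n.kind != Kind.CLOSE_BRACKET
                          && n.kind != Kind.COMMA)
                .collect(Collectors.toList());
    }

    // The children a keep-everything tree builder would record for "[1, 2]".
    static List<SimpleNode> sampleChildren() {
        return List.of(
                new SimpleNode(Kind.OPEN_BRACKET, "["),
                new SimpleNode(Kind.VALUE, "1"),
                new SimpleNode(Kind.COMMA, ","),
                new SimpleNode(Kind.VALUE, "2"),
                new SimpleNode(Kind.CLOSE_BRACKET, "]"));
    }

    public static void main(String[] args) {
        System.out.println(usefulChildren(sampleChildren()));
    }
}
```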

I think there is a bit of a conceptual shift when going from legacy JavaCC to Congo. For one thing, the parse tree is generated by default, so that is assumed to be the most typical usage pattern. And then the solution to whatever the problem is tends to look like the above: you just inject the method you need into the appropriate object in the parse tree. In typical usage, you would have no more reason to look at the generated Array.java than there is to eyeball a .class file. That's the idea anyway.

Well, I just thought to make a few points. Glad to see you here, Tom.

adMartem

Or now you could say:

<OPEN_BRACKET>
[
    @usefulChildren :+= Value  (<COMMA> @usefulChildren += Value)*!
]
<CLOSE_BRACKET>

stbischof

I am trying to build a parser on top of your JSON example that can parse OSGi Feature files:
https://docs.osgi.org/specification/osgi.cmpn/8.0.0/service.feature.html#d0e156633

{
  "feature-resource-version": "1.0",
  "id": "org.acme:acmeapp:1.0.1",
    
  "name": "The Acme Application",
  "license": "https://opensource.org/licenses/Apache-2.0",
  "complete": true,

  "bundles": [
    { "id": "org.osgi:org.osgi.util.function:1.1.0" },
    { "id": "org.osgi:org.osgi.util.promise:1.1.1" },
    {
      "id": "org.apache.commons:commons-email:1.5",

      // This attribute is used by custom tooling to 
      // find the associated javadoc
      "org.acme.javadoc.link": 
        "https://commons.apache.org/proper/commons-email/javadocs/api-1.5"
    },
    { "id": "com.acme:acmelib:1.7.2" }      
  ]
   

}

The set of attributes ("id", "name", "license") is defined. Some must exist and some may exist, but each at most once.
They can appear in any order, though, and I am not sure how to express this.
Can anyone lend a helping hand?

revusky

stbischof Hi Stefan.

The truth of the matter is that there is no way in the core BNF grammar to specify that an option in a repeated choice can only occur once. There are ad hoc ways of introducing the constraint, but... Anyway, there is a PR in the queue from John Bradley that is meant to address this.

I guess the requirement that a choice not be repeated comes up quite frequently, but for whatever reason, parser generators usually don't address it. Take, for example, modifiers in Java. You can't write public static public or something like that. Any modifier can only occur once. You can see how the Java grammar inside of Congo deals with this here: https://github.com/congo-cc/congo-parser-generator/blob/main/examples/java/Java.ccc#L146-L207 which is a case of me getting very nitpicky about defining this absolutely correctly. It's actually not solely that modifiers can't be repeated, but also that some combinations are illegal, like public private or abstract final and so on. So you can see how I went about doing that.

A funny thing about that is that if you look at the ANTLR grammar repository, the various contributed Java grammars don't address this problem at all. For example, if you look here: https://github.com/antlr/grammars-v4/blob/master/java/java20/Java20Parser.g4#L364-L380 that grammar does nothing about the modifiers being repeated or in illegal/nonsensical combinations or whatever. Of course, it could be argued that one might as well just accept the various input and build a tree, and then have some sanity-checking code that walks the tree after parsing and flags the elements that have illegal combinations. So, basically, your parser just accepts various illegal input and then you deal with it separately in imperative code. And, actually, that could be a practical solution depending on what you're doing, like if your parser is just part of an IDE or smart editor and you accept incorrect input and just mark it in a separate pass. But the way I handled the Modifiers in the Java grammar was mostly just because I set myself the task of writing a maximally correct grammar. So that's what I did.
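The "accept it all, then check in a separate pass" approach could look roughly like the following plain-Java sketch. It assumes the parser has already handed back the modifiers of one declaration as a list of strings; the class ModifierCheckSketch and its check method are hypothetical, and only the rules mentioned above (no repeats, no public private, no abstract final) are covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ModifierCheckSketch {
    // At most one of these may appear on a declaration.
    private static final Set<String> ACCESS = Set.of("public", "protected", "private");

    // Returns a list of problems; an empty list means the modifiers look fine.
    static List<String> check(List<String> modifiers) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new LinkedHashSet<>();
        for (String m : modifiers) {
            // Set.add returns false when the element was already present.
            if (!seen.add(m)) {
                problems.add("repeated modifier: " + m);
            }
        }
        long accessCount = seen.stream().filter(ACCESS::contains).count();
        if (accessCount > 1) {
            problems.add("more than one access modifier");
        }
        if (seen.contains("abstract") && seen.contains("final")) {
            problems.add("abstract and final cannot be combined");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(List.of("public", "static", "public")));
        System.out.println(check(List.of("public", "private")));
        System.out.println(check(List.of("abstract", "final")));
    }
}
```

The same shape of check (collect what was seen, then report repeats and missing or conflicting entries) would also apply to a fixed set of JSON attribute names.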

Well, the modifiers in Java are a bigger problem than just not repeating elements, since there are also illegal combinations. As an ad hoc trick to keep something from being repeated, you could have:

{boolean seenA=false, seenB=false;}
(
   SCAN {!seenA} => A {seenA=true;}
   |
   SCAN {!seenB} => B {seenB=true;}
   |
  C
)*

A bit verbose, but that would disallow more than one occurrence of either A or B while allowing as many C's as you want.
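For anyone who wants to see the control flow spelled out, here is a small plain-Java simulation of that loop. It is only a sketch, not what CongoCC generates: tokens are single characters, the class SeenFlagsSketch is hypothetical, and accepts assumes nothing after the loop could consume a leftover token.

```java
final class SeenFlagsSketch {
    // Returns how many leading tokens the loop would consume.
    static int consume(String tokens) {
        boolean seenA = false, seenB = false;
        int pos = 0;
        while (pos < tokens.length()) {
            char t = tokens.charAt(pos);
            if (t == 'A' && !seenA) {
                seenA = true;       // first A: allowed, remember it
            } else if (t == 'B' && !seenB) {
                seenB = true;       // first B: allowed, remember it
            } else if (t != 'C') {
                break;              // repeated A/B or unknown token: no choice applies
            }
            pos++;
        }
        return pos;
    }

    // True when the loop consumes the whole input.
    static boolean accepts(String tokens) {
        return consume(tokens) == tokens.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts("CACBC"));
        System.out.println(accepts("ACA"));
    }
}
```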

adMartem

Yes! This is exactly what the PR is addressing.