palexdev Ooh, understood. The methods are called automatically with reflection, right?
Yes. See https://github.com/congo-cc/congo-parser-generator/blob/main/src/templates/java/Node.java.ftl#L844-L933
Note also that if there is more than one visit method that could, in principle, handle a node, it picks the more specific one. So, if you have a Delimiter class that subclasses Token, and there are both:
void visit(Token t) {...}
and:
void visit(Delimiter d) {...}
it will call the second method on a Delimiter. But if that handler were not present, it would just call the one for Token. So it's actually a much more powerful, elegant design than the one you are used to, I think.
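To make the dispatch rule concrete, here is a minimal Java sketch of the idea. It is not the code in Node.java.ftl. The class names Node, Token, Delimiter, Keyword and VisitorSketch are hypothetical, and the sketch only walks the superclass chain, ignoring interfaces. For each node it looks for a visit overload taking the node's runtime class, then that class's superclass, and so on. The first hit is therefore the most specific handler available.

```java
import java.lang.reflect.Method;

class VisitorSketch {
    // Records which handler ran, so the dispatch order can be checked.
    final StringBuilder log = new StringBuilder();

    void visit(Token t) { log.append("Token;"); }
    void visit(Delimiter d) { log.append("Delimiter;"); }

    // Try visit(X) for the node's runtime class X, then X's superclass, and so on.
    // The first overload found is therefore the most specific one available.
    void dispatch(Node node) {
        for (Class<?> c = node.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = VisitorSketch.class.getDeclaredMethod("visit", c);
                m.invoke(this, node);
                return;
            } catch (NoSuchMethodException e) {
                // No handler for this exact class: fall through to the superclass.
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        // No handler anywhere in the hierarchy: this sketch simply does nothing.
    }

    public static void main(String[] args) {
        VisitorSketch v = new VisitorSketch();
        v.dispatch(new Delimiter());
        v.dispatch(new Keyword());
        System.out.println(v.log); // Delimiter;Token;
    }
}

// Hypothetical node types standing in for generated classes.
class Node {}
class Token extends Node {}
class Delimiter extends Token {}
class Keyword extends Token {}  // has no visit overload of its own
```

Here a Delimiter goes to visit(Delimiter). Keyword has no handler of its own, so it falls back to visit(Token).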
palexdev The FQN rule ends up consuming the whole foo.bar.BazClass.call string, but call should be part of Method because it is followed by the parentheses.
Oh, I see... I guess you're a bit confused about the relationship between the lexical and syntactic parts of the grammar. Let's see... in principle, these are two separate machines that don't really know anything about each other. But the lexical machine is lower level, and it operates first. It partitions the input into tokens (that's what the "lexer" does), and the "parser" consumes those tokens and matches them against the grammar rules (or productions). So if the lexer has already broken up the input a certain way, the parser can only work with the stream of tokens it is handed... well, hopefully you see my point. Actually, confusing what the lexer does with what the parser does is probably the most basic conceptual mistake newcomers make with this sort of tool. Once you clear that up, you'll have come a significant way.
So, getting back to your issue, once you've defined a lexical rule that is <IDENTIFIER>(<DOT><IDENTIFIER>)*, that rule is going to gobble as much input as it can (that's called "greedy matching"). So your foo.bar.BazClass.call is going to be a single FQN token, and that is what your parser will see.
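Here is a rough illustration of that greedy matching in Java. A java.util.regex pattern stands in for the lexical rule, and IDENTIFIER is assumed to be an ASCII, Java-style identifier. A generated lexer works differently internally, but it likewise takes the longest possible match, so the outcome is the same here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyFqnDemo {
    // Regex analogue of <IDENTIFIER>(<DOT><IDENTIFIER>)*, assuming ASCII identifiers.
    static final Pattern FQN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

    // Text of the token matched at the start of the input, or null if none.
    static String firstToken(String input) {
        Matcher m = FQN.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The repetition keeps going as long as another ".identifier" follows.
        System.out.println(firstToken("foo.bar.BazClass.call(x)")); // foo.bar.BazClass.call
    }
}
```

The method name call ends up inside the single token, which is exactly the problem described in the question.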
Almost certainly, what you need is to handle this at the syntactic level, not the lexical level. Of course, the other problem with defining an FQN token this way is that it would probably also need to handle whitespace (and comments too) because, on occasion, somebody will want to write:
foo
.bar
.BazClass
.call(...)
Typically, whitespace and comments are allowed between those pieces. You could maybe specify this within a lexical rule, but it gets very, very messy, I would say, and the normal way to handle it is at the syntactic level.
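By contrast, here is a sketch of the token-level approach. It uses a tiny hand-written Java lexer, which is hypothetical and not generated by CongoCC. The lexer emits separate NAME, DOT, LPAREN and RPAREN tokens and simply skips whitespace and // comments. The one-line and multi-line spellings then produce the same token stream, so the parser never has to care about layout.

```java
import java.util.ArrayList;
import java.util.List;

class FineGrainedLexer {
    // Hypothetical token kinds; a real grammar would define many more.
    enum Kind { NAME, DOT, LPAREN, RPAREN }

    record Tok(Kind kind, String image) {}

    // Split input into small tokens, discarding whitespace and // comments.
    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (s.startsWith("//", i)) {
                while (i < s.length() && s.charAt(i) != '\n') i++; // skip to end of line
            } else if (c == '.') {
                out.add(new Tok(Kind.DOT, ".")); i++;
            } else if (c == '(') {
                out.add(new Tok(Kind.LPAREN, "(")); i++;
            } else if (c == ')') {
                out.add(new Tok(Kind.RPAREN, ")")); i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < s.length() && Character.isJavaIdentifierPart(s.charAt(i))) i++;
                out.add(new Tok(Kind.NAME, s.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String oneLine = "foo.bar.BazClass.call()";
        String multiLine = "foo\n  .bar // a comment\n  .BazClass\n  .call()";
        System.out.println(lex(oneLine).equals(lex(multiLine))); // true
    }
}
```

This is the usual design choice: keep tokens small, let the lexer discard whitespace and comments, and assemble dotted names in the grammar.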
But aside from all that, maybe you would do well to make a careful study of the existing grammars under the examples/ directory. For example, consider the QualifiedIdentifier production in the C# grammar here: https://github.com/congo-cc/congo-parser-generator/blob/main/examples/csharp/CSharp.ccc#L73-L75
and look at the spots where it is used. Or consider the DottedName production in the Python grammar here: https://github.com/congo-cc/congo-parser-generator/blob/main/examples/python/Python.ccc#L195
Well, for example, if we look at the DottedName production from the Python grammar, we see:
DottedName : <NAME> (<DOT> <NAME> =>||)* ;
The up-to-here marker =>|| means, by the way, that when deciding whether to enter another iteration of the loop, the parser actually scans ahead up to that point. In other words, it checks that the next token is a <DOT> and the one after that is a <NAME>. The equivalent C# production does not check that at this spot, which is a subtle difference.
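Here is a hand-written recursive-descent approximation of that loop in Java. It only illustrates what the =>|| lookahead does and is not what CongoCC generates. Tokens are hypothetical plain strings, and isName/isDot are assumed helper checks. The loop commits to another iteration only after confirming both the <DOT> and the <NAME>.

```java
import java.util.List;

class DottedNameSketch {
    private final List<String> toks; // hypothetical token images, e.g. ["a", ".", "b"]
    private int pos = 0;

    DottedNameSketch(List<String> toks) { this.toks = toks; }

    // Assumed helpers standing in for token-kind checks.
    private boolean isName(int i) {
        return i < toks.size() && Character.isJavaIdentifierStart(toks.get(i).charAt(0));
    }
    private boolean isDot(int i) { return i < toks.size() && toks.get(i).equals("."); }

    // DottedName : <NAME> (<DOT> <NAME> =>||)* ;
    String dottedName() {
        if (!isName(pos)) throw new IllegalStateException("expected NAME at token " + pos);
        StringBuilder sb = new StringBuilder(toks.get(pos++));
        // Scan ahead up to the =>|| point: both <DOT> and <NAME> must be present.
        while (isDot(pos) && isName(pos + 1)) {
            sb.append('.').append(toks.get(pos + 1));
            pos += 2;
        }
        return sb.toString();
    }

    int position() { return pos; }

    public static void main(String[] args) {
        DottedNameSketch p = new DottedNameSketch(List.of("a", ".", "b", ".", "*"));
        System.out.println(p.dottedName() + " next=" + p.position()); // a.b next=3
    }
}
```

With the token stream a . b . *, the loop stops before the second dot and leaves it for whatever production comes next. A loop without the scan-ahead would consume that dot and then fail on the *.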
But let's suppose we also don't want to tack on the <DOT><NAME> if the token after <NAME> is an opening parenthesis, as in a method call. We could write that as:
DottedName : <NAME> (<DOT> <NAME> ENSURE ~(<LPAREN>) =>||)* ;
So, to go around the loop again, we need a <DOT> and a <NAME>, and the token after that must not be an opening parenthesis.
So with that production, we could write:
MethodCall : DottedName <DOT> <NAME> Args ;
and that might be closer to what you want. But you see, above, we wrote the DottedName production so that in foo.bar.BazClass.call(...) it stops gobbling input after BazClass, because the next three tokens are <DOT><NAME><LPAREN>.
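Extending the same kind of hand-written Java approximation gives the following. It is again a sketch, not CongoCC output, and it uses hypothetical string tokens and helpers. The ENSURE ~(<LPAREN>) check becomes a third lookahead condition, and MethodCall then consumes the final <DOT> <NAME> itself. Args is simplified to an empty () purely for illustration.

```java
import java.util.List;

class MethodCallSketch {
    private final List<String> toks; // hypothetical token images
    private int pos = 0;

    MethodCallSketch(List<String> toks) { this.toks = toks; }

    private boolean is(int i, String image) { return i < toks.size() && toks.get(i).equals(image); }
    private boolean isName(int i) {
        return i < toks.size() && Character.isJavaIdentifierStart(toks.get(i).charAt(0));
    }
    private void expect(boolean ok, String what) {
        if (!ok) throw new IllegalStateException("expected " + what + " at token " + pos);
    }

    // DottedName : <NAME> (<DOT> <NAME> ENSURE ~(<LPAREN>) =>||)* ;
    // Loop only while the next tokens are <DOT> <NAME> and the one after is NOT <LPAREN>.
    String dottedName() {
        expect(isName(pos), "NAME");
        StringBuilder sb = new StringBuilder(toks.get(pos++));
        while (is(pos, ".") && isName(pos + 1) && !is(pos + 2, "(")) {
            sb.append('.').append(toks.get(pos + 1));
            pos += 2;
        }
        return sb.toString();
    }

    // MethodCall : DottedName <DOT> <NAME> Args ;  (Args simplified to "(" ")")
    String[] methodCall() {
        String receiver = dottedName();
        expect(is(pos, "."), "DOT"); pos++;
        expect(isName(pos), "NAME");
        String method = toks.get(pos++);
        expect(is(pos, "("), "LPAREN"); pos++;
        expect(is(pos, ")"), "RPAREN"); pos++;
        return new String[] { receiver, method };
    }

    public static void main(String[] args) {
        List<String> toks = List.of("foo", ".", "bar", ".", "BazClass", ".", "call", "(", ")");
        String[] r = new MethodCallSketch(toks).methodCall();
        System.out.println(r[0] + " | " + r[1]); // foo.bar.BazClass | call
    }
}
```

The DottedName loop declines the final .call because an LPAREN follows it, so MethodCall finds its <DOT> <NAME> still in the stream.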
Anyway, of course, I don't really know exactly what you want to do, but the above could be food for thought and maybe put you on the right track.