Well, in principle, I think it is useful. In this vein, by the way, you might (if you haven't already) look at how the Modifiers production in the Java grammar works. See here: https://github.com/congo-cc/congo-parser-generator/blob/main/examples/java/Java.ccc#LL155
That's actually a somewhat more complex case than just a set, because there are rules about which modifiers can appear together. For example, if you already have the modifier public somewhere, it obviously can't occur again, but certain combinations of modifiers are also not permissible: you can't have public private, you can't have abstract final, and so on.
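To make those two kinds of rule concrete, here is a rough Java sketch of such a check. To be clear, this is not how the Modifiers production in Java.ccc does it; the class name ModifierCheck, the method check, and the CONFLICTS table are all made up for illustration, and the table lists only the pairs mentioned above plus the other access-modifier pairs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of CongoCC or its Java grammar.
class ModifierCheck {
    // Pairs that may not appear together. Deliberately incomplete: just the
    // examples from the text plus the remaining access-modifier pairs.
    private static final String[][] CONFLICTS = {
        {"public", "private"}, {"public", "protected"}, {"private", "protected"},
        {"abstract", "final"}
    };

    // Returns one message per problem found; an empty list means the modifiers are fine.
    static List<String> check(List<String> modifiers) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String m : modifiers) {
            // Rule 1: a modifier may occur at most once.
            if (!seen.add(m)) {
                errors.add("duplicate modifier: " + m);
                continue;
            }
            // Rule 2: some pairs are mutually exclusive.
            for (String[] pair : CONFLICTS) {
                String other = pair[0].equals(m) ? pair[1]
                             : pair[1].equals(m) ? pair[0] : null;
                if (other != null && seen.contains(other)) {
                    errors.add("incompatible modifiers: " + other + " " + m);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(check(List.of("public", "static", "public")));
        System.out.println(check(List.of("abstract", "final")));
    }
}
```

The point is only that "at most once" is the easy half (a set of what has been seen so far); the pairwise exclusions need extra information that a plain set does not carry.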
But, the notion of a choice in which an option can occur at most one time, that does seem like something useful. And I guess it's not that hard to implement either.
There are some subtle issues, however, in terms of lookahead vs. parsing. Suppose somebody does write (in Java) something like:
public static public void foo() {....}
Well, the second public is obviously erroneous, but it seems to me that, practically speaking, it is usually better if your lookahead is forgiving of the error, and the error is then caught when you actually parse the construct. Or, in other words, your predicate for entering the MethodDeclaration production is deliberately looser than the actual parsing. The advantage of that is that you recognize that it is a method declaration (albeit with the erroneous extra public modifier), so you go into that production and then hit the error there. (And, if the fault-tolerant machinery is on, it should be able to skip the extraneous modifier and keep going.)
But, you see, if your lookahead is as strict as the parsing, then it rejects MethodDeclaration and tries the next choice, and then the next, and the result is that you're liable to get some incomprehensible error message. Also (perhaps more importantly), in a fault-tolerant mode, in which you keep parsing after an error, you want it to recognize that it is a MethodDeclaration even if there is the error that a modifier is repeated.
You see my point?
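Here is a hand-written Java sketch of the idea, working on a plain list of token strings. It is not what CongoCC generates; LooseLookahead, looksLikeMethod and parseModifiers are hypothetical names, the modifier list is cut down, and the "predicate" is far cruder than a real one would be. The lookahead accepts any run of modifiers, repeats included, while the parsing step is the one that complains about the repeat and then carries on.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hand-written illustration; not the code CongoCC generates.
class LooseLookahead {
    static final Set<String> MODIFIERS =
        Set.of("public", "private", "protected", "static", "final", "abstract");

    // Loose predicate: skip any run of modifiers (repeats allowed), then expect
    // two tokens (taken to be return type and name, not validated) and a "(".
    static boolean looksLikeMethod(List<String> tokens) {
        int i = 0;
        while (i < tokens.size() && MODIFIERS.contains(tokens.get(i))) i++;
        return i + 2 < tokens.size() && tokens.get(i + 2).equals("(");
    }

    // Strict parse of the leading modifiers: a repeated modifier is recorded
    // as an error and skipped, so the rest of the declaration can still be parsed.
    static List<String> parseModifiers(List<String> tokens, List<String> errors) {
        List<String> accepted = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String t : tokens) {
            if (!MODIFIERS.contains(t)) break;
            if (seen.add(t)) accepted.add(t);
            else errors.add("repeated modifier '" + t + "' in method declaration");
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("public", "static", "public", "void", "foo", "(", ")");
        List<String> errors = new ArrayList<>();
        if (looksLikeMethod(tokens)) {
            System.out.println("modifiers: " + parseModifiers(tokens, errors));
            System.out.println("errors: " + errors);
        }
    }
}
```

Because the predicate lets the second public through, the error message can say it happened in a method declaration, instead of coming from whatever choice was tried last.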
Of course, the other option is just to have a parser that is very forgiving of these things and then do a subsequent tree-walk that finds these problems. But that just punts on the problem really, I mean insofar as specifying these things inside the grammar itself. At some point, I became moderately obsessed with the whole problem of being able to specify these things in the grammar, which is basically what this is about: https://parsers.org/announcements/reference-java-grammar/
One might think about an alternative operator (I think I was thinking along the lines of the backslash, though you were considering simply doubling the | operator) to mean that the choice that follows can only occur once. And the operator could also precede the first choice, so (\A \B \C)* would mean zero or more of A, B, or C, but each one can occur at most once. But that syntax is just off the top of my head really.
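Just to pin down the semantics I have in mind for something like (\A \B \C)*, here is a tiny Java sketch over a list of strings. The class OnceEachChoice and the method matchOnceEach are invented for this post, and a real implementation would work on tokens and expansions, not strings. It also simply stops at a repeat; whether the parser should stop there or report an error is exactly the lookahead vs. parsing question above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a loop in which each option may match at most once.
class OnceEachChoice {
    // Returns how many leading items of `input` are consumed: each item must be
    // one of `options` and must not already have been used in this loop.
    static int matchOnceEach(List<String> input, Set<String> options) {
        Set<String> used = new HashSet<>();
        int consumed = 0;
        for (String item : input) {
            // Stop at the first item that is not an option or is a repeat.
            if (!options.contains(item) || !used.add(item)) break;
            consumed++;
        }
        return consumed;
    }

    public static void main(String[] args) {
        Set<String> options = Set.of("A", "B", "C");
        System.out.println(matchOnceEach(List.of("B", "A", "C"), options)); // any order is fine
        System.out.println(matchOnceEach(List.of("A", "B", "A"), options)); // stops before the repeated A
    }
}
```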
Well, it's all doable, but there is this matter that, arguably, we should be more disciplined about adding more features when we haven't really sufficiently documented the ones we have. So there is that... 😆