Well, here I am talking to myself. Since writing that message yesterday, I did a bit of further work. Let's see...
One relatively minor point concerns the syntax of a raw code block, {%...%} or possibly {P%...%}, say. That syntax was based on the notion that the string %}, to all intents and purposes, never occurs in any of the languages we are generating, so it can reliably mark the end of a code snippet. Well, it turns out that's not quite true. In Java specifically, the sequence can occur in Javadoc comments. And in any language, %} can always occur in comments and string literals. It may not be common, but it is a definite possibility.
So my solution was to allow you to double up the delimiters in the rare case that you really need %} inside your code snippet. In that case you can write:
{{%%... %%}}
or:
{{J%%.... %%}}
And the doubled-up %%}} surely occurs so rarely anywhere that we are safe there. (If necessary, we could allow people to triple up the delimiters!)
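To make the doubling rule concrete, here is a rough Java sketch of how a scanner might find the end of a raw code block. It counts the opening braces, requires the same number of % signs after an optional language letter, and then searches for that many % signs followed by the same number of closing braces. The class and method names are made up for illustration, and this is not how CongoCC actually tokenizes raw code blocks. A nice side effect is that tripled (or deeper) delimiters work with no extra code.

```java
// Hypothetical sketch, not CongoCC's implementation: scans {%...%}, {J%...%},
// {{%%...%%}}, {{J%%...%%}} and so on, for any nesting depth N.
final class RawBlockScanner {

    /** Result of scanning one raw code block: optional language letter, body text, and end offset. */
    record RawBlock(Character language, String body, int end) {}

    /** Returns the raw block starting at offset start, or null if none starts there or it is unterminated. */
    static RawBlock scan(String text, int start) {
        int pos = start;
        int depth = 0;
        // Count the opening braces: { means depth 1, {{ means depth 2, etc.
        while (pos < text.length() && text.charAt(pos) == '{') {
            depth++;
            pos++;
        }
        if (depth == 0) return null;
        // Optional single-letter language tag, e.g. J or P.
        Character language = null;
        if (pos < text.length() && Character.isLetter(text.charAt(pos))) {
            language = text.charAt(pos);
            pos++;
        }
        // The number of % signs must match the number of braces.
        String open = "%".repeat(depth);
        if (!text.startsWith(open, pos)) return null;
        pos += open.length();
        // The block ends at the first run of depth % signs followed by depth closing braces.
        String close = "%".repeat(depth) + "}".repeat(depth);
        int closeAt = text.indexOf(close, pos);
        if (closeAt < 0) return null;
        return new RawBlock(language, text.substring(pos, closeAt), closeAt + close.length());
    }
}
```

With single delimiters, a snippet like {% /** 50%} */ %} gets cut off at the %} inside the comment. With doubled delimiters, that %} is just ordinary body text.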
Now, regarding the letter that specifies the output language in a raw code block: it later dawned on me that it is not strictly necessary. It could all be handled by the preprocessor. For example, a code block with a 'P' for Python, {P%...%}, is basically a terser equivalent of:
#if __python__
{% ... %}
#endif
In either case, the code snippet is ignored unless we are outputting Python. (NB. The output language is exposed to the preprocessor via a few preset symbols: __python__, __java__, __csharp__, and now __rust__ as well.)
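Seen this way, the letter is pure syntactic sugar. Here is a small, hypothetical Java sketch of that desugaring. Only the P and J letters appear in this discussion. The C and R letters for C# and Rust, and all class and method names, are assumptions for illustration.

```java
import java.util.Map;

// Hypothetical sketch: treating {X%...%} as shorthand for #if <symbol> {%...%} #endif.
// The 'C' and 'R' letters are assumed; only 'P' and 'J' are confirmed in the post.
final class LanguageTagDesugarer {
    private static final Map<Character, String> SYMBOLS = Map.of(
            'J', "__java__",
            'P', "__python__",
            'C', "__csharp__",   // assumed letter
            'R', "__rust__");    // assumed letter

    /** Rewrites a language-tagged raw block into the equivalent preprocessor form. */
    static String desugar(char tag, String body) {
        String symbol = SYMBOLS.get(tag);
        if (symbol == null) throw new IllegalArgumentException("Unknown language tag: " + tag);
        return "#if " + symbol + "\n{%" + body + "%}\n#endif";
    }

    /** True if a block with this tag survives when the preprocessor defines targetSymbol. */
    static boolean survives(char tag, String targetSymbol) {
        return targetSymbol.equals(SYMBOLS.get(tag));
    }
}
```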
While our own example grammars should be polyglot, i.e. generate parsers in the various supported languages, I don't think most real-world software development projects are polyglot. They typically care only about outputting code in their language of choice. A Microsoft shop centered on C# only cares about generating C#, a Java shop only cares about generating Java, and so on, for the most part. There may be a few people (besides us!) interested in developing grammars that can generate source code in multiple languages, but I reckon it is rare.
I also did a first pass (not quite complete) at supporting raw code blocks with ASSERT/ENSURE, and also with FAIL.
You can now write:
ENSURE {% someCondition() %}
The handling of ASSERT is somewhat unfinished. On the Java side, I recently (about a year ago maybe) enhanced the syntax so that you can write:
ASSERT '{' javaExpression ["," location] [":" errorMessage] '}'
Here is an example:
ASSERT {
permissibleModifiers == null || hasMatch(permissibleModifiers,lastConsumedToken),
lastConsumedToken : "Modifier " + lastConsumedToken + " not permitted here."
}
You see, the Java expression being asserted is permissibleModifiers == null || hasMatch(permissibleModifiers,lastConsumedToken). It means that if permissibleModifiers (an EnumSet defining the set of TokenTypes permissible at this stage) is defined, then the modifier we just saw has to be one of those. This allows us to exclude nonsense like public private or final abstract, modifier combinations that just make no sense! After the condition comes a comma, and what follows the comma, lastConsumedToken, is a Node variable giving the location used to construct the resulting error message. Finally, what comes after the : is the error message itself.
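For anyone less familiar with ASSERT, here is roughly what that assertion checks, written out as plain Java. This is a hypothetical sketch, not the code CongoCC generates. TokenType, Token, ParseException and this hasMatch are simplified stand-ins, and the logic that narrows permissibleModifiers as modifiers are consumed is left out.

```java
import java.util.EnumSet;

// Hypothetical sketch of the check expressed by the ASSERT above.
// All types here are simplified stand-ins, not CongoCC's generated classes.
final class ModifierCheck {
    enum TokenType { PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, ABSTRACT }

    /** A token with its text and position; the position plays the role of the "location". */
    record Token(TokenType type, String image, int line, int column) {
        @Override public String toString() { return image; }
    }

    static final class ParseException extends RuntimeException {
        ParseException(String message, Token location) {
            super(message + " (line " + location.line() + ", column " + location.column() + ")");
        }
    }

    static boolean hasMatch(EnumSet<TokenType> types, Token tok) {
        return types.contains(tok.type());
    }

    /** Condition, then location (lastConsumedToken), then the error message. */
    static void assertModifier(EnumSet<TokenType> permissibleModifiers, Token lastConsumedToken) {
        boolean ok = permissibleModifiers == null || hasMatch(permissibleModifiers, lastConsumedToken);
        if (!ok) {
            throw new ParseException("Modifier " + lastConsumedToken + " not permitted here.", lastConsumedToken);
        }
    }
}
```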
Well, the above is not currently implemented for raw code snippets. At the moment, all you can write is:
ASSERT {% condition expressed in whatever language %}
The location and message, after the , and the : respectively, are not implemented yet. (They will be soon.)
FAIL, I realized, is a tricky case. The way it works now is that you can write:
Option1
|
Option2
|
FAIL "some message"
Actually, in place of "some message" you can have any Java expression. But you can also have:
FAIL {some java code}
This is an interesting point, because there is a quite important difference between:
(Option1 | Option2 | FAIL {throw some exception})
and:
(Option1 | Option2 | {throw some exception})
The difference arises when you are in a lookahead. (If you are actually parsing, it's all the same.) If you are in a lookahead and you scan ahead to {throw some exception}, the lookahead is taken to have succeeded. That is because hitting any Java code block (which is really just a black box from the point of view of the parser generator) is taken to be a success. But if that Java code block is preceded by FAIL, then scanning ahead to this point means the lookahead did not succeed!
So the FAIL statement is an absolutely necessary element, if you think about it, to express certain things...
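Here is a toy Java model of the difference, just to make the lookahead rule concrete. It is not CongoCC's generated code. The Alternative types and scanChoice are made up for illustration. During a lookahead, reaching a plain code block counts as success and the code is not run, while reaching a FAIL counts as failure.

```java
import java.util.List;

// Hypothetical toy model of lookahead over (Option1 | Option2 | {...})
// versus (Option1 | Option2 | FAIL {...}). Names are illustrative only.
final class LookaheadModel {
    interface Alternative {}
    record Terminal(String expected) implements Alternative {}
    record CodeBlock(Runnable action) implements Alternative {}
    record Fail(Runnable action) implements Alternative {}

    /** Tries the alternatives in order against the next token; true means the lookahead succeeds. */
    static boolean scanChoice(List<Alternative> choice, String nextToken) {
        for (Alternative alt : choice) {
            if (alt instanceof Terminal t) {
                if (t.expected().equals(nextToken)) return true;
            } else if (alt instanceof CodeBlock) {
                // A code block is a black box: reaching it counts as success, and it is not executed.
                return true;
            } else if (alt instanceof Fail) {
                // Reaching FAIL during a lookahead means the lookahead did not succeed.
                return false;
            }
        }
        return false;
    }
}
```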
As things stand, though, we have a bit of a problem combining FAIL with raw code blocks, because there are two sorts of FAIL statement: one that specifies an error message and one that specifies a code block. (To pick nits, there is also a third type, FAIL on its own, that specifies neither!)
But once we allow:
FAIL {% some code %}
how do we know whether the code inside the {% ... %} raw code block is an expression (used to construct an error message) or a block of one or more statements to run? (A weird twist is that this distinction between expressions and statements barely exists in Rust. Practically everything we consider a statement in Java or other languages is also an expression in Rust. So...)
Well, the solution I have found (the best I could come up with) is that if the code block is meant to be run, we write:
FAIL => {% code block %}
and if it is just an expression to construct an error message, then there is no arrow.
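Here is a rough Java sketch of how a code generator might use the arrow to decide what to emit. The generated shapes are assumptions for illustration, not what CongoCC actually generates. The message form becomes a throw of a ParseException built from the expression, and the block form's statements are emitted verbatim. The names are also hypothetical.

```java
// Hypothetical sketch: the arrow alone distinguishes FAIL => {% statements %}
// from FAIL {% message expression %}. Output shapes and names are assumed.
final class FailCodeGen {

    /** Takes the text following the FAIL keyword and returns the Java to emit. */
    static String emitJava(String afterFail) {
        String s = afterFail.strip();
        boolean isCodeBlock = s.startsWith("=>");
        if (isCodeBlock) s = s.substring(2).strip();
        if (s.length() < 4 || !s.startsWith("{%") || !s.endsWith("%}")) {
            throw new IllegalArgumentException("Expected a raw code block: " + afterFail);
        }
        String code = s.substring(2, s.length() - 2);
        return isCodeBlock
                ? "{" + code + "}"                                    // statements to run, emitted as-is
                : "throw new ParseException(" + code.strip() + ");";  // expression used as the message
    }
}
```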
Of course, there is no ambiguity with the existing Java-centric syntax. When you write:
FAIL { java code block }
or:
FAIL "you dummy!"
there is no ambiguity. But... you can (now) optionally write:
FAIL => {java code block}
so that it is consistent with how the raw code block works.
So, anyway, there may be some glitches in all this, but it's basically all implemented. (Well, except for specifying the location and error message in an ASSERT with a raw code block, which is still unimplemented but will be quite soon.)
So, anyway, that's the state of the world at the moment. All comments and ideas are welcome. Thanks for reading to this point!