opeongo: Ideally I would like to be able to print the context of where the error occurred within the token stream.
Hi Tom,
Any Node object in CongoCC, whether it is a nonterminal or a terminal (i.e. a Token), has a fairly substantial API by default. Consider this snippet from Node.java.ftl, the template used to generate the base Node interface: https://github.com/congo-cc/congo-parser-generator/blob/main/src/templates/java/Node.java.ftl#L436-L443
That getLocation() is just a convenience method that is generated by default. You could override it if you want different text. Or you could INJECT your own specialized method, like:
INJECT Node :
{
public default String getCustomizedLocation() {....}
}
Any Node object also has nextSibling() and previousSibling() methods. See: https://github.com/congo-cc/congo-parser-generator/blob/main/src/templates/java/Node.java.ftl#L253-L268
So, I guess what you really need to do is get familiar with the API that is generated by default and see what you can do with it. If it doesn't do exactly what you want, you can override the implementations in BaseNode.java and Token.java.
So this is what I am asking about. Is it possible to chain backwards and forwards through the list of tokens in order to reconstruct a portion of the input expression?
Well, any node/token can tell you its starting and ending offset. So, for example, if you want the text starting with node a and ending with node b, this should work:
String srcAToB = myLexer.subSequence(a.startOffset(), b.endOffset()).toString();
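To make the offset idea concrete without depending on any generated code, here is a minimal standalone sketch. The Span class is a hypothetical stand-in for a node or token (it is not part of CongoCC), and it assumes the end offset is exclusive, which is how CharSequence.subSequence treats its second argument. Check the generated API for the actual method names and offset conventions.

```java
class SourceSlice {
    /** Hypothetical stand-in for a node or token; only the two offsets matter here. */
    static final class Span {
        final int startOffset; // index of the first character
        final int endOffset;   // index just past the last character (assumed exclusive)
        Span(int startOffset, int endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Returns the input text from the start of a through the end of b. */
    static String textBetween(CharSequence input, Span a, Span b) {
        return input.subSequence(a.startOffset, b.endOffset).toString();
    }

    public static void main(String[] args) {
        String input = "x = (1 + 2) * y;";
        Span open = new Span(4, 5);    // covers "("
        Span close = new Span(10, 11); // covers ")"
        System.out.println(textBetween(input, open, close)); // prints "(1 + 2)"
    }
}
```

The same slice works for any pair of nodes as long as a starts at or before the point where b ends.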
But I guess you really need to just start using CongoCC and then you can eyeball the generated code (or javadocs) and see what API is automatically generated and see what you can do with it.
In terms of handling parsing errors, you would need to eyeball what API is automatically generated for ParseException.java. Here is the template used for generating the ParseException.java file: https://github.com/congo-cc/congo-parser-generator/blob/main/src/templates/java/ParseException.java.ftl
In particular, the getToken()
method should return the problematic token that the parser tripped up on. Of course, in an error message, you might need to use a preceding token/location to give a more understandable error message. But that's all kind of heuristics, as it were...
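As one possible way to print the context of an error, the sketch below turns a character offset into a line/column message with the offending source line and a caret under the position. It is plain Java over a String and uses nothing generated by CongoCC. The assumption is that you can obtain a 0-based offset from the problematic token; how you do that depends on the generated API. The class and method names are made up for illustration.

```java
class ErrorContext {
    /**
     * Builds a three-line message: "line L, column C", the source line
     * containing the offset, and a caret under the offending column.
     * Offsets are 0-based; reported lines and columns are 1-based.
     */
    static String describe(String input, int offset) {
        // Start of the line: just past the previous newline (or 0 if none).
        int lineStart = input.lastIndexOf('\n', offset - 1) + 1;
        // End of the line: the next newline, or end of input if none.
        int lineEnd = input.indexOf('\n', offset);
        if (lineEnd < 0) lineEnd = input.length();

        // Count newlines before the line start to get the line number.
        int line = 1;
        for (int i = 0; i < lineStart; i++) {
            if (input.charAt(i) == '\n') line++;
        }
        int column = offset - lineStart + 1;

        String sourceLine = input.substring(lineStart, lineEnd);
        return "line " + line + ", column " + column + "\n"
                + sourceLine + "\n"
                + " ".repeat(column - 1) + "^";
    }

    public static void main(String[] args) {
        String input = "a = 1;\nb = = 2;\n";
        // Offset 11 is the second '=' on line 2.
        System.out.println(describe(input, 11));
    }
}
```

The caret alignment assumes no tab characters before the error position; handling tabs or pointing at a preceding token instead is exactly the kind of heuristic mentioned above.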
And is this information available both at parse time and later at evaluation time when the ast is evaluated?
No information is thrown away. Any nonterminal Node or Token "knows" its location and so on, and there is API to get at its context, such as the node or token immediately preceding or following it.
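To illustrate the idea of chaining backwards and forwards from one token, here is a rough sketch that collects a few neighbours on each side and marks the token in the middle. The Tok class with its prev/next links is purely hypothetical and only models the concept; it is not the generated Token class, and the real accessors for the preceding and following token should be looked up in the generated code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TokenWindow {
    /** Hypothetical token model: NOT the generated CongoCC Token class. */
    static final class Tok {
        final String text;
        Tok prev, next;
        Tok(String text) { this.text = text; }
    }

    /** Links the given strings into a doubly linked chain and returns the tokens in order. */
    static Tok[] chain(String... texts) {
        Tok[] toks = new Tok[texts.length];
        for (int i = 0; i < texts.length; i++) {
            toks[i] = new Tok(texts[i]);
            if (i > 0) {
                toks[i].prev = toks[i - 1];
                toks[i - 1].next = toks[i];
            }
        }
        return toks;
    }

    /** Joins up to radius tokens before and after center, marking the center token. */
    static String window(Tok center, int radius) {
        Deque<String> parts = new ArrayDeque<>();
        parts.add(">>" + center.text + "<<");
        // Walk backwards, prepending each earlier token.
        Tok t = center.prev;
        for (int i = 0; i < radius && t != null; i++, t = t.prev) {
            parts.addFirst(t.text);
        }
        // Walk forwards, appending each later token.
        t = center.next;
        for (int i = 0; i < radius && t != null; i++, t = t.next) {
            parts.addLast(t.text);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        Tok[] toks = chain("a", "=", "b", "+", "*", "c", ";");
        System.out.println(window(toks[4], 2)); // prints "b + >>*<< c ;"
    }
}
```

Joining token text with single spaces loses the original whitespace and comments; slicing the input by offsets, as in the subSequence call earlier, preserves them.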
I don't know whether I should say this exactly, but I find it kind of perplexing that you are still using the old legacy JavaCC. I mean, the fact is that transitioning to CongoCC has whatever fixed cost whether you do it earlier or later and you might as well do it earlier since the cost is the same and you get the benefits immediately. Well, that is not about you specifically, Tom. It so happens that it is very hard to get people to switch. Even in situations where I've volunteered to do most of the work for them, like with that ancient Beanshell project. It's not just about how inferior and limited the legacy JavaCC is, but also, if you ask the ostensible maintainers any question about how the thing works, you'll never get an answer really. It's really just completely unsupported abandonware. Of course, they want to pretend that that is not the situation, but I'm sure you know the score...
Out of curiosity, how many lines of legacy JavaCC grammar code do you have?