Another one from my Cabinet of Curiosities ...
In my grammar I sometimes need to consume tokens up to a specific end token without regard to any expansion. For example, something like this:
SelectStatement : <ESQL_SELECT> =>|| {consumeSqlFragment(0,END_EXEC);}# | FAIL
[BTW, the "| FAIL" is there to make this a choice point, so that the Java action is always performed in lookahead in the context of a higher-level up-to-here.]
What it tries to do is, at lookahead time, scan the ESQL_SELECT token and then execute the consumeSqlFragment(...) method, which is coded to scan tokens in lookahead and consume them in regular parsing. In this case it starts at the ESQL_SELECT token and scans subsequent tokens until it gets to an END_EXEC token (which it does not scan). That all seems to work fine.
The wrinkle is that the END_EXEC token specifies a lexical state change to DEFAULT (from SQL_STATE). Apparently, from looking at the NFA code, merely peeking at the END_EXEC token causes the state change to occur. Then, when the parser actually consumes the tokens after the lookahead has succeeded, it fails to find the ESQL_SELECT token (because the state was already changed when the lookahead peeked at END_EXEC) and finds the (DEFAULT state) SELECT token instead. This causes the higher-level production to fail.
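To make the failure mode concrete, here is a toy model in plain Java. It is not the generated lexer or parser code: ToyLexer, tokenTypeAt and scanToEndExec are made-up names, and the model simply assumes that the first word gets tokenized again, in whatever lexical state is current, once the lookahead is done.

```java
import java.util.List;

public class PeekSideEffectDemo {
    enum LexState { DEFAULT, SQL_STATE }

    /** Toy lexer (hypothetical): the token type of a word depends on the current state. */
    static class ToyLexer {
        final List<String> words;
        LexState state = LexState.SQL_STATE;

        ToyLexer(List<String> words) {
            this.words = words;
        }

        /** Tokenizes the word at index; producing END_EXEC switches the state as a side effect. */
        String tokenTypeAt(int index) {
            String word = words.get(index);
            if (word.equals("END-EXEC")) {
                state = LexState.DEFAULT; // happens even if the caller is only peeking
                return "END_EXEC";
            }
            if (word.equals("SELECT")) {
                return state == LexState.SQL_STATE ? "ESQL_SELECT" : "SELECT";
            }
            return "OTHER";
        }
    }

    /** Lookahead: step over tokens until END_EXEC is peeked; returns its index. */
    static int scanToEndExec(ToyLexer lexer, int start) {
        int i = start;
        while (!lexer.tokenTypeAt(i).equals("END_EXEC")) {
            i++;
        }
        return i;
    }

    /** Returns the type the first word gets when it is tokenized again after the lookahead. */
    static String firstTokenAfterLookahead() {
        ToyLexer lexer = new ToyLexer(List.of("SELECT", "a", "FROM", "t", "END-EXEC"));
        scanToEndExec(lexer, 0);      // peeking END_EXEC flips the state to DEFAULT
        return lexer.tokenTypeAt(0);  // now lexed in DEFAULT, not SQL_STATE
    }

    public static void main(String[] args) {
        System.out.println(firstTokenAfterLookahead()); // prints SELECT
    }
}
```

In this model the first word comes back as SELECT rather than ESQL_SELECT purely because the peek left the state at DEFAULT, which is the same symptom as described above.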
It seems like the lexer/parser should effect the actual state change only when the token that changes it is scanned or consumed. Or should I be doing something similar to what stashParseState/popParseState does in ATTEMPT/RECOVER, to restore the previous lexer state after I do a getNextToken() that isn't followed by a scan or consume?
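For the second option, the shape of what I have in mind is roughly the following. This is only a sketch against a toy lexer, not the real stashParseState/popParseState code: ToyLexer, peekEndExec and peekPreservingState are hypothetical names, and the real lexer may well have more state to restore than a single field.

```java
import java.util.function.Supplier;

public class LexStateGuardDemo {
    enum LexState { DEFAULT, SQL_STATE }

    /** Hypothetical stand-in for the lexer: only the mutable lexical state matters here. */
    static class ToyLexer {
        LexState state = LexState.SQL_STATE;

        /** Peeking the END_EXEC token switches to DEFAULT as a side effect. */
        String peekEndExec() {
            state = LexState.DEFAULT;
            return "END_EXEC";
        }
    }

    /** Runs a peek, then restores whatever lexical state was active before it. */
    static <T> T peekPreservingState(ToyLexer lexer, Supplier<T> peek) {
        LexState saved = lexer.state;
        try {
            return peek.get();
        } finally {
            lexer.state = saved; // undo any state switch caused by the peek
        }
    }

    /** Returns the lexer state observed after a guarded peek at END_EXEC. */
    static LexState stateAfterGuardedPeek() {
        ToyLexer lexer = new ToyLexer();
        peekPreservingState(lexer, lexer::peekEndExec);
        return lexer.state;
    }

    public static void main(String[] args) {
        System.out.println(stateAfterGuardedPeek()); // prints SQL_STATE
    }
}
```

The try/finally is there so the state is put back even if the peek throws, which seems to be the behaviour one would want from a lookahead that is supposed to have no lasting effect.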