Here is a draft of documentation for one of the capabilities added recently:
Using CongoCC to Generate JTB-style Trees
For JavaCC, there are two different ways to generate a tree-structured representation of the parsed input text: the preprocessor-based jjTree tool and the Java Tree Builder (JTB) tool. While CongoCC has a built-in capability to produce a tree that is backward-compatible with those produced by jjTree, it has, until now, left behind applications of JavaCC that rely on JTB-generated tree structures. The good news is that this is no longer the case.
A new capability exists in CongoCC that causes its built-in tree builder to additionally generate tree nodes corresponding to those generated by the JTB tool. Furthermore, the powerful reflection-based visitation supported by CongoCC tree nodes eliminates the need for the JTB-generated visitors, so only the resulting CongoCC tree nodes are needed to support JTB-style visitation with complete fidelity.
If this piques your interest, here is how to do it.
Converting your JavaCC/JTB grammar to CongoCC
The first step is to render your JavaCC grammar file in a form that is acceptable to CongoCC. At one point CongoCC maintained source compatibility with JavaCC, but it became clear that continuing to do so detracted from the clarity of CongoCC grammars once CongoCC's extensions were added. The solution was to freeze the syntax of the source-compatible version of CongoCC (known as JavaCC) and to add a rewriting capability that serves as a conversion tool. For most JavaCC grammars (including ones that are preprocessed by JTB), running the conversion tool produces a correct CongoCC version of the input that requires few, if any, changes by hand to be completely equivalent to the JavaCC one.
Information on running this conversion utility can be found here [TBA].
Adding OPTIONS to the CongoCC grammar to enable generation of JTB tree nodes
To produce a tree structure that is equivalent to the JTB tree expected by your visitors, a few CongoCC options should be set at the beginning of the parser source file. These are:
SMART_NODE_CREATION=false;
X_SYNTHETIC_NODES_ENABLED;
X_JTB_PARSE_TREE;
The first turns off the default CongoCC feature that suppresses the creation of a node if that node would contain either no child nodes or only a single child node. For JTB, all of the nodes implied by the grammar must be present for the JTB-style visitation and navigation of the resulting tree.
The second turns on the (presently experimental) feature that causes every syntactic expression (or sub-expression) to potentially generate a (syntactic) node.
The third turns on the feature that generates syntactic nodes at the locations where JTB generates additional nodes, and enables the definition of fields in non-terminal definition (i.e., production) nodes corresponding to the top-level syntactic structure of the production.
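To make the effect of the first option concrete, here is a minimal Java sketch of the "suppress nodes with zero or one child" rule described above. It is only an illustration: the class N, the close helper, and the production names Expression, Term, and Factor are hypothetical stand-ins, not CongoCC classes or its actual tree-building code.

```java
import java.util.ArrayList;
import java.util.List;

final class SmartNodeSketch {
    /** Hypothetical stand-in for a tree node; NOT a CongoCC class. */
    static final class N {
        final String name;
        final List<N> children = new ArrayList<>();
        N(String name) { this.name = name; }
    }

    /**
     * Finishes a production. With "smart" creation, a node that would hold
     * zero or one child is not created and the child (if any) passes through.
     */
    static N close(String name, List<N> kids, boolean smart) {
        if (smart && kids.size() <= 1) {
            return kids.isEmpty() ? null : kids.get(0);
        }
        N node = new N(name);
        node.children.addAll(kids);
        return node;
    }

    /** Builds Expression -> Term -> Factor -> NUMBER and returns the resulting path. */
    static String build(boolean smart) {
        N leaf = new N("NUMBER");
        N factor = close("Factor", List.of(leaf), smart);
        N term = close("Term", List.of(factor), smart);
        N expr = close("Expression", List.of(term), smart);
        return path(expr);
    }

    /** Follows first children from the root and joins the node names. */
    static String path(N n) {
        StringBuilder sb = new StringBuilder(n.name);
        while (!n.children.isEmpty()) {
            n = n.children.get(0);
            sb.append(" > ").append(n.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(true));   // prints "NUMBER"
        System.out.println(build(false));  // prints "Expression > Term > Factor > NUMBER"
    }
}
```

With the rule switched off, every production in the chain keeps its own node, which is the shape a JTB-style visitor expects to navigate.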
This is almost enough to generate JTB-compatible trees! What is left is to define nonterminals in the grammar that correspond to JTB's synthetic nodes, and have them include the methods that are available to visitors when JavaCC/JTB is used.
Including the JTB productions for the synthetic CongoCC nodes
Fortunately, CongoCC has two features that JavaCC lacks and that make this step very easy.
First, CongoCC has the INJECT feature, which allows code to be injected into generated nodes. In this case, code must be injected into the CongoCC synthetic nodes that will cause them to extend the JTB syntactic nodes. This CongoCC source code does that:
/*******************************************************
* The following should be moved to the CongoCC parser.*
*******************************************************/
Terminal : FAIL ;
Sequence : FAIL ;
Choice : FAIL ;
ZeroOrOne : FAIL ;
ZeroOrMore : FAIL ;
OneOrMore : FAIL ;
INJECT Terminal :
extends NodeToken;
{}
INJECT Sequence :
extends NodeSequence;
{}
INJECT Choice :
extends NodeChoice;
{}
INJECT ZeroOrOne :
extends NodeOptional;
{}
INJECT ZeroOrMore :
extends NodeListOptional;
{}
INJECT OneOrMore :
extends NodeList;
{}
If you are using JTB now, you will note the nodes receiving the injections are extending classes for every JTB syntactic node. This is because the CongoCC compiler will generate synthetic nodes corresponding to all of the JTB nodes, but they are named differently. By extending them, we are naming them in a manner compatible with JTB, and we are providing classes that can provide the methods required to achieve full[1] visitor compatibility with JTB as well.
We add the methods as follows:
INJECT NodeToken :
import java.util.*;
{
Token token = Token.newToken(INVALID, null, 0, 0);
@Override
public void close() {
List<Token> tokens = (List<Token>) getAllTokens(false);
if (tokens.size() > 0) {
token = tokens.get(0);
}
}
public Token getToken() {
return token;
}
@Override
public String toString() {
return token.toString();
}
@Deprecated
@Override
public String getImage() {
return token.toString();
}
@Override
public String toCompleteString() {
return token.toCompleteString();
}
@Override
public int getLine() {
return token.getLine();
}
@Override
public int getLastLine() {
return token.getLastLine();
}
@Override
public int getColumn() {
return token.getColumn();
}
@Override
public int getLastColumn() {
return token.getLastColumn();
}
@Override
public boolean isEOF() {
return token.getType() == Token.TokenType.EOF;
}
public NodeToken() {
super();
}
@Override
public int hashCode() {
return Objects.hash(token.getBeginOffset(), token.getEndOffset(), token.toString(), token.getParent(), token.getSource(), token.getTokenSource(), token.getType());
}
@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null) return false;
if (getClass() != obj.getClass()) return false;
NodeToken other = (NodeToken) obj;
return getBeginOffset() == other.getBeginOffset() && getEndOffset() == other.getEndOffset() && Objects.equals(toString(), other.toString()) && Objects.equals(getParent(), other.getParent()) && Objects.equals(getSource(), other.getSource()) && Objects.equals(getTokenSource(), other.getTokenSource()) && getType() == other.getType() && isUnparsed() == other.isUnparsed();
}
}
NodeToken# :
FAIL
;
INJECT NodeSequence :
import java.util.*;
{
public List<? extends Node> nodes;
public Node elementAt(int i) {
return get(i);
}
public Iterator<? extends Node> elements() {
return nodes.iterator();
}
@Override
public void close() {
nodes = children();
super.close();
}
}
NodeSequence# :
FAIL
;
INJECT NodeChoice :
import java.util.*;
{
/** The "which" choice indicator */
public int which;
public Node choice;
public boolean isValid = true;
public int which() {
return which;
}
public Node choice() {
return choice;
}
public void setChoice(int which) {
this.which = which;
}
@Override
public void close() {
// initialize the NodeChoice fields
try {
choice = get(0);
} catch (Exception e) {
isValid = false;
}
super.close();
}
}
NodeChoice# :
FAIL
;
INJECT NodeList :
import java.util.*;
{
public List<? extends Node> nodes;
public Node elementAt(int i) {
return get(i);
}
public Iterator<? extends Node> elements() {
return nodes.iterator();
}
@Override
public void close() {
nodes = children();
super.close();
}
}
NodeList# :
FAIL
;
INJECT NodeListOptional :
import java.util.*;
{
public List<? extends Node> nodes;
public Node elementAt(int i) {
return get(i);
}
public Iterator<? extends Node> elements() {
return nodes.iterator();
}
/**
* @return true if there is at least one node, false otherwise
*/
public boolean present() {
return size() > 0;
}
@Override
public void close() {
nodes = children();
super.close();
}
}
NodeListOptional# :
FAIL
;
INJECT NodeOptional :
import java.util.*;
{
public Node node;
/**
* Gets the optional child node.
*
* @return the node, or null if it is absent
*/
public Node node() {
return node;
}
/**
* @return true if child node exists; false otherwise
*/
public boolean present() {
return node != null;
}
@Override
public void close() {
if (size() > 0) {
node = get(0);
} else {
node = null;
}
super.close();
}
}
NodeOptional# :
FAIL
;
The above defines the new JTB syntactic nodes and adds to them the methods necessary to achieve the same visitor support as provided by the JTB-generated equivalents.
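As a rough illustration of how these pieces fit together, the following Java sketch mimics the injected NodeListOptional body and the "ZeroOrMore extends NodeListOptional" relationship. All classes here are hypothetical stand-ins written for this example (BaseNodeSketch only imitates the get, size, children, and close methods used by the injected code); they are not the classes CongoCC generates.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the generated base node; NOT a CongoCC class. */
class BaseNodeSketch {
    private final List<BaseNodeSketch> kids = new ArrayList<>();
    void add(BaseNodeSketch child) { kids.add(child); }
    List<BaseNodeSketch> children() { return new ArrayList<>(kids); }
    BaseNodeSketch get(int i) { return kids.get(i); }
    int size() { return kids.size(); }
    void close() { /* assumed to be called once all children are attached */ }
}

/** Mirrors the injected NodeListOptional body shown in the text. */
class NodeListOptionalSketch extends BaseNodeSketch {
    List<? extends BaseNodeSketch> nodes;
    BaseNodeSketch elementAt(int i) { return get(i); }
    Iterator<? extends BaseNodeSketch> elements() { return nodes.iterator(); }
    boolean present() { return size() > 0; }
    @Override
    void close() {
        nodes = children();   // snapshot the children into the JTB-style field
        super.close();
    }
}

/** Mirrors "INJECT ZeroOrMore : extends NodeListOptional". */
class ZeroOrMoreSketch extends NodeListOptionalSketch {}

final class JtbAccessSketch {
    /** JTB-style client code: it only knows the JTB name and its methods. */
    static int countElements(NodeListOptionalSketch list) {
        if (!list.present()) {
            return 0;
        }
        int count = 0;
        Iterator<? extends BaseNodeSketch> it = list.elements();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    /** Builds a synthetic node with the given number of children and counts them. */
    static int demo(int childCount) {
        ZeroOrMoreSketch z = new ZeroOrMoreSketch();
        for (int i = 0; i < childCount; i++) {
            z.add(new BaseNodeSketch());
        }
        z.close();
        return countElements(z);
    }

    public static void main(String[] args) {
        System.out.println(demo(0));  // prints 0
        System.out.println(demo(3));  // prints 3
    }
}
```

The client method is written purely against the JTB-facing name, yet it accepts the differently named synthetic node because that node extends it.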
At this point, everything necessary to enable CongoCC to generate a tree that can be visited by code expecting a JTB tree is complete, but the visitor must undergo a few slight one-time modifications. They are easily accomplished with your favorite editor.
Changing the visitor(s) to work with the CongoCC JTB node trees
First, the visitor must extend the Node.Visitor class defined by CongoCC. This is accomplished as follows.
In your visitor, you begin with something like:
public class MyJtbVisitor extends DepthFirstVoidVisitor {
This must be changed to something like:
public class MyJtbVisitor extends Node.Visitor {
Second, within your visitor you will have methods like:
@Override
public void visit(ANonTerminal n) {
...
}
You must remove the @Override annotation if it is present (unlike with JTB, you will not be overriding a specific method signature). Within those methods you might have calls to super.visit(n). If you do, you must change each of them to recurse(n).
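The reason @Override goes away and recurse(n) takes the place of super.visit(n) is easier to see with a small model of reflection-based dispatch. The sketch below is a simplified, self-contained imitation written for this document, not CongoCC's actual Node.Visitor; every class name in it is a hypothetical stand-in, and the lookup strategy (walking up the superclass chain) is an assumption made for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for tree nodes; NOT the CongoCC classes. */
class SketchNode {
    final List<SketchNode> children = new ArrayList<>();
    SketchNode add(SketchNode child) { children.add(child); return this; }
}
class NodeSequenceSketch extends SketchNode {}
class SequenceSketch extends NodeSequenceSketch {}   // like "INJECT Sequence : extends NodeSequence"
class ANonTerminalSketch extends SketchNode {}

/** Simplified imitation of a reflection-based visitor base class. */
abstract class ReflectiveVisitorSketch {
    /** Entry point: finds the most specific visit(...) declared by the subclass. */
    public final void visit(SketchNode node) {
        Method handler = findHandler(node.getClass());
        if (handler == null) {
            recurse(node);   // no specific handler: just descend
            return;
        }
        try {
            handler.setAccessible(true);
            handler.invoke(this, node);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Visits every child; plays the role that super.visit(n) plays in JTB visitors. */
    public void recurse(SketchNode node) {
        for (SketchNode child : node.children) {
            visit(child);
        }
    }

    /** Walks up the node's class hierarchy looking for a matching visit overload. */
    private Method findHandler(Class<?> type) {
        for (Class<?> c = type; c != null && c != SketchNode.class; c = c.getSuperclass()) {
            try {
                return getClass().getDeclaredMethod("visit", c);
            } catch (NoSuchMethodException ignored) {
                // try the superclass next
            }
        }
        return null;
    }
}

class LoggingVisitorSketch extends ReflectiveVisitorSketch {
    final List<String> log = new ArrayList<>();

    // No @Override: the base class declares no visit(ANonTerminalSketch) to override.
    public void visit(ANonTerminalSketch n) {
        log.add("ANonTerminal");
        recurse(n);
    }

    // Also reached for SequenceSketch instances, via the superclass walk.
    public void visit(NodeSequenceSketch n) {
        log.add("NodeSequence");
        recurse(n);
    }

    static List<String> demo() {
        SketchNode root = new ANonTerminalSketch()
                .add(new SequenceSketch().add(new ANonTerminalSketch()));
        LoggingVisitorSketch v = new LoggingVisitorSketch();
        v.visit(root);
        return v.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints [ANonTerminal, NodeSequence, ANonTerminal]
    }
}
```

Because the typed visit methods are discovered at run time rather than declared in the base class, there is nothing for them to override, and descending into children is an explicit call.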
Finally, within the visitor methods shown above you will likely have calls such as n.accept(this), where n may also be an expression that resolves to a node. If so, you have two choices: either change each n.accept(this) to visit(n), or include the following in your parser source:
INJECT Node :
{
/**
* Accepts a visitor to this {@code Node}.
* @param visitor is a {@link Node.Visitor} to this node
*/
default public void accept(Visitor visitor) {
visitor.visit(this);
}
}
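For readers who want to see why the two choices are interchangeable, here is a tiny Java sketch in which accept simply forwards to visit. The interface and classes are hypothetical stand-ins for Node and Node.Visitor, written only for illustration; the sketch is not part of the parser source.

```java
/** Hypothetical stand-in for the visitor; NOT CongoCC's Node.Visitor. */
class AcceptVisitorSketch {
    final StringBuilder log = new StringBuilder();
    void visit(AcceptNodeSketch n) {
        log.append(n.name()).append(';');
    }
}

/** Hypothetical stand-in for the Node interface with the injected default method. */
interface AcceptNodeSketch {
    String name();

    default void accept(AcceptVisitorSketch visitor) {
        visitor.visit(this);   // same effect as calling visitor.visit(node) directly
    }
}

final class AcceptSketch {
    static String demo() {
        AcceptNodeSketch n = () -> "ANonTerminal";   // one abstract method, so a lambda works
        AcceptVisitorSketch v = new AcceptVisitorSketch();
        n.accept(v);   // JTB-style call
        v.visit(n);    // direct call
        return v.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints "ANonTerminal;ANonTerminal;"
    }
}
```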
At this point, it is recommended that you put the grammar additions described above (the FAIL productions and the INJECT blocks) in a separate file and use the second handy feature of CongoCC, the INCLUDE directive, to include it in your parser source.
That's all folks
That's it. You should have a parser source that can be compiled by CongoCC into Java that includes a tree of nodes that can be visited by your visitors with no other changes.
[1]: Well, not really "full". This provides support for using what JTB calls a DepthFirstVoidVisitor. That is, by far, the most common visitor pattern used, and is perhaps the only one.