As a consequence of the discussion initiated by ngx about converting a large ANTLR grammar to CongoCC, I started looking at the ANTLR grammar (for SQL) he pointed to and I started thinking about writing (maybe) a converter -- not specifically for the SQL grammar, but for ANTLR4 grammars generally, of course.
Well, I haven't written an ANTLR->Congo converter yet, but I have done the first step, which is to write a CongoCC grammar for ANTLR grammar files. See here. The parser that this generates can parse all the files in the ANTLR grammar repository.
It is actually striking how modest a project this turned out to be. The resulting grammar is a bit over 500 lines. That compares to the CongoCC grammar which is more like 1600 lines, and that is not including the fact that it INCLUDEs the Java grammar, which is as big again.
If you count the embedded Java grammar, the CongoCC grammar is something like 3000 lines, but note, of course, that the ANTLR grammar/parser has no knowledge of Java or any of the other target languages. The way that it deals with an embedded code action is simply to handle it lexically, consuming all the characters inside the {...} delimiters. So, basically, it just matches delimiters, which is a rather crude approach. By contrast, the tree that CongoCC builds from a .ccc grammar file actually contains the sub-trees for all the Java code actions and injections. And, aside from that, the CongoCC grammar files are just syntactically richer, what with lookahead/up-to-here and assertions and so on.
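To make the "it just matches delimiters" point concrete, here is a minimal sketch in Java of what purely lexical handling of a code action amounts to. It is not taken from the ANTLR or CongoCC code; the class and method names are made up for illustration, and it deliberately ignores braces that appear inside string literals or comments, which a real lexer would have to deal with somehow.

```java
// Hypothetical illustration: treat an embedded action as opaque text and
// find where it ends by counting braces, with no knowledge of the target language.
final class ActionScanner {

    // Returns the index of the '}' that matches the '{' at openIndex,
    // or -1 if the braces are unbalanced.
    static int findClosingBrace(String text, int openIndex) {
        if (text.charAt(openIndex) != '{') {
            throw new IllegalArgumentException("no '{' at index " + openIndex);
        }
        int depth = 0;
        for (int i = openIndex; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String rule = "stat : ID '=' expr { if (x) { count++; } } ;";
        int open = rule.indexOf('{');
        int close = findClosingBrace(rule, open);
        // prints: { if (x) { count++; } }
        System.out.println(rule.substring(open, close + 1));
    }
}
```

The scanner recovers the text of the action but nothing about its structure, which is exactly the difference from a tree that contains real sub-trees for the embedded Java.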
I wrote the CongoCC grammar for ANTLR using this one as a starting point. Well, again, the problem of parsing the .g4 files is solved. Soon, I'll get into the problem of outputting the resulting tree in a .ccc format. I don't know how well I can get it to work, due to the different semantics. Also, unlike ANTLR, CongoCC does not support left recursion. Though, that said, it is a well-known problem. I may figure out how to transform the productions that use left recursion and output them in CongoCC format without the left recursion. (No promises though!)
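For anyone unfamiliar with the left recursion problem, the textbook transformation for the simplest case (direct left recursion) rewrites A : A x | A y | b as A : b (x | y)*. Below is a rough sketch of that in Java. It is only an illustration, not converter code: the class name, the representation of a rule as a list of symbol lists, and the EBNF-like output format are all assumptions. It also assumes there are no empty alternatives, and it ignores the precedence and associativity that ANTLR infers from the order of alternatives in a left-recursive rule, which is the genuinely hard part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of removing direct left recursion from one rule.
final class LeftRecursionSketch {

    // Rewrites  A : A x | A y | b | c   as   A : (b | c) (x | y)* ;
    static String removeDirectLeftRecursion(String ruleName, List<List<String>> alternatives) {
        List<String> bases = new ArrayList<>(); // alternatives not starting with the rule itself
        List<String> tails = new ArrayList<>(); // what follows the rule name in recursive alternatives
        for (List<String> alt : alternatives) {
            if (!alt.isEmpty() && alt.get(0).equals(ruleName)) {
                if (alt.size() > 1) { // a bare "A : A" alternative contributes nothing
                    tails.add(String.join(" ", alt.subList(1, alt.size())));
                }
            } else {
                bases.add(String.join(" ", alt));
            }
        }
        if (bases.isEmpty()) {
            throw new IllegalArgumentException("rule has no non-recursive alternative");
        }
        String base = bases.size() == 1 ? bases.get(0) : "(" + String.join(" | ", bases) + ")";
        if (tails.isEmpty()) {
            return ruleName + " : " + String.join(" | ", bases) + " ;";
        }
        return ruleName + " : " + base + " (" + String.join(" | ", tails) + ")* ;";
    }

    public static void main(String[] args) {
        List<List<String>> alts = List.of(
                List.of("expr", "'+'", "term"),
                List.of("expr", "'-'", "term"),
                List.of("term"));
        // prints: expr : term ('+' term | '-' term)* ;
        System.out.println(removeDirectLeftRecursion("expr", alts));
    }
}
```

Indirect left recursion (A starts with B, which starts with A) is not handled here at all, and the shape of the resulting tree differs from the left-recursive original, so any tree-walking code would see a flat repetition rather than nested nodes.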
All that said, even a crude converter tool could probably save a lot of time for people in the position of the aforementioned ngx. And, anyway, I thought to announce this here because maybe somebody is interested. After all, there are a lot of existing ANTLR grammars, so the ability to parse them and build and manipulate the AST using a CongoCC tree traversal sort of API could already be useful.
Of course, generally speaking, nobody will know about something like this unless I tell them about it, so...