Hello, everyone.
I just created a new format for Python code that I anticipate we will generate from templates. I dub this the "pywim" format, where "wim" stands for With Indentation Markers. The "wim" part also sounds like the first syllable of "whimsical", but really, this is not whimsical at all.
The basic problem is that the very thing that distinguishes Python's syntax, namely that indentation (and dedentation) is meaningful, makes the templating problem, i.e. generating valid Python code from templates, very hard. In a "normal" language like Java, the solution is simple: your template can generate code that is indented any which way, and then, in a separate pass, you parse that code (it is valid and thus parseable, after all), walk the AST, and generate properly formatted code. That does not work for Python, because our Python parser (like any Python parser) will refuse to parse code that is not properly indented!
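To make that last point concrete, here is a quick check using the ast module from CPython's standard library. It is only a stand-in for illustration, not the parser we generate, and the helper name parses is made up for this example.

```python
import ast


def parses(source: str) -> bool:
    """Return True if CPython's own parser accepts the source (hypothetical helper)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        # IndentationError is a subclass of SyntaxError, so it lands here too.
        return False


well_indented = "if x:\n    print(x)\n"
body_not_indented = "if x:\nprint(x)\n"      # what a sloppy template might emit
stray_indent = "x = 1\n  y = 2\n"            # one extraneous leading space run

print(parses(well_indented))
print(parses(body_not_indented))
print(parses(stray_indent))
```

Only the first snippet is accepted; the other two are rejected before any AST exists, so there is nothing to walk and reformat.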
So, you see, when working on a template, one has a natural tendency to indent it so that it is readable as a template, but that is frequently incompatible with the template generating properly indented Python source code. The solution I have in mind is pywim, an alternative representation of Python that works pretty much like any "normal" (LOL) language, "normal" in the sense that indentation is not meaningful. Actually, in most programming languages, neither horizontal whitespace nor newlines are meaningful. In pywim, horizontal whitespace is ignored, but newlines work just as they do in regular Python: a newline that ends a code line is meaningful, while newlines that merely create superfluous vertical space are not.
But, anyway, the important point is that in pywim, you have explicit indent/dedent tokens, >>> and <<<, which need to be there to indicate the indentation.
So here is an example to give you a sense of what I'm talking about:
# Some pywim
def check_intervals(ranges, ch):
>>> index = bisect.bisect_left(ranges, ch)
# The following are not indented properly but
# it doesn't matter! The parser takes its cue from
# the indentation _markers_, not the actual indentation
n = len(ranges)
if index < n:
>>> if index % 2 == 0:
>>> if index < (n - 1):
>>> return ranges[index] <= ch <= ranges[index + 1]
<<< <<<
elif index > 0:
>>> return ranges[index - 1] <= ch <= ranges[index]
<<< return False
<<<<<<
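For anyone who wants to see what the markers resolve to, here is a minimal sketch in plain Python of how a snippet like the one above could be mechanically re-indented. To be clear, this is not how the actual parser works: it is a naive line-by-line pass that only recognizes markers at the start of a line, and the function name pywim_to_python is invented for the illustration.

```python
def pywim_to_python(source: str, indent: str = "    ") -> str:
    """Naively re-indent pywim text (hypothetical helper, not the real parser).

    Assumption: >>> and <<< markers appear only at the start of a line,
    possibly several in a row, with or without spaces between them.
    """
    level = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()  # the actual indentation is ignored
        # Consume leading markers, adjusting the nesting depth as we go.
        while line.startswith((">>>", "<<<")):
            level += 1 if line.startswith(">>>") else -1
            if level < 0:
                raise ValueError("more <<< than >>> markers")
            line = line[3:].lstrip()
        if line:  # marker-only and blank lines produce no output
            out.append(indent * level + line)
    if level != 0:
        raise ValueError("unbalanced indentation markers")
    return "\n".join(out) + "\n"


sample = """\
def f(x):
>>> if x > 0:
        >>> return 1
<<< return 0
<<<
"""
print(pywim_to_python(sample))
```

Following the markers literally in the check_intervals example, the elif lands at the same depth as "if index % 2 == 0:", and "return False" ends up inside the "if index < n:" block, since only one <<< precedes it.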
What I anticipate in short order is that a template such as lexer.py.ftl (which is now lexer.py.ctl, actually) will become lexer.pywim.ctl and will generate pywim code. Then we can parse that into an AST and do the stuff we want to do, like reap unused variables and things like that. The final step is to spit out the actual Python: we just walk the AST and generate standard Python code.
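The last step, going from an AST back to conventionally formatted source, can be sketched with CPython's standard ast module (ast.unparse needs Python 3.9 or later). This is only an analogy: the real pipeline would use our own generated parser and tree walker, and the stdlib parser of course only accepts regular Python, not pywim. The helper name normalize is hypothetical.

```python
import ast


def normalize(source: str) -> str:
    """Parse regular Python and re-emit it from the AST (hypothetical helper)."""
    tree = ast.parse(source)
    # A real pass could transform the tree here (e.g. drop unused variables)
    # before regenerating the source text.
    return ast.unparse(tree)


# Valid Python, but with odd spacing and a non-standard indent width.
messy = "if  x :\n        y=1\n"
print(normalize(messy))
```

The point is just that once you have a tree, the layout of the input no longer matters; the output formatting is decided entirely by the code that walks the tree.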
As things stand now, the Java test harness for the Python parser parses an input file as pywim if the extension is .pywim. (For some reason, the non-Java tests are broken now, and I honestly don't know why. None of the aforementioned changes should break anything, but maybe somebody will look into this. (Maybe somebody whose initials are VS.))
And the result of all this will be that one can work on the Python templates without the constant fear that adding or removing a (seemingly) extraneous space (or tab) will break the template!