Ray Blaak <blaak@telus.net> wrote:
+---------------
| Marco Antoniotti <marcoxa@cs.nyu.edu> writes:
| > We miss a portable and Lispy Lex/Yacc pair.
| > Please do not refer me to Baker's paper. It does not cut it...
|
| Just roll your own recursive descent parsers. When written right, the code
| tends to be no more complicated than grammar files *and* more expressive,
| since you can easily code up special cases, error corrections, etc.
+---------------
I completely agree with this, with one small extension borrowed from
the front-ends of the BLISS compiler family (~30 yrs ago), useful in
those cases where it makes sense:
Yes, use recursive descent for the "major" or "outer" syntax -- say, in
the case of an Algol-like programming language, the control structures
and declarations -- but consider using simple operator precedence for
infix arithmetic expressions. Simple operator precedence can easily be
nested inside a recursive descent parser by assigning very high precedence
to "keywords" in the outer syntax and left-brackets/parens/etc., and by
assigning very low precedence to "stoppers" (right-brackets/parens/commas/
semicolons/EOF).
Compared to a "pure" recursive descent parser, when parsing the most
common infix arithmetic expressions (and especially with syntax like
that of C which has so many levels of precedence!) using simple operator
precedence eliminates the hordes of dummy interior parse nodes you would
otherwise end up with that do nothing but maintain the default operator
grouping priority.
[Though you don't want to use operator precedence for the *whole* parser,
either, since that has other (sometimes more serious) problems. For most
languages, the combination-style parser is easier/better than either alone.]
If this isn't clear I can dig up some sample code (though I'm afraid
it will be in Scheme, that being the most recent time I've used it)...
-Rob
p.s. One more trick from the BLISS compilers: Instead of a one-token
lookahead, they used a *four*-token "lexical window" (the main interface
between the lexer & the parser), with the lexeme (token) slots labelled
SYM, DEL, FUTSYM, & FUTDEL (where FUT == "future", that is, "next").]
Only "operators" (including keywords and delimiters) went into the *DEL
slots, and only "values" (source constants, variables, or reduced
parse-tree nodes representing values) went into the SYM slot(s). Calling
the lexer (named "RUND" -- Read Until Next Delimiter) shifted the window,
filling the FUTSYM (if it could) and FUTDEL slots from the program text
stream. (Given the BLISS syntax [very close to LL(1)], it was possible
for a SYM slot to sometimes be null [e.g., adjacent operators], but never
a DEL slot.)
Sounds complicated, but actually it's easier to *do* than describe! ;-} ;-}
-----
Rob Warnock, PP-ASEL-IA <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607