Subject: Re: parsing lisp expression
From: Erik Naggum <clerik@naggum.no>
Date: 1997/12/21
Newsgroups: comp.lang.lisp
Message-ID: <3091670152628940@naggum.no>

* Tim Bradshaw
| Doing the reverse shouldn't be much harder I think (you might have to
| hack readtables), so I see no need for perl.

  this is not quite as easy as it seems.  first, whether a read object is
  the first element of a list or is itself, is only defined by the
  _following_ character, which is at odds with how the Lisp reader works.
  second, the end of the input must be defined externally to the syntax for
  lists: once we have read a token, we cannot return until the final token
  is either a non-list (maybe a semicolon or a period?) or the end of the
  input (maybe end of line?).  third, while the singleton list is x[], there
  is no concept of an empty list in that syntax, unless such is served by
  some other special value (such as `nil').

  all this means that reading that kind of list is a quite different process
  from the normal `read-delimited-list'.  so I threw together this to
  demonstrate that it is possible to write a moderately compact parser for a
  "botched syntax", using the Lisp reader for the real work:

(defun read-botched-syntax (&optional stream (eof-error-p t) (eof-value nil))
  (labels ((read-botched-list ()
             (loop initially (read-char stream)        ;discard #\[
                   until (eql (peek-char t stream) #\])
                   collect (read-botched-internal)
                   finally (read-char stream)))        ;discard #\]
           (read-botched-internal ()
             (loop with first = (read stream)
                   for look-ahead = (peek-char t stream nil nil)
                   while look-ahead
                   while (eql look-ahead #\[)
                   do (setq first (cons first (read-botched-list)))
                   finally (return first))))
    (let ((*readtable* (copy-readtable)))
      ;; make #\[ and #\] terminate tokens.  #'identity is never called.
      (set-macro-character #\[ #'identity nil)
      (set-macro-character #\] #'identity nil)
      (if (null (peek-char t stream eof-error-p nil))
          eof-value
          (read-botched-internal)))))

  this will do lots of uninspiring things if much more than simple tokens
  are present in the input, so a possible replacement that tries to limit
  itself to tokens would go like this:

           (read-botched-internal ()
             (let* ((char (peek-char t stream))
                    (macro-function (get-macro-character char)))
               ;; heuristically determine whether this will be read as a token
               ;; this works in CMUCL and Allegro CL for Unix, not in CLISP
               (if (or (null macro-function)           ;this handles #\\
                       (eq macro-function (get-macro-character #\A)))
                   (loop with first = (read stream)
                         for look-ahead = (peek-char t stream nil nil)
                         while look-ahead
                         while (eql look-ahead #\[)
                         do (setq first (cons first (read-botched-list)))
                         finally (return first))
                   (error 'reader-error
                          :stream stream
                          :format-control "~@<Syntax error in ~S (character ~@C).~:@>"
                          :format-arguments (list stream char))))))

  a more complete approach would be replacing the reader macro function for
  all (relevant) characters with one's own token-reader, but that's just too
  much work for now.

#\Erik
-- 
If you think this year is number 97,   |  Help fight MULE in GNU Emacs 20!
_you_ are not "Year 2000 Compliant".   |  http://sourcery.naggum.no/emacs/