Subject: Re: READ-DELIMITED-FORM
From: Erik Naggum <erik@naggum.no>
Date: 05 Sep 2002 12:43:22 +0000
Newsgroups: comp.lang.lisp
Message-ID: <3240218602163684@naggum.no>

* Tim Bradshaw
| Can you explain why?

  Because the reader algorithm is defined in terms of tokens that are
  examined before they are turned into integers, floating-point numbers, or
  symbols.  The tokens ., .., and ... must all be interpreted (or cause
  errors) prior to being turned into symbols, and if you expect to be able
  to look at them after `read´ has already returned, the original
  information is lost and you will have insurmountable problems
  reconstructing the original characters that made up the token, just as
  you cannot recover the case information from a token that turned into an
  integer or symbol.  The hard-wired nature of ) likewise has to be
  determined prior to processing it as a terminating macro character.

  The usual way to implement the tokenization phase of the reader is to
  work with a special buffer-related substring or mirrored buffer that
  characters are copied into, and then to use special knowledge of this
  buffer in the token-interpretation phase.

  The way I implement tokenizers and scanners is with an offset from the
  current stream head, which lets me peek multiple characters into the
  stream.  When the terminating condition has been found, I know how many
  characters to copy, if needed, and I am relatively well informed about
  what I have just scanned.  When the token has been completed, I let the
  stream head jump forward to the point where I want the next call to
  start.  This may be several characters short of how far I scanned ahead,
  naturally.  I invented this technique to parse SGML, which would
  otherwise have required multiple-character read-ahead or some buffer on
  the side, and much overhead.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
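
A minimal sketch, in Common Lisp, of the scan-ahead-then-advance technique
described in the last paragraph of the post.  The struct and function names
are illustrative assumptions, and a plain string stands in for the stream
buffer; the point is only the offset-from-head discipline, not Naggum's
actual reader or SGML parser.

  ;; A scanner over a character source, with a head index that only moves
  ;; once a token has been completed.  (Names and layout are assumptions.)
  (defstruct (scanner (:constructor make-scanner (source)))
    (source "" :type string)   ; character source; a real one is a stream buffer
    (head 0 :type fixnum))     ; index of the current stream head

  (defun peek-at (scanner offset)
    "Peek OFFSET characters past the stream head without consuming anything."
    (let ((i (+ (scanner-head scanner) offset)))
      (when (< i (length (scanner-source scanner)))
        (char (scanner-source scanner) i))))

  (defun skip-whitespace (scanner)
    "Advance the head itself past leading whitespace."
    (loop while (member (peek-at scanner 0) '(#\Space #\Tab #\Newline))
          do (incf (scanner-head scanner))))

  (defun scan-token (scanner)
    "Scan ahead of the head by increasing an offset until a terminating
  character is seen, copy exactly that many characters, then let the head
  jump forward to where the next call should start."
    (skip-whitespace scanner)
    (let ((offset 0))
      ;; Look ahead character by character; the head does not move yet.
      (loop for ch = (peek-at scanner offset)
            while (and ch (not (member ch '(#\Space #\Tab #\Newline #\)))))
            do (incf offset))
      ;; The token's extent is now known, so copy it in one go.
      (let* ((start (scanner-head scanner))
             (token (subseq (scanner-source scanner) start (+ start offset))))
        ;; Jump the head past the token, but not past the terminator,
        ;; which the caller may still need to see.
        (incf (scanner-head scanner) offset)
        token)))

  ;; (scan-token (make-scanner "foo) bar"))  ; => "foo", head left at the #\)

When scan-token returns, the head has jumped past the token but not past the
terminating character, so the caller can still dispatch on it, for instance
on a closing parenthesis.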