Subject: Re: READ-DELIMITED-FORM
From: Erik Naggum <erik@naggum.no>
Date: 05 Sep 2002 12:43:22 +0000
Newsgroups: comp.lang.lisp
Message-ID: <3240218602163684@naggum.no>

* Tim Bradshaw
| Can you explain why?  

  Because the reader algorithm is defined in terms of tokens that are examined
  before they are turned into integers, floating-point numbers, or symbols.
  The tokens ., .., and ... must all be interpreted (or cause errors) prior to
  being turned into symbols, and if you expect to be able to look at them
  after `read´ has already returned, the original information is lost and you
  will have insurmountable problems reconstructing the original characters
  that made up the token, just like you cannot recover the case information
  from a token that turned into an integer or symbol.  The hard-wired nature
  of ) likewise has to be determined prior to processing it as a terminating
  macro character.
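
  For instance, with the default readtable and reader settings, something like
  this shows the loss (only the primary value of `read-from-string´ is shown):

    (read-from-string "foo")    => FOO    ; the case of the input is gone
    (read-from-string "123.")   => 123    ; the trailing decimal point is gone
    (read-from-string "1.0e0")  => 1.0    ; the exponent marker is gone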

  The usual way to implement the tokenization phase of the reader is to work
  with a special buffer-related substring or mirrored buffer that characters
  are copied into and then to use special knowledge of this buffer in the token
  interpretation phase.  The way I implement tokenizers and scanners is with
  an offset from the current stream head to peek multiple characters into the
  stream.  When the terminating condition has been found, I know how many
  characters to copy, if needed, and I am relatively well informed about what I
  have just scanned.  When the token has been completed, I let the stream head
  jump forward to the point where I want the next call to start.  This may be
  several characters shorter than I scanned ahead, naturally.  I invented this
  technique to parse SGML, which would otherwise have required multiple-
  character read-ahead or some buffer on the side and much overhead.
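
  In outline, the technique looks something like this minimal sketch, which uses
  a string as a stand-in for the stream; the struct and function names are
  invented for illustration, not taken from any actual reader:

    ;; HEAD is the committed position; the scan peeks forward from it
    ;; without consuming anything until the terminating condition is found.
    (defstruct (scan-state (:conc-name ss-))
      (buffer "" :type string)
      (head   0  :type fixnum))

    (defun scan-token (state &key (terminators " ()"))
      "Peek forward from the head to a terminating character, return the
    token as a fresh string, and only then jump the head forward."
      (let* ((buffer (ss-buffer state))
             (start  (ss-head state))
             (end    (or (position-if (lambda (c) (find c terminators))
                                      buffer :start start)
                         (length buffer))))
        (prog1 (subseq buffer start end)   ; copy only once the length is known
          (setf (ss-head state) end))))    ; commit; the next call starts here

    (let ((s (make-scan-state :buffer "foo bar")))
      (scan-token s))   => "foo"

  A real implementation would peek through the stream's own buffer rather than a
  string, and would often leave the head short of the farthest character it
  peeked at, e.g. just before a terminating macro character.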

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.