Subject: Re: Back to character set implementation thinking
From: Erik Naggum <erik@naggum.net>
Date: Sat, 30 Mar 2002 13:12:52 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3226482787784866@naggum.net>

* Brian Spilsbury
| I think that this approach separates things which do not require it.
| 
| If we view a string as a sequence rather than a vector, I believe that
| most of these problems evaporate.

  I think we have a terminological problem here.  What you call a sequence
  is not the Common Lisp concept of "sequence", since list, string, and
  vector are all sequences.  I think you mean something very close to what I
  mean by stream-string with your non-Common Lisp "sequence" concept.

| A sequence contains things which have both vector-access-characteristics
| and list-access-characteristics.

  This would also be a new invention because this is currently foreign to
  Common Lisp.  What I _think_ you mean is very close to what I have tried
  to explain in (more) Common Lisp terminology.

| The problem is that sequences in CL have relatively poor iteration
| support.

  Well, there is nothing in Common Lisp that has both O(1) and O(n) access
  characteristics, and nothing in Common Lisp that has both support for
  random access and sequential access.  I propose that stream-string
  support sequential access, with string remaining the random-access type.
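
  As a rough illustration of the two access styles, here is a sketch that
  uses only standard operators.  The string input stream made by
  with-input-from-string merely stands in for the proposed stream-string,
  which does not exist in the standard; the function names third-char and
  collect-chars are made up for this example.

```lisp
;; Random access: CHAR indexes directly into the string (a vector).
(defun third-char (string)
  (char string 2))

;; Sequential access: a string input stream hands out characters in
;; order through READ-CHAR, with no index visible to the caller.
(defun collect-chars (string)
  (with-input-from-string (in string)
    (loop for ch = (read-char in nil nil)
          while ch
          collect ch)))
```

  The caller of collect-chars never sees a position, which is the property
  a sequential type would be expected to preserve.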

| One of the more complex things that we might want to do with a string is
| to tokenise it.

  Precisely, but this is a problem that has many different kinds of
  solutions, not just one.

| (let ((last-point nil))
|   (dosequence (char point string)
|     (when (char= char #\,)
|        (if last-point
|            (collect (subseq string :start-point last-point
|                                    :end-point point))
|            (setq last-point point)))))
| 
| for a half-baked example, to break up a string into a list of comma
| delimited strings.

  I prefer a design that has an opaque mark in a stream-string iterator,
  but this should also be in regular streams.  Extracting the string
  between mark and point (in Emacs terminology) may re-establish some
  context in the new string if it is merely a sub-stream-string, but could
  also copy characters into a string (vector).
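
  A minimal sketch of the mark-and-point idea, assuming a plain string as
  the backing store and the copying variant of extraction.  Everything
  named cursor here is hypothetical and not part of Common Lisp, and a
  real stream-string would carry decoder state that this toy omits.

```lisp
;; Hypothetical cursor over a string; none of these names are standard.
(defstruct (cursor (:constructor make-cursor (source)))
  source          ; the underlying string
  (point 0)       ; position of the next character to read
  (mark 0))       ; remembered position; callers treat it as opaque

(defun cursor-read-char (cursor)
  "Return the next character and advance point, or NIL at the end."
  (let ((source (cursor-source cursor))
        (point (cursor-point cursor)))
    (when (< point (length source))
      (setf (cursor-point cursor) (1+ point))
      (char source point))))

(defun cursor-set-mark (cursor)
  "Remember the current point."
  (setf (cursor-mark cursor) (cursor-point cursor)))

(defun cursor-extract (cursor &optional (trim 0))
  "Copy the characters between mark and point, less TRIM trailing ones."
  (subseq (cursor-source cursor)
          (cursor-mark cursor)
          (- (cursor-point cursor) trim)))

(defun split-on-commas (string)
  "Return the comma-delimited fields of STRING as a list of fresh strings."
  (let ((cursor (make-cursor string))
        (fields '()))
    (loop for ch = (cursor-read-char cursor)
          while ch
          when (char= ch #\,)
            do (push (cursor-extract cursor 1) fields)
               (cursor-set-mark cursor))
    ;; The final field runs from the last mark to the end of input.
    (push (cursor-extract cursor) fields)
    (nreverse fields)))
```

  The tokenizer only reads, sets the mark, and extracts; it never does
  arithmetic on positions itself, so the representation of the mark could
  change without touching split-on-commas.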

| The key here is the ability to access a sequence from a stored point in
| the sequence, and to use these points to delimit sequence actions.

  I think the key is that you do not want the string itself to know
  anything about how it is being read sequentially, but a simple pointer
  into the string is not enough.  (C has certainly shown us the folly of
  such a design.)  Specifically, I want a stream-string to be processed
  both with read-byte and read-char.

| Given this a string can easily have either kind of substrate - a random
| access, or linear access implementation, and this behaviour extends
| naturally to lists.

  Well, I have implemented a few processors for weird and stateful
  encodings, and I can tell you that it is not easily done.

| This also does not preclude the (expensive) random access of a
| variable-width character string, and would also tie into the lazy
| construction of sequences (whereby you might deal with a file as a
| lazy sequence, something like a lisp version of mmap).

  I think random access into a variable-width string is simply wrong, like
  using nth to do more than grab exactly one element of a list.
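
  To make the cost concrete, here is a sketch that assumes UTF-8 as the
  variable-width encoding (the discussion above does not name one) and
  assumes well-formed input.  Both function names are invented for the
  illustration.  Finding where character N starts means walking the octets
  from the beginning, just as nth walks a list.

```lisp
;; OCTETS is assumed to be a vector of well-formed UTF-8 octets.
(defun utf8-lead-octet-p (octet)
  "True unless OCTET is a continuation octet of the form #b10xxxxxx."
  (/= (logand octet #b11000000) #b10000000))

(defun utf8-char-offset (octets n)
  "Return the octet offset where character N (zero-based) starts, or NIL."
  (loop with seen = 0
        for offset from 0 below (length octets)
        when (utf8-lead-octet-p (aref octets offset))
          do (when (= seen n)
               (return offset))
             (incf seen)
        finally (return nil)))
```

  Calling this once per index inside a loop over the whole string turns a
  linear pass into a quadratic one, which is the same trap as calling nth
  for every element of a list.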

| Anyhow, given that variable-width-character strings would tend to be
| immutable (or perhaps extensible and truncatable) points should have few
| problems there.  I don't see any issues with points into lists either.

  Except that you generally need quite a lot of state, which a stream
  implementation would be fully able to support for you.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.