Subject: Re: Back to character set implementation thinking
From: Erik Naggum <erik@naggum.net>
Date: Sat, 30 Mar 2002 13:12:52 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3226482787784866@naggum.net>

* Brian Spilsbury
| I think that this approach separates things which do not require it.
|
| If we view a string as a sequence rather than a vector, I believe that
| most of these problems evaporate.

I think we have a terminological problem here. What you call a sequence
is not the Common Lisp concept of "sequence", since lists, strings, and
vectors are all sequences. I think your non-Common Lisp "sequence"
concept is very close to what I mean by stream-string.

| A sequence contains things which have both vector-access-characteristics
| and list-access-characteristics.

This would also be a new invention, because it is currently foreign to
Common Lisp. What I _think_ you mean is very close to what I have tried
to explain in (more) Common Lisp terminology.

| The problem is that sequences in CL have relatively poor iteration
| support.

Well, there is nothing in Common Lisp that has both O(1) and O(n) access
characteristics, and nothing in Common Lisp that supports both random
access and sequential access. I propose that stream-string support
sequential access and that string retain random access.

| One of the more complex things that we might want to do with a string is
| to tokenise it.

Precisely, but this is a problem that has many different kinds of
solutions, not just one.

| (let ((last-point nil))
|   (dosequence (char point string)
|     (when (char= char #\,)
|       (if last-point
|           (collect (subseq string :start-point last-point :end-point
|                            point))
|           (setq last-point point)))))
|
| for a half-baked example, to break up a string into a list of comma
| delimited strings.

I prefer a design that has an opaque mark in a stream-string iterator,
but this should also be in regular streams.
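As a rough illustration of keeping the reading state outside the string, here is a minimal Common Lisp sketch of the comma-splitting example done with opaque marks instead of caller-held indices. Everything in it is hypothetical: STRING-CURSOR, STRING-MARK, CURSOR-PEEK-CHAR, CURSOR-READ-CHAR, CURSOR-MARK, CURSOR-EXTRACT and SPLIT-ON-COMMAS are names invented for this sketch, not a standard facility and not the stream-string design itself. It assumes a plain fixed-width string with no encoding state, so it says nothing about mixed byte and character reading or stateful encodings; it only shows callers holding marks and asking the cursor to copy out what lies between a mark and the current point.

```lisp
;;; Hypothetical sketch: none of these names are standard Common Lisp.
;;; The cursor holds all sequential-reading state, so the string itself
;;; knows nothing about how it is being read.

(defstruct (string-mark (:constructor %make-string-mark (index)))
  ;; Opaque to callers: they obtain marks from a cursor and hand them back.
  (index 0 :type fixnum :read-only t))

(defstruct (string-cursor (:constructor make-string-cursor (string)))
  (string "" :type string :read-only t)
  (point 0 :type fixnum))

(defun cursor-peek-char (cursor)
  "Return the character at the cursor's point, or NIL at the end."
  (let ((string (string-cursor-string cursor))
        (point (string-cursor-point cursor)))
    (when (< point (length string))
      (char string point))))

(defun cursor-read-char (cursor)
  "Return the character at the cursor's point and advance, or NIL at the end."
  (let ((char (cursor-peek-char cursor)))
    (when char
      (incf (string-cursor-point cursor)))
    char))

(defun cursor-mark (cursor)
  "Return an opaque mark recording the cursor's current point."
  (%make-string-mark (string-cursor-point cursor)))

(defun cursor-extract (cursor mark)
  "Copy the characters between MARK and the cursor's point into a fresh string."
  (subseq (string-cursor-string cursor)
          (string-mark-index mark)
          (string-cursor-point cursor)))

(defun split-on-commas (string)
  "Split STRING at every comma, keeping empty fields."
  (let ((cursor (make-string-cursor string))
        (fields '()))
    (loop with start = (cursor-mark cursor)
          for char = (cursor-peek-char cursor)
          do (cond ((null char)                ; end: emit the last field
                    (push (cursor-extract cursor start) fields)
                    (return))
                   ((char= char #\,)           ; delimiter: emit, skip, re-mark
                    (push (cursor-extract cursor start) fields)
                    (cursor-read-char cursor)
                    (setf start (cursor-mark cursor)))
                   (t                          ; ordinary character: advance
                    (cursor-read-char cursor))))
    (nreverse fields)))
```

With this sketch, (split-on-commas "a,bc,,d") returns ("a" "bc" "" "d"). Each field is a fresh string copied out of the original, i.e. the copying alternative rather than a sub-stream-string that shares context with its parent.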
Extracting the string between mark and point (in Emacs terminology) may
re-establish some context in the new string if it is merely a
sub-stream-string, but could also copy characters into a string (vector).

| The key here is the ability to access a sequence from a stored point in
| the sequence, and to use these points to delimit sequence actions.

I think the key is that you do not want the string itself to know
anything about how it is being read sequentially, but a simple pointer
into the string is not enough. (C has certainly shown us the folly of
such a design.) Specifically, I want a stream-string to be processed with
both read-byte and read-char.

| Given this a string can easily have either kind of substrate - a random
| access, or linear access implementation, and this behaviour extends
| naturally to lists.

Well, I have implemented a few processors for weird and stateful
encodings, and I can tell you that it is not easily done.

| This also does not preclude the (expensive) random access of a
| variable-width character string, and would also tie into the lazy
| construction of sequences (whereby you might deal with a file as a
| lazy sequence, something like a lisp version of mmap).

I think random access into a variable-width string is simply wrong, like
using nth to do more than grab exactly one element of a list.

| Anyhow, given that variable-width-character strings would tend to be
| immutable (or perhaps extensible and truncatable) points should have few
| problems there. I don't see any issues with points into lists either.

Except that you generally need quite a lot of state, which a stream
implementation would be fully able to support for you.

///
-- 
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.