Subject: Re: Back to character set implementation thinking
From: Erik Naggum <erik@naggum.net>
Date: Sun, 31 Mar 2002 02:59:34 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3226532389569746@naggum.net>

* Brian Spilsbury
| A string cannot use non-vector substrate in CL; if it were
| fundamentally a sequence, then it could, as long as that substrate
| satisfied sequence.

  As I said, we have a terminological problem here.  vector and list are
  disjoint subclasses of sequence.  string is a subclass of vector.

| from memory vectors are not necessarily O(1) random access in CL,

  This might be at the core of your confusion.

| When I say sequence, I mean the type-definition, rather than a particular
| data-type.

  I know Common Lisp too well to understand what you mean.

| Lists have support for random access implemented via sequential
| accessors.  Vectors have support for linear access implemented via random
| accessors.

  No, this is really fundamentally confused.  Random access _means_ O(1).
  Linear access means that you have a first-class pointer to each element,
  required to access the next.  Both the cons cell and the stream satisfy
  the latter.
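
  A minimal sketch of the distinction, in Python rather than Lisp and with
  a hypothetical `Cons` class standing in for the cons cell: reaching
  element n through a chain of cells costs n pointer steps, while indexing
  a vector costs one step regardless of n.  The step counters exist only
  for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Cons:
    """Hypothetical cons cell: the cell itself is the pointer to the rest."""
    car: Any
    cdr: Optional["Cons"] = None


def list_to_conses(items):
    """Build a chain of cons cells from a Python list."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def nth_linear(cell, n):
    """Linear access: reaching element n means following n cdr pointers."""
    steps = 0
    for _ in range(n):
        cell = cell.cdr
        steps += 1
    return cell.car, steps


def nth_random(vector, n):
    """Random access: a single indexing step, whatever n is."""
    return vector[n], 1


letters = list("sequence")
conses = list_to_conses(letters)

linear_result = nth_linear(conses, 5)    # walks five cells to get there
random_result = nth_random(letters, 5)   # goes straight to the element
```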

| The real problem is that sequence doesn't define any iterative operators,
| only cons [as list] does via cdr/rest and dolist, and the ad-hoc support
| via loop.

  What is "ad-hoc" about it?  This is very puzzling.

| I do not think that limiting yourself to a single mark/point pair, nor
| keeping a mark/point in the container, where any modification propagates
| side-effects, is a particularly good strategy for lisp.

  I think you should read what I write a little better.  It is vital that
  mark and point are _not_ part of the string, but of the iterator.  I have
  said as much.  Please do not rudely ask me to waste my time to refute
  conclusions based on things I have not said.
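
  As a sketch of what keeping mark and point in the iterator might look
  like (Python, with a hypothetical `StringCursor` class; none of these
  names come from the proposal itself): two cursors walk the same string,
  and neither can disturb the other because the string holds no position
  at all.

```python
class StringCursor:
    """Hypothetical iterator that owns its own mark and point.

    The string is only referenced; nothing is ever written into it.
    """

    def __init__(self, text):
        self.text = text
        self.point = 0   # index of the next character to read
        self.mark = 0    # a remembered position, private to this cursor

    def next_char(self):
        """Return the character at point and advance, or None at the end."""
        if self.point >= len(self.text):
            return None
        ch = self.text[self.point]
        self.point += 1
        return ch

    def set_mark(self):
        """Remember the current point."""
        self.mark = self.point

    def region(self):
        """Text between mark and point, in either order."""
        lo, hi = sorted((self.mark, self.point))
        return self.text[lo:hi]


shared = "mark and point"
a = StringCursor(shared)
b = StringCursor(shared)

for _ in range(5):
    a.next_char()        # a skips over "mark "
a.set_mark()
for _ in range(3):
    a.next_char()        # a reads "and"

b.next_char()            # b reads "m", unaffected by anything a did
```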

| I think it is relatively straightforward, in some encodings the amount
| of state might be annoyingly large, though.

  Well, we just appear to have different tolerance of necessities, or you
  know some encodings I do not, which I kind of doubt.  An example of a
  stateful encoding with an annoyingly large amount of state would be
  useful so I know where the amount becomes annoyingly large.

| In the standard compression scheme for unicode you need to save
| Single-Byte-Mode-P, Current-Window, and the 8 Dynamic-Window-Offsets, and
| Locking-Shift-P, I've only glanced over the spec, so please excuse
| omission or error.

  Seems pretty accurate.

| The unicode SCS is pretty heavy on state, I'll agree, that's 11 words
| in the most conservative form, although there are various
| optimisations you could apply, I might expect to represent that in 5
| 32-bit words with packing.

  This is so heavy on state you want to optimize the storage?  My good man,
  this is nothing and not worth optimizing.
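
  To put the amount in perspective, the state listed above fits in one
  small record.  A sketch in Python, assuming a hypothetical `ScsuState`
  class whose field names follow the list in the quoted text; the initial
  window offsets are the defaults from the SCSU specification as best
  recalled and should be checked against it.

```python
from dataclasses import dataclass, replace

# Initial dynamic window offsets as recalled from the SCSU specification
# (UTS #6).  Treat them as an assumption and verify against the spec.
DEFAULT_DYNAMIC_OFFSETS = (
    0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00,
)


@dataclass(frozen=True)
class ScsuState:
    """Hypothetical record of the decoder state listed in the quoted text."""
    single_byte_mode: bool = True
    locking_shift: bool = False
    current_window: int = 0
    dynamic_window_offsets: tuple = DEFAULT_DYNAMIC_OFFSETS

    def word_count(self):
        """One word per flag or index, plus one per window offset."""
        return 3 + len(self.dynamic_window_offsets)


start = ScsuState()

# Each iterator can carry its own copy; changing one leaves the other alone.
after_switch = replace(start, current_window=2)
```

  Copying eleven words per iterator costs next to nothing, so the packing
  step buys very little.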

| The other advantage is that we don't need to store the state in the
| string at all, the transitory state is kept in the iterator (ie,
| dosequence, map, subseq, etc), and this means that we can share the
| string freely between readers, as we currently expect to be able to.

  I am really curious now.  You _always_ store the state in the object that
  modifies it, _never_ in the object it refers to.  Doing otherwise is a
  peculiar C++ disease, which I had the good fortune of discussing with a
  project leader who just had to vent his frustration with some of his
  programmers and their sheer
  inability to write threadsafe code precisely because they were hell-bent
  on "optimizing" data storage and stored the state of an iterator in the
  object iterated over.  I wondered how anyone could even think of such an
  obviously boneheaded thing, but these people, he told me, were so deeply
  concerned with not using dynamic memory and conserving memory in general
  that they made this idiotic coding practice a matter of _pride_ and would
  therefore not consider changing it, even when ordered to fix the problem.
  Thread safety or, more generally, the ability to have multiple references
  to the same object, is the Lisp way, and being anal about memory usage is
  not the Lisp way.
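
  The failure is easy to reproduce even without threads.  In this Python
  sketch (the class and function names are hypothetical), two readers
  alternate on a string object that stores its own cursor and each sees
  only half the text; with one position per reader, both see all of it.

```python
class SelfIteratingString:
    """Anti-pattern: the container keeps the cursor of whoever reads it."""

    def __init__(self, text):
        self.text = text
        self.pos = 0          # iteration state stored in the object itself

    def next_char(self):
        """Return the next character and advance the shared cursor."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch


def interleave_shared_cursor(text):
    """Two readers alternate on one object and steal each other's input."""
    s = SelfIteratingString(text)
    first, second = [], []
    while True:
        a = s.next_char()     # reader one takes a character
        b = s.next_char()     # reader two takes the one reader one needed
        if a is None:
            break
        first.append(a)
        if b is not None:
            second.append(b)
    return "".join(first), "".join(second)


def interleave_own_cursors(text):
    """Each reader holds its own position; the string is shared untouched."""
    reader_one, reader_two = iter(text), iter(text)
    first, second = [], []
    for a, b in zip(reader_one, reader_two):
        first.append(a)
        second.append(b)
    return "".join(first), "".join(second)


broken = interleave_shared_cursor("lisp")
correct = interleave_own_cursors("lisp")
```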

| I think that a lot of state is the exception rather than the rule.

  You are actually wrong about this.  The ideal of statelessness is
  generally a very bad idea, as it tries to hide state under the rug.
  Generally, state can be layered, and this is good, but it is therefore
  extremely important to layer it correctly.  I mean, I thought this would
  be exceptionally obvious when we have a string-stream concept that can
  iterate over a string with stream operators, but you have to be explicit
  about setting up these iterators.  (It should have been more general,
  so one could iterate over the elements of a vector with read-byte.)
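
  For comparison, Python's standard `io` module has the same explicit
  setup step: `io.StringIO` gives a stream over a string's contents and
  `io.BytesIO` one over a vector of octets.  This is an analogy, not the
  Common Lisp string-stream interface, and `read_byte` is a hypothetical
  helper named after the Lisp function.

```python
import io

text = "state"

# Setting up a stream is explicit, and each stream keeps its own position.
reader_one = io.StringIO(text)
reader_two = io.StringIO(text)

first_two = reader_one.read(2)   # reader_one advances by two characters
first_one = reader_two.read(1)   # reader_two is still at the start


def read_byte(stream):
    """Return the next octet as an integer, or None at end of stream."""
    chunk = stream.read(1)
    return chunk[0] if chunk else None


# The same stream operators applied to a vector of (arbitrary) octets.
octets = io.BytesIO(bytes([0x1B, 0x24, 0x42]))
seen = []
byte = read_byte(octets)
while byte is not None:
    seen.append(byte)
    byte = read_byte(octets)
```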

| I also think that as shown above, we can externalise that state into
| points, at an acceptable cost for reasonable encodings.

  I truly wonder how you could have thought that anyone would want to store
  the iteration state in the object iterated over.  That is such a classic
  mistake that I am annoyed that I have to argue against it.

| It may be that I am unaware of some more complex common encodings, if
| there are any that you are thinking of in specific, please let me know.

  Try implementing a full ISO 2022 processor, try representing the device
  that ISO 6429 (informally known as "ANSI escape sequences") writes to, or
  consider the amount of state in a fully fledged MIME processor.  Side-
  effects and modifying state is a good thing, but it must, of course, be
  localized with the functions that maintain the state, not with the
  object that is being referenced incidentally.  Or maybe this is just that
  annoyingly stupid Object Oriented Programming thing, again, where the
  object itself is supposed to know something about how it is used.  This
  is just plain bad design.  Stuffing "next" pointers into a structure to
  build a linked list is equally nuts, but many believe this is good and
  cannot fathom the point of using a vector or a linked list that points to
  the objects in question.  Such people should be kept away from computers.
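
  A sketch of the difference in Python, with hypothetical `IntrusiveJob`
  and `Job` classes: objects that carry their own "next" pointer can sit on
  only one list at a time, while objects referenced from outside can appear
  in any number of collections, in any order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntrusiveJob:
    """Anti-pattern sketch: the object carries the list's 'next' pointer."""
    name: str
    next: Optional["IntrusiveJob"] = None


@dataclass
class Job:
    """The object knows nothing about the collections that refer to it."""
    name: str


parse, render, ship = Job("parse"), Job("render"), Job("ship")

# Any number of containers can point at the same objects, in any order.
by_priority = [ship, parse, render]
by_name = sorted(by_priority, key=lambda job: job.name)

# The intrusive version has one 'next' slot per object, so one list only.
a, b, c = IntrusiveJob("a"), IntrusiveJob("b"), IntrusiveJob("c")
a.next = b
b.next = c          # list one: a, b, c

c.next = a          # list two wants c followed by a, which silently
                    # turns list one into the cycle a, b, c, a, ...
```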

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.