Subject: Re: why we have cons?
From: Erik Naggum <clerik@naggum.no>
Date: 1998/01/03
Newsgroups: comp.lang.scheme,comp.lang.lisp
Message-ID: <3092837184154309@naggum.no>


* xah@best.com (Xah)
| From a language design point of view, why do we have the pair notion in
| Scheme?

  well, this list and cons business is an instance of what is commonly
  known as an abstract data type (ADT) and one of the more successful ones
  at that.  you will learn more about ADTs in the basic CS curricula if you
  bother to study.  successful ADTs have simple interfaces and provably
  correct implementations that nobody ever need to mess with again.  in the
  case of lists, most languages simply do not have them, and those that do,
  apart from the Lisp family, get them wrong.  there are two reasons for
  this: (1) short-sighted users who think that "list" is a primitive and
  can't be bothered to understand why a simple interface doesn't have
  simple implementations, (2) stupid language designers who believe too
  strongly in strong typing and thus shoot everybody else in the foot when
  they want lists of objects and certainly can't be bothered to invent a
  useful concept for their short-sighted programmers to use.

  imagine that you want a _list_ ADT that (1) can be read and written as
  lists would ordinarily be written, (2) can hold any kind of object
  without modifying the objects in the list, (3) can be taken apart into a
  head and a tail in constant time, and (4) could be extended with a new
  object in constant time.  as a general rule it takes a real genius to
  implement something so that it looks simple, in contrast to the dumb
  people who implement something the simple way, so it becomes complex and
  generally useless to everybody else.  in our case, John McCarthy is
  probably to credit for the historic invention of the only slightly
  counter-intuitive implementation of the simple "list" concept (ADT).

  instead of parsing and printing lists in a special syntax that did not
  itself produce anything useful, such as C's {a, b, ..., z} syntax, useful
  only in initializer lists to the compiler, the decision was that lists
  should be _something_ in their own right, and that programs should be
  able to deal with them internally as well as read and write them.  this
  is brilliant.

  instead of having only one particular type in a list of objects that knew
  they were lists (like linked lists with a pointer in the object itself),
  it would be real cool to have a list "wrapper" that had at least one
  pointer to another object and it would be even cooler if it could point
  to any kind of object so the list could be heterogenous.  let's call this
  the "payload pointer".  of course, the payload pointer and the resulting
  heterogeneity introduces complexity to the reader and writer functions
  for lists, but fortunately, this complexity is fully contained in the
  implementation of the abstract data type.  constructing a list would be
  equally painless without the programmer having to know the internals.
  this is brilliant.

  instead of the obvious "hardware" solution using a fixed-size vector of
  pointers and the ability to have a pointer to any element of that vector
  count as the whole vector, a linked list approach was made in the list
  "wrapper".  this makes it possible to add elements at the head of the
  list in constant time, instead of having to copy the entire vector into a
  bigger vector when it outgrew its limits (which you'd have to keep track
  of somewhere, since you already use the hardware ability to point to any
  slot in the fixed-sized vector).  let's use a "link pointer" and put it
  in the list "wrapper", too.  now, not only do we have a pointer to the
  object we want lists of, we have a pointer to another list "wrapper" in a
  linked list, but we said that the payload pointer would be heterogenous
  and could point to anything, so let's let the payload pointer and the
  link pointer be the same type.  the payload pointer can now point to a
  list "wrapper" and the link pointer can point to data.  this way, we have
  a general object that has two genral pointers and can be used for more
  than just lists, although lists have especially nice read and write
  functions and other accessors.  this is brilliant design!

  finally, let's recognize the fact that there is only one empty list, but
  instead of having an anchor at the end of every chain, let's let the
  empty list be represented by a special list "wrapper" that pointed to
  itself with both the payload and link pointer.  again, this adds some
  complexity to the internals, but we find that this complexity actually
  reduces overall system complexity a _lot_.  (of course, Scheme blew this
  design issue big time and forces you to treat the empty list as an anchor
  in _all_ cases, not just where it acts as an anchor.)

  you see, this list "wrapper" is a profoundly general mechanism.

  now, for historical reasons, the list "wrapper" object is called a "cons
  cell", and the payload pointer is known as the car of the cons cell, the
  link pointer the cdr of the cons cell.  this comes from the historic fact
  that lists were implemented early on in machine registers that were twice
  as large as pointers, quite contrary to modern machines.  the left half
  of this register was the payload pointer, and the right half was the link
  pointer.  now, this kind of pointer magic had a special meaning to the
  hardware on which it was implemented.  unlike today's braindamaged CPU's,
  pointers into hardware pointer vectors had a _size_ attached to them, so
  you could tell when you had reached the end of the list.  this was used
  to implement the push-down list (PDL), better known as a "stack".  when
  you pushed an object on a PDL, the PDL pointer itself would be one closer
  to the end of the list, so the size would be decremented, and its payload
  pointer would be incremented.  (stacks that grown downward are a newer
  invention.)  in effect, the kind of register that was later reused for
  the list "wrapper" had an _address_ part and a _decrement_ part.

  on this ancient hardware, it was useful to extract the left half and the
  right half into other machine registers.  the Contents of the Address
  Register was obtained with the "car" instruction.  likewise, the Contents
  of the Decrement Register was obtained with the "cdr" instruction.  since
  we were implementing an abstract data type that users would see as lists,
  it didn't much matter what they were named, but we can probably be glad
  that it was something that simple (they can be combined like "cddadr",
  which can be appreciated only when dealing with a complexity that most
  newbies take years to discover), not something like HLRZ and HRRZ, as
  which they were actually implemented on another CPU (much more beautiful
  than the IBM, but direct followups on that topic to alt.sys.pdp10 and/or
  alt.folklore.computers :).

  however they were named and initially implemented, it didn't much matter
  that it could be optimally implemented on one computer or had to be
  represented as a pair of full machine-word pointers on other computers.
  the concept of a cons as a pair of pointers prevailed and the "car" and
  "cdr" instructions became general accessors into the "cons".  of course,
  the Scheme folks threw the baby out with the bathwater in their incessant
  desire for cleanliness and forgot the history, so now call it "pair?"
  although they have yet to understand the usefulness of operators like
  `first', `second', ..., `tenth' for individual elements and `rest' for
  the tail and stick to `car' and `cdr' always.

  in Common Lisp, the "list" abstraction is taken one step further, and we
  have the `list' and `list*' operators to construct lists of elements or
  elements and another list (tail), respectively.  `first' and `rest' refer
  to the head and the tail of the list abstraction.  `cons' is still around
  because the cons cell is so much more general than just lists.  `car' and
  `cdr' are generally used to access the pointers in the cons cell as such,
  not as implementation parts of the list abstraction.  this is useful at a
  low level and when dealing with trees or complex list structures.

| All in all, the pairs notion is redundant.

  I hope you understand and appreciate what I have written above so the
  following does not apply to you anymore.  you see, I wrote it all because
  I _really_ wanted to say that that sentence is the single most ignorant
  and shallow-minded line that I have ever seen in any programming language
  newsgroup or forum and I hope _never_ to see anybody display such utter
  disregard for the brilliance and genius of people who came before them
  just because they grew up in an age when "simple interface" is sneered
  and laughed at in favor of "simple implementation" so any dumb fsck can
  implement it right out of a "for Dummies" book.  the "list" concept in
  Lisp is easy to use but actually quite hard to implement correctly.  it
  is in the language because it is hard to implement and because the dumb
  fscks who implement it on their own screw up so monumentally.  (just look
  at what C programmers do when they need lists!  *shudder*)

  thank you and have a nice new year.

#:Erik
-- 
The year "98" was new 1900 years ago.  |  Help fight MULE in GNU Emacs 20!
Be year 2000 compliant, write "1998"!  |  http://sourcery.naggum.no/emacs/