Subject: Re: why we have cons? From: Erik Naggum <clerik@naggum.no> Date: 1998/01/03 Newsgroups: comp.lang.scheme,comp.lang.lisp Message-ID: <3092837184154309@naggum.no> * xah@best.com (Xah) | From a language design point of view, why do we have the pair notion in | Scheme? well, this list and cons business is an instance of what is commonly known as an abstract data type (ADT) and one of the more successful ones at that. you will learn more about ADTs in the basic CS curricula if you bother to study. successful ADTs have simple interfaces and provably correct implementations that nobody ever need to mess with again. in the case of lists, most languages simply do not have them, and those that do, apart from the Lisp family, get them wrong. there are two reasons for this: (1) short-sighted users who think that "list" is a primitive and can't be bothered to understand why a simple interface doesn't have simple implementations, (2) stupid language designers who believe too strongly in strong typing and thus shoot everybody else in the foot when they want lists of objects and certainly can't be bothered to invent a useful concept for their short-sighted programmers to use. imagine that you want a _list_ ADT that (1) can be read and written as lists would ordinarily be written, (2) can hold any kind of object without modifying the objects in the list, (3) can be taken apart into a head and a tail in constant time, and (4) could be extended with a new object in constant time. as a general rule it takes a real genius to implement something so that it looks simple, in contrast to the dumb people who implement something the simple way, so it becomes complex and generally useless to everybody else. in our case, John McCarthy is probably to credit for the historic invention of the only slightly counter-intuitive implementation of the simple "list" concept (ADT). instead of parsing and printing lists in a special syntax that did not itself produce anything useful, such as C's {a, b, ..., z} syntax, useful only in initializer lists to the compiler, the decision was that lists should be _something_ in their own right, and that programs should be able to deal with them internally as well as read and write them. this is brilliant. instead of having only one particular type in a list of objects that knew they were lists (like linked lists with a pointer in the object itself), it would be real cool to have a list "wrapper" that had at least one pointer to another object and it would be even cooler if it could point to any kind of object so the list could be heterogenous. let's call this the "payload pointer". of course, the payload pointer and the resulting heterogeneity introduces complexity to the reader and writer functions for lists, but fortunately, this complexity is fully contained in the implementation of the abstract data type. constructing a list would be equally painless without the programmer having to know the internals. this is brilliant. instead of the obvious "hardware" solution using a fixed-size vector of pointers and the ability to have a pointer to any element of that vector count as the whole vector, a linked list approach was made in the list "wrapper". this makes it possible to add elements at the head of the list in constant time, instead of having to copy the entire vector into a bigger vector when it outgrew its limits (which you'd have to keep track of somewhere, since you already use the hardware ability to point to any slot in the fixed-sized vector). let's use a "link pointer" and put it in the list "wrapper", too. now, not only do we have a pointer to the object we want lists of, we have a pointer to another list "wrapper" in a linked list, but we said that the payload pointer would be heterogenous and could point to anything, so let's let the payload pointer and the link pointer be the same type. the payload pointer can now point to a list "wrapper" and the link pointer can point to data. this way, we have a general object that has two genral pointers and can be used for more than just lists, although lists have especially nice read and write functions and other accessors. this is brilliant design! finally, let's recognize the fact that there is only one empty list, but instead of having an anchor at the end of every chain, let's let the empty list be represented by a special list "wrapper" that pointed to itself with both the payload and link pointer. again, this adds some complexity to the internals, but we find that this complexity actually reduces overall system complexity a _lot_. (of course, Scheme blew this design issue big time and forces you to treat the empty list as an anchor in _all_ cases, not just where it acts as an anchor.) you see, this list "wrapper" is a profoundly general mechanism. now, for historical reasons, the list "wrapper" object is called a "cons cell", and the payload pointer is known as the car of the cons cell, the link pointer the cdr of the cons cell. this comes from the historic fact that lists were implemented early on in machine registers that were twice as large as pointers, quite contrary to modern machines. the left half of this register was the payload pointer, and the right half was the link pointer. now, this kind of pointer magic had a special meaning to the hardware on which it was implemented. unlike today's braindamaged CPU's, pointers into hardware pointer vectors had a _size_ attached to them, so you could tell when you had reached the end of the list. this was used to implement the push-down list (PDL), better known as a "stack". when you pushed an object on a PDL, the PDL pointer itself would be one closer to the end of the list, so the size would be decremented, and its payload pointer would be incremented. (stacks that grown downward are a newer invention.) in effect, the kind of register that was later reused for the list "wrapper" had an _address_ part and a _decrement_ part. on this ancient hardware, it was useful to extract the left half and the right half into other machine registers. the Contents of the Address Register was obtained with the "car" instruction. likewise, the Contents of the Decrement Register was obtained with the "cdr" instruction. since we were implementing an abstract data type that users would see as lists, it didn't much matter what they were named, but we can probably be glad that it was something that simple (they can be combined like "cddadr", which can be appreciated only when dealing with a complexity that most newbies take years to discover), not something like HLRZ and HRRZ, as which they were actually implemented on another CPU (much more beautiful than the IBM, but direct followups on that topic to alt.sys.pdp10 and/or alt.folklore.computers :). however they were named and initially implemented, it didn't much matter that it could be optimally implemented on one computer or had to be represented as a pair of full machine-word pointers on other computers. the concept of a cons as a pair of pointers prevailed and the "car" and "cdr" instructions became general accessors into the "cons". of course, the Scheme folks threw the baby out with the bathwater in their incessant desire for cleanliness and forgot the history, so now call it "pair?" although they have yet to understand the usefulness of operators like `first', `second', ..., `tenth' for individual elements and `rest' for the tail and stick to `car' and `cdr' always. in Common Lisp, the "list" abstraction is taken one step further, and we have the `list' and `list*' operators to construct lists of elements or elements and another list (tail), respectively. `first' and `rest' refer to the head and the tail of the list abstraction. `cons' is still around because the cons cell is so much more general than just lists. `car' and `cdr' are generally used to access the pointers in the cons cell as such, not as implementation parts of the list abstraction. this is useful at a low level and when dealing with trees or complex list structures. | All in all, the pairs notion is redundant. I hope you understand and appreciate what I have written above so the following does not apply to you anymore. you see, I wrote it all because I _really_ wanted to say that that sentence is the single most ignorant and shallow-minded line that I have ever seen in any programming language newsgroup or forum and I hope _never_ to see anybody display such utter disregard for the brilliance and genius of people who came before them just because they grew up in an age when "simple interface" is sneered and laughed at in favor of "simple implementation" so any dumb fsck can implement it right out of a "for Dummies" book. the "list" concept in Lisp is easy to use but actually quite hard to implement correctly. it is in the language because it is hard to implement and because the dumb fscks who implement it on their own screw up so monumentally. (just look at what C programmers do when they need lists! *shudder*) thank you and have a nice new year. #:Erik -- The year "98" was new 1900 years ago. | Help fight MULE in GNU Emacs 20! Be year 2000 compliant, write "1998"! | http://sourcery.naggum.no/emacs/