Subject: Re: Back to character set implementation thinking
From: Erik Naggum <erik@naggum.net>
Date: Tue, 26 Mar 2002 06:06:55 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3226111629531727@naggum.net>

* cr88192 <cr88192@hotmail.com>
| sorry, I don't really know of byte sizes other than 8...
| am I missing something?

Yes. A "byte" is only a contiguous sequence of bits in a machine word,
and has been used that way by most vendors, for us notably DEC, which
contributed the machine instructions we know as LDB and DPB and the
notion of a byte specifier, which has bit position in word and length in
bits. Failure to support LDB and DPB in hardware is very costly for a
large number of useful operations, but in a byte-addressable world with
8-bit bytes, using anything smaller than bytes that might cross byte
boundaries has serious penalties. In a word-addressable world, this
saves a lot of memory, even relative to the byte-addressable machines.
C has bit fields because it was intended to run on the Honeywell 6000,
which had 36-bit words, so its "char" was 9 bits wide. (See page 34 of
Kernighan & Ritchie, 1st ed.)

IBM chose a more specific terminology: 4-bit nybbles (the same spelling
deviation as "byte" from "bite"), 8-bit bytes, 16-bit half-words, 32-bit
words, and 64-bit double-words. On the PDP-10, we had 36-bit words,
18-bit half-words (and halfword instructions), but bytes were all over
the place. I know several people who think this is a much better design
than the stupid 8-bit design we have today. Sadly, only several, not
millions and millions who think Intel's designs are better just because
they can buy them.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.