Subject: the character type
From: Erik Naggum <erik@naggum.no>
Date: 1996/03/17
Newsgroups: comp.lang.lisp
Message-ID: <3036079150517560@arcana.naggum.no>

suppose you have a file in an unknown character set. the file starts with some codes that tell you how it is encoded. each character could be 7, 8, 14, 16 or 21 bits wide, encoded as 1, 1, 2, 2, and 4 bytes, respectively. additionally it could be using any of the ISO 2022 or ISO 10646 methods of encoding.

suppose you want to read this into the system as _characters_, and work on them as characters, for purposes of mapping, transformation, processing, or interpreting where the encoding is irrelevant to the result, but you need to generate some _encoding_ on output, which need not be a function of the input encodings used.

I can fake it with integers, but I really want to distinguish characters from integers. I actually want characters to be known by longish, unique names, and have several possible encodings. I can fake this with symbols, but symbols are not characters, and I think Common Lisp should have a strong enough character type that it could do all this, not the least because of the requirements of internationalization.

I would like to be able to have a number of character set descriptions (tables) and encoding algorithms (filter functions) that allow me to return the character read from an input source.

let me take a simple example. suppose the character set is ISO 8859-1, but it has been reduced to 7-bit encoding according to ISO 2022, such that SO and SI are used to switch between the "low" and the "high" half. the usual way to deal with this is to let SO and SI toggle the 8th bit, but I don't want that. I want SO and SI to change the mapping for the current 7-bit character set such that the right character is returned directly. I want this because SO and SI are not the only possible character set shift control characters in ISO 2022.
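
the SO/SI example above could be sketched roughly as follows. this is only an illustration, not a proposal for the language: the names `make-table', `*g0*', `*g1*' and `decode-7bit' are hypothetical, and the sketch assumes an implementation where `code-char' accepts every code up to 255 and returns the corresponding ISO 8859-1 character. the point it shows is that SO and SI replace the current mapping table instead of toggling the 8th bit, so each table lookup returns the right character directly.

```lisp
;; Hypothetical sketch. Assumes CODE-CHAR accepts codes 0..255 and maps
;; them to the corresponding ISO 8859-1 characters.

(defconstant +so+ 14)  ; SHIFT OUT: select the "high" half (G1)
(defconstant +si+ 15)  ; SHIFT IN:  select the "low" half (G0)

(defun make-table (offset)
  "Return a 128-entry vector mapping each 7-bit code to a character.
Control codes 0..31 are left alone; graphic codes are shifted by OFFSET."
  (let ((table (make-array 128)))
    (dotimes (code 128 table)
      (setf (aref table code)
            (code-char (if (< code 32) code (+ code offset)))))))

(defparameter *g0* (make-table 0))    ; low half of ISO 8859-1
(defparameter *g1* (make-table 128))  ; high half of ISO 8859-1

(defun decode-7bit (codes)
  "Decode a sequence of 7-bit integers into a list of characters.
SO and SI swap the current table and never appear in the result."
  (let ((current *g0*)
        (result '()))
    (map nil
         (lambda (code)
           (cond ((= code +so+) (setf current *g1*))
                 ((= code +si+) (setf current *g0*))
                 (t (push (aref current code) result))))
         codes)
    (nreverse result)))
```

under those assumptions, `(decode-7bit '(99 97 102 14 105 15))' returns four characters whose codes are 99, 97, 102 and 233: the 105 that follows SO is looked up in the "high" table. other ISO 2022 shift functions would fit the same shape by adding tables and more branches that change `current'.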
additionally, I want to be able to parse escape sequences as their appropriate pseudo-characters, but this is above and beyond the others.

ideally, I would like to have a facility that allowed me to create new characters, name them, and assign _multiple_ codes to them. the key to my quest is that it should be possible to read and write them using various encoding schemes and to put them into strings and still to have the rest of the Common Lisp system deal with them.

I am disappointed with the character type in Common Lisp -- it seems incongruously inflexible -- and wonder if any work has been done in this area. e.g., are there Common Lisp implementations that allow GB5, UTF, JIS X 0208, etc, or so-called "wide characters"?

I think I would like to see `(setf char-name)', `(setf char-code)', and/or a (new) `make-char' function that allowed me to build new characters. the font and bits stuff was a start in the right direction, although less general than they should have been, but these were removed from the ANSI standard.

any ideas?

#<Erik>
-- 
the Internet made me do it