Subject: Re: Wide character implementation
From: Erik Naggum <erik@naggum.net>
Date: Sun, 24 Mar 2002 01:46:30 GMT
Newsgroups: comp.lang.lisp,comp.lang.scheme
Message-ID: <3225923202075012@naggum.net>

* Sander Vesik
| I also couldn't care less what you think of me.

  You should realize that only people who care a lot make this point.

| It is pointless to think of glyphs in any other way than characters - it
| should not make any difference whether a diaeresis is represented by one
| code point - the precombined one - or two.  In fact, if there is a
| detectable difference from anything dealing with text strings, the
| implementation is demonstrably broken.

  It took the character set community many years to figure out the crucial
  conceptual and then practical difference between the "characteristic
  glyph" of a character and the character itself, namely that a character
  may have more than one glyph, and a glyph may represent more than one
  character.  If you work with characters as if they were glyphs, you
  _will_ lose, and you make just the kind of arguments that were made by
  people who did _not_ grasp this difference in the ISO committees back in
  1992 and who directly or indirectly caused Unicode to win over the
  original ISO 10646 design.  Unicode has many concessions to those who
  think character sets are also glyph sets, such as the presentation forms,
  but that only means that there are different times you would use
  different parts of the Unicode code space.  Some people who try to use
  Unicode completely miss this point.  It also took some _companies_ a
  really long time to figure out the difference between glyph sets and
  character sets.  (E.g., Apple and Xerox, and, of course, Microsoft has
  yet to reinvent the distinction badly in the name of "innovation", so
  their ISO 8859-1-like joke violates important rules for character sets.)

  I see that you are still in the pre-enlightenment state of mind and have
  failed to grasp what Unicode does with its three levels.  I cannot help
  you, since you appear to stop thinking in order to protect or defend
  yourself or whatever (it sure looks like some Mideast "honor" code to
  me), but if you just pick up the standard and read its excellent
  introductions, or even Unicode: A Primer, by Tony Graham, you will
  understand a lot more.  It does an excellent job of explaining the
  distinction between glyph and character.  I think you need it much more
  than trying to defend yourself by insulting me with your ignorance.

  Now, if you want to use or not use combining characters, you make an
  effort to convert your input to your preferred form before you start
  processing.  This isolates the "problem" to a well-defined interface,
  and it is no longer a problem in properly designed systems.  If you plan
  to compare a string with combining characters with one without them, you
  are already so confused that there is no point in trying to tell you how
  useless this is.  This means that thinking in terms of "variable-length
  characters" is prima facie evidence of a serious lack of insight _and_
  an attitude problem that something somebody else has done is wrong and
  that you know better than everybody else.  Neither is a problem with
  Unicode.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
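
[Editorial illustration, not part of the original post.]  To make the
diaeresis and normalize-at-the-boundary points concrete, here is a minimal
sketch.  It is in Python rather than Lisp or Scheme, purely because
Python's standard unicodedata module is a convenient, widely available way
to show the behavior; treat the choice of language and form (NFC vs. NFD)
as assumptions for illustration only.  The precomposed "a with diaeresis"
(U+00E4) and the decomposed pair "a" + U+0308 are different code point
sequences even though they render as the same glyph, and converting all
input to one preferred form at a well-defined interface is what makes
later comparisons meaningful.

  import unicodedata

  precomposed = "\u00e4"    # one code point: LATIN SMALL LETTER A WITH DIAERESIS
  decomposed  = "a\u0308"   # two code points: 'a' + COMBINING DIAERESIS, same glyph

  # Raw code-point comparison treats these as different strings.
  assert precomposed != decomposed

  # Convert input to a preferred form before any other processing.
  # NFC composes to precombined characters; NFD decomposes them.
  nfc = unicodedata.normalize("NFC", decomposed)
  nfd = unicodedata.normalize("NFD", precomposed)

  assert nfc == precomposed   # both now the single precombined code point
  assert nfd == decomposed    # both now base letter + combining mark

Once every string entering a system has been normalized to one such form,
equality and other string operations behave consistently, which is the
"well-defined interface" the post describes.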