Subject: Re: 8-bit input (or, "Perl attacks on non-English language communities!")
From: Erik Naggum <erik@naggum.no>
Date: 1999/02/11
Newsgroups: comp.lang.lisp
Message-ID: <3127698009640562@naggum.no>

* Johannes Beck <beck@informatik.uni-wuerzburg.de>
| It would be a very nice feature to have several CL-Functions localized,
| so you dont have to invent your own routines for do this.

  localization and internationalization has done more to destroy what was
  left of intercultural communication and respect for cultural needs than
  anything else in the entire computer history.  the people who are into
  these things should not be allowed to work with them for the same reason
  people who want power should be the last people to get it.

| I like to mention
| - format (with date, time, floats)

  date and time should be in ISO 8601.  people who want something else can
  write their own printers (and parsers).  nobody agrees to date-related
  representation even within the same office if left to themselves, let
  alone in a whole country or culture.  it's _much_ worse to put something
  in a standard that people will compete with than not to put something in
  a standard.

  I have a printer and reader for date and time that goes like this:

  (describe (get-full-time tz:pacific))
  #[1999-02-09 13:16:33.692-08:00] is an instance of #<standard-class full-time>:
   The following slots have :instance allocation:
    sec      3127583793
    msec     692
    zone     #<timezone pacific loaded from "US/Pacific" @ #x202b1e02>
    format   nil

  [1970-01-01]
  => 2208988800

  (setf *parse-time-default* '(1999 02 09))
  [21:19]
  => 3127583940

  a friend of mine commented that using [...] (returns a universal-time)
  and #[...] (returns a full-time object) for this was kind of a luxury
  syntax, but the application this was written for reads and writes dates
  and times millions of times a day.
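  for readers without such a reader and printer, the ISO 8601 output can
  be approximated in portable Common Lisp.  what follows is only a minimal
  sketch, not the code used above: the name iso-8601-string is
  hypothetical, it always prints in UTC, and it drops the milliseconds.

```lisp
;; hypothetical helper, not part of the reader/printer described above:
;; render a universal-time as an ISO 8601 string in UTC, using only
;; standard Common Lisp.
(defun iso-8601-string (universal-time)
  ;; passing 0 as the time zone makes decode-universal-time return UTC
  ;; fields and ignore daylight saving time.
  (multiple-value-bind (sec min hour day month year)
      (decode-universal-time universal-time 0)
    ;; ~N,'0D pads each field with zeros to N digits.
    (format nil "~4,'0D-~2,'0D-~2,'0DT~2,'0D:~2,'0D:~2,'0DZ"
            year month day hour min sec)))

(iso-8601-string 2208988800)   ; => "1970-01-01T00:00:00Z"
(iso-8601-string 3127583940)   ; => "1999-02-09T21:19:00Z"
```

  the two universal-times are the ones quoted above, so the strings can be
  checked against the [1970-01-01] and [21:19] examples.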
  (format nil "~/ISO:8601/" [21:19])
  => "1999-02-09 21:19:00.000"

  (format nil "~920:/ISO:8601/" [21:19])
  => "21:19"

  (let ((*default-timezone* tz:oslo))
    (describe (read-from-string "#[22:19]")))
  => #[1999-02-09 22:19:00.000+01:00] is an instance of #<standard-class full-time>:
   The following slots have :instance allocation:
    sec      3127583940
    msec     0
    zone     -1
    format   920

  now, floating-point values.  I _completely_ fail to see the charm of the
  comma as a decimal point, and have never used it.  (I remember how
  grossly unfair I thought it was to be reprimanded for refusing to succumb
  to the ambiguity of the comma in third grade.  it was just too stupid to
  use the same symbol both inside and between numbers, so I used a dot.)
  if you want this abomination, it will be output-only, upon special
  request.  none of the pervasive default crap that localization in C uses.
  e.g., a version of Emacs failed mysteriously on Digital Unix systems and
  the maintainers just couldn't figure out why, until the person in
  question admitted to having used Digital Unix's fledgling "localization"
  support.  of course, Emacs Lisp now reads floating point numbers in the
  "C" locale to avoid this braindamage.

  another pet peeve is that "ls" has a ridiculously stupid format, but what
  do people do?  instead of getting it right and reversing the New Jersey
  stupidity, they just translate it _halfway_ into other cultures.  sigh.
  and since programs have to deal with the output of other programs, there
  are some things you _can't_ just translate without affecting everything.
  the result is that people can't use these "localizations" except under
  carefully controlled conditions.

| - char-upcase etc. (eg Allegro is wrong when german special chars are
|   involved)

  well, I have written functions to deal with this, too.

  (system-character-set)
  => #<naggum-software::character-set ASCII @ #x20251be2>
  (string-upcase "soylent grün ist menschen fleisch!")
  => "SOYLENT GRüN IST MENSCHEN FLEISCH!"
  ;;;;           ^
  (setf (system-character-set) ISO:8859-1)
  => #<naggum-software::character-set ISO 8859-1 @ #x20250f52>
  (string-upcase "soylent grün ist menschen fleisch!")
  => "SOYLENT GRÜN IST MENSCHEN FLEISCH!"
  ;;;;           ^

  note, upcase rules that deal with ß->SS and ÿ->IJ are not implemented;
  this is still a simple character-to-character translation, so it leaves
  these two characters alone.

| - Daylight saving time & time zones

  Common Lisp is too weak in this respect, and so are most other solutions.
  it is wrong to let a time zone be just a number when parsing or decoding
  time specifications.  it is wrong to allow only one time zone to be fully
  supported.  I needed to fix this, so time zone data is fetched from the
  timezone database on demand, since the time zone names need to be loaded
  before they can be referenced.  e.g., tz:berlin is initialized like this:

  (define-timezone #"berlin" "Europe/Berlin")

  after which tz:berlin is bound to a timezone object:

  (describe tz:berlin)
  -> #<timezone berlin lazy-loaded from "Europe/Berlin" @ #x202b1fd2> is an instance of #<standard-class timezone>:
   The following slots have :instance allocation:
    name       timezone:berlin
    filename   "Europe/Berlin"
    zoneinfo   <unbound>
    reversed   <unbound>

  using it loads the data automatically:

  (get-full-time tz:berlin)
  => #[1999-02-09 22:40:24.517+01:00]
  tz:berlin
  => #<timezone berlin loaded from "Europe/Berlin" @ #x202b1fd2>

  you can ask for just the timezone of a particular time and zone, and you
  get the timezone and the universal-time of the previous and next changes,
  so it's possible to know how long a day in local time is without serious
  waste.  (i.e., it is 23 or 25 hours at the change of timezone due to the
  infinitely stupid daylight savings time crap, but people won't switch to
  UTC, so we have to accommodate them fully.)

  (time-zone [1999-07-04 12:00] tz:pacific)
  => 7 t "PDT" 3132208800 3150349200

| Since every serious OS supports localization LISP-Implementations should
| be forced to use these.

  I protest vociferously.
  let's get this incredible mess right.  if there is anything that causes
  more grief than the mind-bogglingly braindamaged attempts that, e.g.,
  Microsoft makes at adapting to other cultures, I don't know what it is,
  and the Unix world is just tailing behind them, making the same idiotic
  mistakes.  IBM has done an incredible job in this area, but they _still_
  listen to the wrong people, and don't realize that there are as many ways
  to write a date in each language as there are in the United States, so
  calling one particular format "Norwegian" is just plain wrong.  forcing
  one format on all Americans in the silly belief that they are all alike
  would perhaps cause sufficient rioting to get somebody's attention,
  because countries with smaller populations than some U.S. cities just
  won't be heard.  e.g., if you want to use the supposedly "standard"
  Norwegian notation, that's 9.2.99, but people will want to write 9/2-99
  or 9/2 1999, and if you do this, those who actually have to communicate
  with people elsewhere in the world will now be crippled unless they turn
  _off_ this cultural braindamage, and revert to whatever choice they get
  with the default.

  computers and programmers should speak English.  if you want to talk to
  people in your own culture, first consider international standards that
  get things right (like ISO 8601 for dates and times), then the smartest
  thing you can think of, onwards through to the stupidest thing you can
  think of, then perhaps what people have failed to understand is wrong.
  you don't have to adapt to anyone -- nobody adapts to you, and adapting
  should be a reciprocal thing, so do whatever is right and explain it to
  people.  90% of them will accept it.  the rest can go write their own
  software.
  force accountants to see four-digit years, force Americans and the
  British to see 24-hour clocks, use dot as a decimal point, write dates
  and times with all numbers in strictly decreasing unit order, lie to
  managers when they ask if they can have the way they learned stuff in
  grade school in 1950 and say it's impossible in this day and age.
  computers should be instruments of progress.  if that isn't OK with some
  doofus, give him a keypunch, which is what computers looked like at the
  time the other things they ask the computers to do today were normal.

  if people want you to adapt, put them to the test and see if they think
  adaptation is any good when it happens to themselves.  if they do, great
  -- they do what you say.  if not, you tell them "neither do I", and force
  them to accept your way, anyway.  it's that simple.

#:Erik
-- 
  Y2K conversion simplified: Januark, Februark, March, April, Mak, June,
  Julk, August, September, October, November, December.