Subject: Re: 8-bit input (or, "Perl attacks on non-English language communities!")
From: Erik Naggum <erik@naggum.no>
Date: 1999/02/11
Newsgroups: comp.lang.lisp
Message-ID: <3127698009640562@naggum.no>

* Johannes Beck <beck@informatik.uni-wuerzburg.de>
| It would be a very nice feature to have several CL-Functions localized,
| so you dont have to invent your own routines for do this.

  localization and internationalization have done more to destroy what was
  left of intercultural communication and respect for cultural needs than
  anything else in the entire computer history.  the people who are into
  these things should not be allowed to work with them for the same reason
  people who want power should be the last people to get it.

| I like to mention
| - format (with date, time, floats)

  date and time should be in ISO 8601.  people who want something else can
  write their own printers (and parsers).  nobody agrees on date-related
  representation even within the same office if left to themselves, let
  alone in a whole country or culture.  it's _much_ worse to put something
  in a standard that people will compete with than not to put something in
  a standard.

  I have a printer and reader for date and time that goes like this:

(describe (get-full-time tz:pacific))
#[1999-02-09 13:16:33.692-08:00] is an instance of #<standard-class full-time>:
 The following slots have :instance allocation:
  sec      3127583793
  msec     692
  zone     #<timezone pacific loaded from "US/Pacific" @ #x202b1e02>
  format   nil

[1970-01-01]
=> 2208988800

(setf *parse-time-default* '(1999 02 09))
[21:19]
=> 3127583940

  a friend of mine commented that using [...] (returns a universal-time)
  and #[...] (returns a full-time object) for this was kind of a luxury
  syntax, but the application this was written for reads and writes dates
  and times millions of times a day.
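
  the code behind the bracket syntax is not shown in this article.  purely
  as an illustration of how such a syntax can be hooked into the reader,
  here is a minimal sketch in standard Common Lisp.  the names
  parse-date-string, read-bracketed-time and *time-readtable* are made up
  for this example.  it treats everything as UTC and accepts only
  "YYYY-MM-DD" and "YYYY-MM-DD hh:mm"; it does not default missing fields.

```lisp
;; Hypothetical sketch, not the reader described in the article.
(defun parse-date-string (string)
  "Parse \"YYYY-MM-DD\" or \"YYYY-MM-DD hh:mm\" into a universal-time (UTC)."
  (flet ((field (start end)
           (parse-integer string :start start :end end)))
    (let* ((timep  (> (length string) 10))
           (year   (field 0 4))
           (month  (field 5 7))
           (day    (field 8 10))
           (hour   (if timep (field 11 13) 0))
           (minute (if timep (field 14 16) 0)))
      ;; Time zone 0 means UTC and disables daylight saving adjustment.
      (encode-universal-time 0 minute hour day month year 0))))

(defun read-bracketed-time (stream char)
  "Reader macro function: collect characters up to ] and parse them."
  (declare (ignore char))
  (parse-date-string
   (with-output-to-string (out)
     (loop for c = (read-char stream t nil t)
           until (char= c #\])
           do (write-char c out)))))

;; Install the macro character in a private readtable, not the global one.
(defvar *time-readtable* (copy-readtable nil))
(set-macro-character #\[ #'read-bracketed-time nil *time-readtable*)
```

  with *readtable* bound to *time-readtable*, reading "[1970-01-01]" yields
  2208988800 and "[1999-02-09 21:19]" yields 3127583940, the same numbers
  as in the transcript above.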

(format nil "~/ISO:8601/" [21:19])
=> "1999-02-09 21:19:00.000"

(format nil "~920:/ISO:8601/" [21:19])
=> "21:19"
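
  the ~/name/ directive of format calls a user-defined function with the
  stream, the argument, the colon and at-sign flags, and any prefix
  parameters.  a minimal sketch of a printer in that style follows.  it is
  not the ISO:8601 function used above: the name iso-8601-utc is invented,
  it prints UTC only, has no milliseconds, and uses the colon flag (rather
  than a numeric format code such as 920) to select the short form.

```lisp
;; Hypothetical sketch.  A name without a package prefix inside ~/.../ is
;; looked up in COMMON-LISP-USER, so the function is defined there.
(defun cl-user::iso-8601-utc (stream universal-time colon-p at-sign-p
                              &rest params)
  "Print UNIVERSAL-TIME in ISO 8601 form, in UTC.
With the colon modifier, print only hours and minutes."
  (declare (ignore at-sign-p params))
  (multiple-value-bind (sec min hour day month year)
      (decode-universal-time universal-time 0)
    (if colon-p
        (format stream "~2,'0D:~2,'0D" hour min)
        (format stream "~4,'0D-~2,'0D-~2,'0D ~2,'0D:~2,'0D:~2,'0D"
                year month day hour min sec))))
```

  (format nil "~/iso-8601-utc/" 3127583940) then returns
  "1999-02-09 21:19:00", and (format nil "~:/iso-8601-utc/" 3127583940)
  returns "21:19".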

(let ((*default-timezone* tz:oslo))
  (describe (read-from-string "#[22:19]")))
=> #[1999-02-09 22:19:00.000+01:00] is an instance of #<standard-class full-time>:
 The following slots have :instance allocation:
  sec      3127583940
  msec     0
  zone     -1
  format   920

  now, floating-point values.  I _completely_ fail to see the charm of the
  comma as a decimal point, and have never used it.  (I remember how
  grossly unfair I thought it was to be reprimanded for refusing to succumb
  to the ambiguity of the comma in third grade.  it was just too stupid to
  use the same symbol both inside and between numbers, so I used a dot.)
  if you want this abomination, it will be output-only, upon special
  request.  none of the pervasive default crap that localization in C uses.
  e.g., a version of Emacs failed mysteriously on Digital Unix systems and
  the maintainers just couldn't figure out why, until the person in
  question admitted to having used Digital Unix's fledgling "localization"
  support.  of course, Emacs Lisp now reads floating point numbers in the
  "C" locale to avoid this braindamage.  another pet peeve is that "ls" has
  a ridiculously stupid format, but what do people do?  instead of getting
  it right and reversing the New Jersey stupidity, they just translate it
  _halfway_ into other cultures.  sigh.  and since programs have to deal
  with the output of other programs, there are some things you _can't_ just
  translate without affecting everything.  the result is that people can't
  use these "localizations" except under carefully controlled conditions.
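
  to make "output-only, upon special request" concrete, here is a minimal
  sketch in standard Common Lisp.  the function name format-decimal-comma
  is invented for this example.  the comma appears only in the string
  that the caller explicitly asks for, and nothing about the reader or the
  default printer changes.

```lisp
;; Hypothetical sketch: a decimal comma on request, for output only.
(defun format-decimal-comma (number digits)
  "Return NUMBER as a string with DIGITS decimals and a comma as decimal mark.
The Lisp reader and the default printer are left untouched."
  ;; ~,vF takes the number of decimals from the argument list.
  (substitute #\, #\. (format nil "~,vF" digits number)))
```

  (format-decimal-comma 3.14159 2) returns "3,14", while
  (read-from-string "3.14") keeps reading the dot form as before.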

| - char-upcase etc. (eg Allegro is wrong when german special chars are
|   involved)

  well, I have written functions to deal with this, too.

(system-character-set)
=> #<naggum-software::character-set ASCII @ #x20251be2>
(string-upcase "soylent grün ist menschen fleisch!")
=> "SOYLENT GRüN IST MENSCHEN FLEISCH!"
;;;;          ^
(setf (system-character-set) ISO:8859-1)
=> #<naggum-software::character-set ISO 8859-1 @ #x20250f52>
(string-upcase "soylent grün ist menschen fleisch!")
=> "SOYLENT GRÜN IST MENSCHEN FLEISCH!"
;;;;          ^

  note, upcase rules that deal with ß->SS and ÿ->IJ are not implemented;
  this is still a simple character-to-character translation, so it leaves
  these two characters alone.
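
  for contrast, a one-to-many rule such as ß->SS cannot be a
  character-to-character table, because the result may be longer than the
  input.  a minimal sketch of what that would take follows.  it is not
  part of the functions shown above; the names *upcase-expansions* and
  string-upcase-expanding are invented, and it assumes that the
  implementation assigns character code 223 to ß, as Latin-1 and Unicode
  based implementations do.

```lisp
;; Hypothetical sketch of upcasing with string-valued expansions.
(defparameter *upcase-expansions*
  ;; Code 223 is LATIN SMALL LETTER SHARP S in Latin-1 and Unicode;
  ;; this is an assumption about the implementation's character codes.
  (list (cons (code-char 223) "SS"))
  "Alist from a character to the string that replaces it when upcasing.")

(defun string-upcase-expanding (string)
  "Upcase STRING, expanding characters found in *UPCASE-EXPANSIONS*."
  (with-output-to-string (out)
    (loop for char across string
          for expansion = (cdr (assoc char *upcase-expansions*))
          do (if expansion
                 (write-string expansion out)
                 (write-char (char-upcase char) out)))))
```

  under that assumption, (string-upcase-expanding "straße") returns
  "STRASSE".  what char-upcase does with other non-ASCII letters remains
  up to the implementation.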

| - Daylight saving time & time zones

  Common Lisp is too weak in this respect, and so are most other solutions.
  it is wrong to let a time zone be just a number when parsing or decoding
  time specifications.  it is wrong to allow only one time zone to be fully
  supported.  I needed to fix this, so time zone data is fetched from the
  timezone database on demand, since the time zone names need to be loaded
  before they can be referenced.

  e.g., tz:berlin is initialized like this:

(define-timezone #"berlin"	    "Europe/Berlin")

  after which tz:berlin is bound to a timezone object:

(describe tz:berlin) ->
#<timezone berlin lazy-loaded from "Europe/Berlin" @ #x202b1fd2> is an instance
    of #<standard-class timezone>:
 The following slots have :instance allocation:
  name       timezone:berlin
  filename   "Europe/Berlin"
  zoneinfo   <unbound>
  reversed   <unbound>

  using it loads the data automatically:

(get-full-time tz:berlin)
=> #[1999-02-09 22:40:24.517+01:00]
tz:berlin
=> #<timezone berlin loaded from "Europe/Berlin" @ #x202b1fd2>
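
  the article does not say how the loading on first use is implemented.
  one standard way to get a slot that stays unbound until it is needed is
  a slot-unbound method, sketched below under that assumption.  the class
  lazy-zone, the loader load-zone-data and the counter *zone-load-count*
  are invented for this example, and the loader returns a placeholder
  instead of reading a real zoneinfo file.

```lisp
;; Hypothetical sketch of load-on-first-use with SLOT-UNBOUND.
(defvar *zone-load-count* 0
  "Number of times the stand-in loader has run.")

(defun load-zone-data (filename)
  "Stand-in for reading a zoneinfo file; returns a placeholder list."
  (incf *zone-load-count*)
  (list :loaded-from filename))

(defclass lazy-zone ()
  ((name     :initarg :name     :reader zone-name)
   (filename :initarg :filename :reader zone-filename)
   ;; No initarg or initform: the slot starts out unbound.
   (zoneinfo :reader zone-info)))

(defmethod slot-unbound (class (zone lazy-zone) (slot-name (eql 'zoneinfo)))
  (declare (ignore class))
  ;; Fill the slot on first access; the value returned here is what the
  ;; reader function hands back to the caller.
  (setf (slot-value zone 'zoneinfo)
        (load-zone-data (zone-filename zone))))
```

  the first call to zone-info on a fresh instance runs the loader once and
  binds the slot, and later calls simply return the stored value.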

  you can ask for just the timezone of a particular time and zone, and you
  get the timezone and the universal-time of the previous and next changes,
  so it's possible to know how long a day in local time is without serious
  wastes.  (i.e., it is 23 or 25 hours at the change of timezone due to the
  infinitely stupid daylight savings time crap, but people won't switch to
  UTC, so we have to accommodate them fully.)

(time-zone [1999-07-04 12:00] tz:pacific)
=> 7 t "PDT" 3132208800 3150349200 
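
  the 23 or 25 hour day can be checked with nothing but
  encode-universal-time, which takes the zone as hours west of Greenwich.
  this is a minimal sketch; the function name local-day-length is
  invented, and the caller has to supply the offsets in effect at the
  start and at the end of the day, which is the information the time-zone
  call above provides.

```lisp
;; Hypothetical sketch: length of one local calendar day in seconds.
(defun local-day-length (day month year offset-at-start offset-at-end)
  "Seconds from local midnight of the given date to the next local midnight.
Offsets are hours west of Greenwich, as in ENCODE-UNIVERSAL-TIME."
  (let ((start (encode-universal-time 0 0 0 day month year offset-at-start))
        ;; Next midnight: same date at the ending offset, plus 24 hours.
        (end   (+ (encode-universal-time 0 0 0 day month year offset-at-end)
                  (* 24 60 60))))
    (- end start)))
```

  (local-day-length 4 4 1999 8 7) returns 82800, i.e., 23 hours, and
  (local-day-length 31 10 1999 7 8) returns 90000, i.e., 25 hours.  the
  first change time in the output above, 3132208800, is what
  (encode-universal-time 0 0 2 4 4 1999 8) returns: 02:00 PST on
  1999-04-04.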

| Since every serious OS supports localization LISP-Implementations should
| be forced to use these.

  I protest vociferously.  let's get this incredible mess right.  if there
  is anything that causes more grief than the mind-bogglingly braindamaged
  attempts that, e.g., Microsoft makes at adapting to other cultures, I
  don't know what it is, and the Unix world is just tailing behind them,
  making the same idiotic mistakes.  IBM has done an incredible job in this
  area, but they _still_ listen to the wrong people, and don't realize that
  there are as many ways to write a date in each language as there are in
  the United States, so calling one particular format "Norwegian" is just
  plain wrong.  forcing one format on all Americans in the silly belief
  that they are all alike would perhaps cause sufficient rioting to
  get somebody's attention, because countries with smaller populations than
  some U.S. cities just won't be heard.

  e.g., if you want to use the supposedly "standard" Norwegian notation,
  that's 9.2.99, but people will want to write 9/2-99 or 9/2 1999, and if
  you do this, those who actually have to communicate with people elsewhere
  in the world will now be crippled unless they turn _off_ this cultural
  braindamage, and revert to whatever choice they get with the default.

  computers and programmers should speak English.  if you want to talk to
  people in your own culture, first consider international standards that
  get things right (like ISO 8601 for dates and times), then the smartest
  thing you can think of, onwards through to the stupidest thing you can
  think of, then perhaps what people have failed to understand is wrong.
  you don't have to adapt to anyone -- nobody adapts to you, and adapting
  should be a reciprocal thing, so do whatever is right and explain it to
  people.  90% of them will accept it.  the rest can go write their own
  software.  force accountants to see four-digit years, force Americans and
  the British to see 24-hour clocks, use dot as a decimal point, write
  dates and times with all numbers in strictly decreasing unit order, lie
  to managers when they ask if they can have the way they learned stuff in
  grade school in 1950 and say it's impossible in this day and age.
  computers should be instruments of progress.  if that isn't OK with some
  doofus, give him a keypunch, which is what computers looked like at the
  time the other things they ask the computers to do today were normal.  if
  people want you to adapt, put them to the test and see if they think
  adaptation is any good when it happens to themselves.  if they do, great
  -- they do what you say.  if not, you tell them "neither do I", and force
  them to accept your way, anyway.  it's that simple.

#:Erik
-- 
  Y2K conversion simplified: Januark, Februark, March, April, Mak, June,
  Julk, August, September, October, November, December.