Subject: Re: strings and characters
From: Erik Naggum <erik@naggum.no>
Date: 2000/03/20
Newsgroups: comp.lang.lisp
Message-ID: <3162506888111902@naggum.no>

* Gareth McCaughan <Gareth.McCaughan@pobox.com>
| What about the complete absence of any statement anywhere in the standard
| (so far as I can tell) that it's legal for storing characters in a string
| to throw away their attributes?

  what of it?  in case you don't realize the full ramification of the
  equally complete absence of any mechanism to use, query, or set these
  implementation-defined attributes of characters: the express intent of
  the removal of bits and fonts was to remove character attributes from
  the language.  they are no longer part of the official standard, and
  any implementation has to document what it does with them as part of
  its set of implementation-defined features.  OBVIOUSLY, the _standard_
  is not the right document to prescribe the consequences of such
  features!  an implementation, consequently, may or may not want to
  store attributes in strings; it is free to do so or not, and the
  standard cannot prescribe this behavior.

  conversely, if implementation-defined attributes were to be retained,
  shouldn't there be an explicit statement that they were to be
  retained, which would require an implementation to abide by certain
  rules in the implementation-defined areas?  that sounds _much_ more
  plausible to me than saying "implementation-defined" and then defining
  it in the standard.  when talking about what an implementation is
  allowed to do of its own accord, omitting specifics means it is free
  to do whatever it pleases.  in any requirement that is covered by
  conformance clauses, an omission is treated very differently: it means
  you can't do it.  we are not talking about _standard_ attributes of
  characters (that's the code, and that's the only attribute _required_
  to be in _standard_ strings), but about implementation-defined
  attributes.

| I don't see why #1 is relevant.  #2 is interesting, but the language is
| defined by what the standard says, not by what it used to say.

  it says "implementation-defined attributes" and it says "subtype of
  character", which is all I need to go by.  you seem to want the
  standard to prescribe implementation-defined behavior.  this is an
  obvious no-go.

  it is quite the disingenuous twist to attempt to rephrase what I said
  as "what the standard used to say", but I'm getting used to a lot of
  weird stuff from your side already, so I'll just point out to you that
  I'm referring to how it came to be what it is, not what it used to
  say.  if you can't see the difference, I can't help you understand,
  but if you do see the difference, you will understand that no standard
  or other document written by and intended for human beings can ever be
  perfect in the way you seem to expect.  expecting standards to be free
  of errors or of the need for interpretation by humans is just
  mind-bogglingly stupid, so I'm blithely assuming that you don't hold
  that view, but instead don't see that you are nonetheless flirting
  with it.

| The point here is simply that there can be several different kinds of
| string.  The standard says that there may be string types that only
| permit a subtype of CHARACTER; it doesn't say that there need be no
| string type that permits CHARACTER itself.

  sigh.  the point I'm trying to make is that it doesn't _require_ there
  to be one particular string type which can hold characters with all
  the implementation-defined attributes.
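  (to make that concrete, here is a small probe, a sketch using only
  standard functions -- CHAR-CODE reads the one standardized attribute,
  and the EQL test below can only be expected to yield true for a
  character that carries no implementation-defined attributes; whether
  any such attributes exist at all is up to the implementation:)

    (let ((c #\a))
      (list (char-code c)                        ; the CODE attribute
            (eql c (code-char (char-code c)))))  ; expected T when C has
                                                 ; no other attributes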
| (make-array 10 :element-type 'character)   [S]
| (make-string 10 :element-type 'character)  [S']
|
| Therefore S and S' are arrays of the same type.

  sorry, this is a mere tautology that brings nothing to the argument.

| Therefore there is at least one string (namely S) that can hold
| arbitrary characters.

  but you are not showing that it can hold arbitrary characters.
  _nothing_ in what you dig up actually argues that
  implementation-defined attributes have standardized semantics.  an
  implementation is, by virtue of its very own definition of the
  semantics, able to define a character in isolation as having some
  implementation-defined attributes, and strings as containing
  characters without such implementation-defined attributes.  this is
  the result of the removal of the type string-char and the subsequent
  merging of the semantics of character and string-char.

| It doesn't require *every* string type to be able to hold all
| character values.  It does, however, require *some* string type to be
| able to hold all character values.

  where do you find support for this?  nowhere does the standard say
  that a string must retain implementation-defined attributes of
  characters.  it does say that the code attribute is the only standard
  attribute, and it is obvious that that attribute must be retained
  everywhere.  it is not at all obvious that implementation-defined
  attributes must survive all kinds of operations.  you've been
  exceedingly specific in finding ways to defend your position, but
  nowhere do you find actual evidence of a requirement that there exist
  a string type that is guaranteed never to reject a character object.

  I'm sorry, but the premise that some string type _must_ be able to
  hold _all_ characters, including all the implementation-defined
  attributes that strings were never intended to hold to begin with, is
  no more than unsupported wishful thinking, but if you hold this
  premise as axiomatic, you won't see that it is unsupported.  if you
  discard it as an axiom and then try to find support for it, you find
  that you can't -- the language definition is sufficiently slippery
  that these implementation-defined attributes have no
  standard-prescribed semantics at all; the standard instead gives the
  implementation leeway to define their behavior, which means: not
  _requiring_ anything particular about them, which means: not
  _requiring_ strings to retain them, since that would be a particular
  requirement about an implementation-defined property of the language.

| The reason why STRING is a union type is that implementors might want
| to have (say) an "efficient" string type that uses only one byte per
| character, for storing "easy" strings.  Having this as well as a type
| that can store arbitrary characters, and having them both be subtypes
| of STRING, requires that STRING be a union type.

  now, this is the interesting part.  _which_ string would that be?  as
  far as I understand your argument, you're allowing an implementation
  to have an implementation-defined standard type to hold simple
  characters (there is only one _standard_ attribute -- the code),
  while it is _required_ to support a wider _non-standard_
  implementation-defined type?  this is another contradiction in terms.
  either a requirement is standard or it is implementation-defined --
  it can't be both at the same time.
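  (as an aside, the union nature of STRING is observable with nothing
  but standard types -- BASE-CHAR and BASE-STRING are part of the
  standard, but whether a base string is any "thinner" than a general
  string is itself implementation-defined, so the element types a
  particular implementation reports below may vary; this is a sketch,
  not a claim about any one implementation:)

    (let ((thin (make-string 5 :element-type 'base-char))
          (wide (make-string 5 :element-type 'character)))
      (list (array-element-type thin)          ; e.g. BASE-CHAR
            (array-element-type wide)          ; e.g. CHARACTER
            (typep thin 'base-string)          ; => T
            (typep wide 'string)               ; => T
            (subtypep 'base-string 'string)))  ; => T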
  I quote from the character proposal that led to the changes we're
  discussing, _not_ to imply that what isn't in the standard is more of
  a requirement on the implementation than the standard is, but to
  identify the intent and spirit of the change.  as with any legally
  binding document, if you can't figure it out by reading the actual
  document, you go hunting for the meaning in the preparatory works.
  luckily, we have access to the preparatory works via the HyperSpec.
  it should shed light on the wording in the standard, if necessary.
  in this case, it is necessary.

    Remove all discussion of attributes from the language
    specification.  Add the following discussion:

      ``Earlier versions of Common LISP incorporated FONT and BITS as
      attributes of character objects.  These and other supported
      attributes are considered implementation-defined attributes and
      if supported by an implementation effect the action of selected
      functions.''

  what we have is a standard that didn't come out and say "you can't
  retain bits and fonts from CLtL1 in characters", but _allowed_ an
  implementation to retain them, in whatever way it wanted.  since the
  standard removed these features, it must be interpreted relative to
  that (bloody) obvious intent.  if a wording might be read by some as
  requiring _additional_ support for the removed features, such an
  interpretation _must_ be discarded, even if it is possible to argue
  for it in an interpretative vacuum -- which never exists in any
  document written by and for human beings, regardless of some people's
  desires.  (such a vacuum cannot even exist in mathematics -- which
  reading a standard is not an exercise in, anyway -- any document must
  always be read in a context that supplies and retains its intention,
  otherwise _human_ communication breaks down completely.)

#:Erik