Subject: Re: strings and characters
From: Erik Naggum <erik@naggum.no>
Date: 2000/03/20
Newsgroups: comp.lang.lisp
Message-ID: <3162506888111902@naggum.no>

* Gareth McCaughan <Gareth.McCaughan@pobox.com>
| What about the complete absence of any statement anywhere in the standard
| (so far as I can tell) that it's legal for storing characters in a string
| to throw away their attributes?

  what of it?  in case you don't realize the full ramification of the
  equally complete absence of any mechanism to use, query, or set these
  implementation-defined attributes of characters: the express intent of
  the removal of bits and fonts was to remove character attributes from
  the language.  they are no longer part of the official standard, and
  any implementation has to document what it does with them as part of
  its set of implementation-defined features.  OBVIOUSLY, the _standard_
  is not the right document to prescribe the consequences of such
  features!  an implementation, consequently, may or may not want to
  store attributes in strings; it is free to do so or not, and the
  standard cannot prescribe this behavior.

  conversely, if implementation-defined attributes were to be retained,
  shouldn't there be an explicit statement that they were to be
  retained, which would require an implementation to abide by certain
  rules in the implementation-defined areas?  that sounds _much_ more
  plausible to me than saying "implementation-defined" and then defining
  it in the standard.  when talking about what an implementation is
  allowed to do of its own accord, omitting specifics means it is free
  to do whatever it pleases.  in any requirement that is covered by
  conformance clauses, an omission is treated very differently: it means
  you can't do it.  we are not talking about _standard_ attributes of
  characters (that's the code, and that's the only attribute _required_
  to be in _standard_ strings), but about implementation-defined
  attributes.

| I don't see why #1 is relevant.  #2 is interesting, but the language is
| defined by what the standard says, not by what it used to say.

  it says "implementation-defined attributes" and it says "subtype of
  character", which is all I need to go by.  you seem to want the
  standard to prescribe implementation-defined behavior.  this is an
  obvious no-go.

  it is quite the disingenuous twist to attempt to rephrase what I said
  as "what the standard used to say", but I'm getting used to a lot of
  weird stuff from your side already, so I'll just point out to you that
  I'm referring to how it came to be what it is, not what it used to
  say.  if you can't see the difference, I can't help you understand,
  but if you do see the difference, you will understand that no standard
  or other document written by and intended for human beings can ever be
  perfect in the way you seem to expect.  expecting standards to be free
  of errors or of the need for interpretation by humans is just
  mind-bogglingly stupid, so I'm blithely assuming that you don't hold
  that view, but instead don't see that you are nonetheless flirting
  with it.

| The point here is simply that there can be several different kinds of
| string.  The standard says that there may be string types that only
| permit a subtype of CHARACTER; it doesn't say that there need be no
| string type that permits CHARACTER itself.

  sigh.  the point I'm trying to make is that it doesn't _require_ there
  to be one particular string type which can hold characters with all
  the implementation-defined attributes.
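  (to make that concrete, here is a small probe, a sketch using only
  standard functions -- CHAR-CODE reads the one standardized attribute,
  and the EQL test below can only be expected to yield true for a
  character that carries no implementation-defined attributes; whether
  any such attributes exist at all is up to the implementation:)

    (let ((c #\a))
      (list (char-code c)                        ; the CODE attribute
            (eql c (code-char (char-code c)))))  ; expected T when C has
                                                 ; no other attributes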
| (make-array 10 :element-type 'character)   [S]
| (make-string 10 :element-type 'character)  [S']
|
| Therefore S and S' are arrays of the same type.

  sorry, this is a mere tautology that brings nothing to the argument.

| Therefore there is at least one string (namely S) that can hold
| arbitrary characters.

  but you are not showing that it can hold arbitrary characters.
  _nothing_ in what you dig up actually argues that
  implementation-defined attributes have standardized semantics.  an
  implementation is, by virtue of its very own definition of the
  semantics, able to define a character in isolation as having some
  implementation-defined attributes, and strings as containing
  characters without such implementation-defined attributes.  this is
  the result of the removal of the type string-char and the subsequent
  merging of the semantics of character and string-char.

| It doesn't require *every* string type to be able to hold all
| character values.  It does, however, require *some* string type to be
| able to hold all character values.

  where do you find support for this?  nowhere does the standard say
  that a string must retain implementation-defined attributes of
  characters.  it does say that the code attribute is the only standard
  attribute, and it is obvious that that attribute must be retained
  everywhere.  it is not at all obvious that implementation-defined
  attributes must survive all kinds of operations.  you've been
  exceedingly specific in finding ways to defend your position, but
  nowhere do you find actual evidence of a requirement that there exist
  a string type that is guaranteed never to reject a character object.

  I'm sorry, but the premise that some string type _must_ be able to
  hold _all_ characters, including all the implementation-defined
  attributes that strings were never intended to hold to begin with, is
  no more than unsupported wishful thinking, but if you hold this
  premise as axiomatic, you won't see that it is unsupported.  if you
  discard it as an axiom and then try to find support for it, you find
  that you can't -- the language definition is sufficiently slippery
  that these implementation-defined attributes have no
  standard-prescribed semantics at all; the standard instead gives the
  implementation leeway to define their behavior, which means: not
  _requiring_ anything particular about them, which means: not
  _requiring_ strings to retain them, since that would be a particular
  requirement about an implementation-defined property of the language.

| The reason why STRING is a union type is that implementors might want
| to have (say) an "efficient" string type that uses only one byte per
| character, for storing "easy" strings.  Having this as well as a type
| that can store arbitrary characters, and having them both be subtypes
| of STRING, requires that STRING be a union type.

  now, this is the interesting part.  _which_ string would that be?  as
  far as I understand your argument, you're allowing an implementation
  to have an implementation-defined standard type to hold simple
  characters (there is only one _standard_ attribute -- the code),
  while it is _required_ to support a wider _non-standard_
  implementation-defined type?  this is another contradiction in terms.
  either a requirement is standard or it is implementation-defined --
  it can't be both at the same time.
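  (as an aside, the union nature of STRING is observable with nothing
  but standard types -- BASE-CHAR and BASE-STRING are part of the
  standard, but whether a base string is any "thinner" than a general
  string is itself implementation-defined, so the element types a
  particular implementation reports below may vary; this is a sketch,
  not a claim about any one implementation:)

    (let ((thin (make-string 5 :element-type 'base-char))
          (wide (make-string 5 :element-type 'character)))
      (list (array-element-type thin)          ; e.g. BASE-CHAR
            (array-element-type wide)          ; e.g. CHARACTER
            (typep thin 'base-string)          ; => T
            (typep wide 'string)               ; => T
            (subtypep 'base-string 'string)))  ; => T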
  I quote from the character proposal that led to the changes we're
  discussing, _not_ to imply that what isn't in the standard is more of
  a requirement on the implementation than the standard is, but to
  identify the intent and spirit of the change.  as with any legally
  binding document, if you can't figure it out by reading the actual
  document, you go hunting for the meaning in the preparatory works.
  luckily, we have access to the preparatory works via the HyperSpec.
  it should shed light on the wording in the standard, if necessary.
  in this case, it is necessary.

    Remove all discussion of attributes from the language
    specification.  Add the following discussion:

      ``Earlier versions of Common LISP incorporated FONT and BITS as
      attributes of character objects.  These and other supported
      attributes are considered implementation-defined attributes and
      if supported by an implementation effect the action of selected
      functions.''

  what we have is a standard that didn't come out and say "you can't
  retain bits and fonts from CLtL1 in characters", but _allowed_ an
  implementation to retain them, in whatever way it wanted.  since the
  standard removed these features, it must be interpreted relative to
  that (bloody) obvious intent.  if a wording might be read by some as
  requiring _additional_ support for the removed features, such an
  interpretation _must_ be discarded, even if it is possible to argue
  for it in an interpretative vacuum -- which never exists in any
  document written by and for human beings, regardless of some people's
  desires.  (such a vacuum cannot even exist in mathematics -- which
  reading a standard is not an exercise in, anyway -- any document must
  always be read in a context that supplies and retains its intention,
  otherwise _human_ communication breaks down completely.)

#:Erik