Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
From: Erik Naggum <erik@naggum.net>
Date: Mon, 25 Mar 2002 05:06:41 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3226021614417921@naggum.net>

* Ed L Cashin <ecashin@uga.edu>
| Could you elaborate on that a bit?  I'm interested because it appears
| that your position is that case-sensitivity in identifiers is a Bad
| Thing for programming languages.

  I consider it a bad thing to believe that A is a different character
  from a just because it has a certain "presentation property".  I mean,
  we do not distinguish characters based on font or face, underlining or
  color, and most people realize that these are incidental properties.
  However, capitalness of a letter is just as incidental: The fact that a
  letter is capitalized depending on such randomness as the position of
  the word in the sentence is a very strong indicator that "However" and
  "however" are not different words, which is effectively what
  case-sensitive people think they are.  I tried to publish text without
  this incidental property for a while, but it seemed to tick people off
  even more than calling an idiot an idiot.

| A general principle of mine is that if things are distinguishable, they
| should not be collapsed but the distinction should be preserved whenever
| possible.  Treating different characters as the same character, or
| treating different character sequences as equivalent, should be postponed
| as long as possible in order to preserve information.

  If you use colors to distinguish keywords from identifiers in our
  editor, can you use a keyword with a different color as an identifier?

| Are you suggesting that this principle is inappropriate to apply to the
| character sequences that compose identifiers in source code?  That would
| mean that "ABLE" is the same identifier as "able".
|
| I must admit that when I first found out that current lisps have
| case-insensitive symbol names, I thought it reminiscent of BASIC -- kind
| of a throwback to a time when memory was much more at a premium.

  But this is not the case.  The symbol names are case-sensitive, but the
  Common Lisp reader maps all unescaped characters to uppercase by
  default.  You can change this.  Symbols are in this fashion just like
  normal words in your natural language.

| (I know that Lisp predates BASIC.  I'm talking about my reaction.)  I'd
| be happy to hear a good case for case-insensitive identifiers.

  I think case sensitivity is an abuse of an incidental property.  Thus,
  I want to hear a good case for case-sensitive identifiers.  Older
  languages did not have this property, but after Unix (which has a
  case-insensitive tty mode!), the norm became to distinguish case,
  largely because there was no other namespace functionality in early C.
  Unix also chose to use lower-case commands whereas Multics had always
  supported case-folding.  I believe the reason that the Unix people
  wanted to distinguish case was that case-folding would require an extra
  instruction and a lookup table that would waste a precious 128 bytes of
  memory in the kernel, while we currently waste an enormous amount of
  memory to keep case-folding tables several times over.

  In my view, case-sensitive identifiers have become the norm in a
  community that has failed to think about proper solutions to its
  problems, but rather chooses to solve only the immediate problem, much
  like C strongly encourages irrelevant micro-optimization.  So instead of
  being nice to the user, they were nice to the programmer, who did not
  have to case-fold the incoming identifiers.  I consider moving this
  burden onto the user to be quite user-inimical and actually quite
  foreign to people who do not know the character coding standards.  I
  mean, do we have case-sensitive trademarks, even though we
  traditionally capitalize proper names?
  Are Oracle and ORACLE different companies any more than ORACLE in red
  boldface 14 point Times Roman is a different company than ORACLE in blue
  italic 12 point Helvetica?

  There has definitely been a "paradigm shift" in computer people's view
  on case, but not in non-computer people's.  Internet protocols like SMTP
  use case-insensitive commands.  The DNS is case-insensitive.  SGML is
  case-insensitive and so is HTML.

  Because of the huge problems we face with case-folding Unicode (which
  must be done with a table of some kind), some people have figured that
  we should _not_ do case-folding.  That is the wrong solution to the
  problem.  The right solution to the problem is to get rid of case as a
  character property.

  Now, assume that we no longer have different character codes for
  lower-case and upper-case letters.  Would there be any difference in how
  we look at text on computer screens, in print, etc?  No, of course not.
  Therefore, people would still be able to distinguish identifiers
  visually based on case if they want to -- just like the Common Lisp
  reader allows you to write |car| to refer to the symbol named "car", and
  |CAR| to refer to the symbol named "CAR", and just like Unix can deal
  with upper- and lower-case letters even when iuclc and olcuc are in
  effect with the xcase option by backslashing the real uppercase
  characters in your input.  (In Common Lisp, you would backslash a
  lower-case character in the default reader mode, and the printer will
  escape those characters that should not be case-folded.)

  However, being able to do something and actually doing it are two very
  different things.  E.g., on TOPS-20, you could use lower-case letters in
  filenames if you really wanted to, by prefixing them with ^V.  Very few
  people bothered to do this because typing it in was a hassle.
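
  The reader behavior described above (unescaped characters folded to
  uppercase, with |...| and backslash preserving case) can be roughly
  modeled as follows.  This is only a minimal sketch in Python, not the
  Common Lisp reader itself: the function name read_symbol_name is
  hypothetical, and it assumes the default reader mode and ignores
  everything else the reader does (packages, numbers, whitespace).

```python
def read_symbol_name(token: str) -> str:
    """Simplified model of how a default-mode Common Lisp reader derives
    a symbol name from a token (hypothetical helper, for illustration)."""
    out = []
    in_bars = False  # True while inside a |...| multiple-escape region
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\":
            # Single escape: the next character is taken literally.
            i += 1
            if i >= len(token):
                raise ValueError("dangling backslash at end of token")
            out.append(token[i])
        elif ch == "|":
            # Vertical bars toggle the escaped region and are not
            # themselves part of the name.
            in_bars = not in_bars
        elif in_bars:
            # Escaped characters keep their case.
            out.append(ch)
        else:
            # Unescaped characters are folded to uppercase.
            out.append(ch.upper())
        i += 1
    if in_bars:
        raise ValueError("unterminated |...| in token")
    return "".join(out)


if __name__ == "__main__":
    for token in ["car", "CAR", "Car", "|car|", "\\c\\a\\r"]:
        print(f"{token!r:12} names the symbol {read_symbol_name(token)!r}")
```

  Under this model, car, CAR and Car all name the same symbol, while
  |car| and \c\a\r name a different one: the distinction is available to
  whoever asks for it, but it is not forced on every reader of the text.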

  I do not propose any change to how we input upper and lower case, but
  with the anal-retentive approach to saving bits, which has even gone so
  far as to write FooBarZot instead of foo-bar-zot, the probability that
  the C freaks would have chosen case-sensitivity would be remarkably
  lower -- if we could go back and design the world over...

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.