Subject: Re: symbol name with case preserved
From: Erik Naggum <erik@naggum.no>
Date: 18 Jan 2004 01:22:22 +0000
Newsgroups: comp.lang.lisp
Message-ID: <3283377742132730KL2065E@naggum.no>

* William J. Lamar
| Do any of the free-as-in-freedom Common Lisp implementations (CMUCL,
| SBCL, CLISP, GCL, etc.) support a way to get the name of a symbol with
| it's case preserved?

The route from source to symbol name involves case conversion, but the
route from symbol name to string does not. This is important if you
want to understand this admittedly complex issue, but it is complex
mostly because people do not pay attention to detail.

| Although Common Lisp is a case-insensitive language,

This is not true.

| it nevertheless would be possible for an implementation to have an
| extension function that returns a string representation of a symbol
| the way the symbol appeared in the source code.

No, this is not possible. Common Lisp does not retain the source from
which it builds its internal representation. This is actually crucial
to understand. Common Lisp is defined on the internal form, not on the
character sequence that is the source.

| To help illustrate what I am describing, here is part of a session
| with CMUCL:
|
| * (symbol-name 'Hello)
| "HELLO"
| *
|
| What I am looking for is a function foo where
|
| (foo 'Hello)
|
| would result in the string "Hello".

Now that you understand that the reader transforms a character source
into an internal representation and that you can never recover that
character source, you know that you need to ask for ways to retain the
case of the symbols that you read.

The first, which is the simplest and which requires minimal effort to
understand and use properly, is the multiple escape characters in the
reader. |Hello| is a symbol whose symbol-name is the exact character
sequence between the ||.
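
As a small illustration of the escape notation, the following forms use
only standard functions and assume the default readtable case, :UPCASE.
The results in the comments follow from the standard reader rules:

```lisp
;; Unescaped characters are converted to upper case by the reader.
(symbol-name 'Hello)      ; => "HELLO"

;; Between multiple escape characters, the characters are taken literally.
(symbol-name '|Hello|)    ; => "Hello"

;; The two notations therefore name different symbols ...
(eq 'Hello '|Hello|)      ; => NIL

;; ... unless the escaped name is already the upper-case one.
(eq 'Hello '|HELLO|)      ; => T
```
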
When you use this notation, you communicate intent and both human and
machine readers of the source code will know that case matters in this
particular situation.

The second, as has been mentioned by others, is to use READTABLE-CASE
to modify the case conversion behavior of the Common Lisp reader.

| Currently, this means that all C++ identifier names in my Common
| Lisp code are quoted as strings.

That should make it easy to use || instead of "", and you're on your
way.

If you decide to investigate READTABLE-CASE, you will run into a very
annoying problem: the case of Common Lisp symbols is upper-case, which
these days is useful mostly in news articles like this to make them
stand out from other words, so if you ask for :PRESERVE, you get to
shout a lot. It would have been easy to break with the past and
declare symbol names to be lower-case back when Common Lisp was
defined, but They chose not to, and breaking with the standard today is
not particularly smart.

To accommodate those who wanted to write their code in lower-case and
still use case sensitive symbol names, the :INVERT case mode was
defined, and this works sufficiently well. It may also be supported
natively in an implementation that may choose to represent symbol-names
in lower-case internally to cut down on the case conversion costs,
which get more noticeable with larger character sets.

The only central issue is what case COMMON-LISP:SYMBOL-NAME returns and
COMMON-LISP:INTERN (etc) takes, and the key to this issue is to realize
it is completely unrelated to the internal representation. An
implementation is free to offer its own |symbol-name| and |intern|
(etc) functions in a package that it might call |common-lisp|, which
reflects the internal representation, as long as it does the right
thing for COMMON-LISP:SYMBOL-NAME (etc).
It might even decide to offer its own |readtable-case| function that
swaps the meaning of :PRESERVE and :INVERT, and thus allow the external
and internal case to work smoothly and effortlessly together. It might
even decide to cache the result of applying a function INVERT-CASE to a
symbol name if it is requested or created via the standard functions so
that the performance penalty would be negligible.

The problem is that :INVERT takes a symbol name in which all characters
with case have the same case and inverts them all to the other case,
while it leaves alone those names that contain at least one character
in each case. I don't know the history of this decision, but I know it
was painful to several parties present, and I have no desire to re-open
this wound. Let's look instead at what an implementation that wants
lower-case symbol names, a case sensitive reader, and conformance to
the standard would most intelligently do. It would /not/ do the
inversion trick except when user code asks for the standard symbol name
or creates a symbol through the standard functions, which is actually a
very rare thing.

The important issue to take away from this discussion is that the
Common Lisp standard does not mandate that symbols are stored
internally in any particular case; the standard only mandates what
various functions accept and return.

I think all modern Common Lisp implementations should optimize for the
lower-case, literal symbol and should treat the upper-case symbols as a
relic of the past that is supported via the standard functions but
bypassed when reading and writing source code and results. The switch
is easily accomplished by doing the :INVERT trick in the accessors to
symbol names first, and then gradually changing the calls to them all
through the source code. It will take a little while before the new
system outperforms the old system, but the end result will be vastly
less case conversion.
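
The :INVERT rule described above can be sketched as a function on
strings. INVERT-CASE here is only an illustration of the function named
in this article, not a standard function, and the sketch ignores escaped
characters, which the reader leaves untouched. READ-NAMES-INVERTED is
likewise a hypothetical helper; it checks the sketch against the
standard reader by setting READTABLE-CASE to :INVERT in a fresh copy of
the standard readtable:

```lisp
;; Hypothetical helper: not part of the standard.
(defun invert-case (name)
  "If all cased characters in NAME share one case, return NAME in the
other case; otherwise return NAME unchanged."
  (let ((upper (some #'upper-case-p name))
        (lower (some #'lower-case-p name)))
    (cond ((and upper (not lower)) (string-downcase name))
          ((and lower (not upper)) (string-upcase name))
          (t name))))

(invert-case "HELLO")   ; => "hello"
(invert-case "hello")   ; => "HELLO"
(invert-case "Hello")   ; => "Hello"

;; Hypothetical helper: read each token with the reader in :INVERT mode
;; and return the resulting symbol names.
(defun read-names-inverted (tokens)
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :invert)
    (mapcar (lambda (token) (symbol-name (read-from-string token)))
            tokens)))

(read-names-inverted '("hello" "HELLO" "Hello"))
;; => ("HELLO" "hello" "Hello")
```

Lower-case source read in :INVERT mode yields the upper-case names the
standard functions expect, so code that calls SYMBOL-NAME or INTERN on
the standard symbols keeps working.
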
Users can prepare and encourage the whole thing by starting to do
(setf (readtable-case ...) :invert) and the vendors can prepare them
for the future with a package |common-lisp| that has variables and
functions that invert the meaning of the case.

-- 
Erik Naggum | Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.