Subject: Re: Name for the set of characters legal in identifiers
From: Erik Naggum <erik@naggum.no>
Date: 14 Jan 2004 05:39:34 +0000
Newsgroups: comp.lang.lisp
Message-ID: <3283047574505462KL2065E@naggum.no>

* Russell Wallace
| A trivial little question, but one that's been bugging me: Is there
| a name for that set of characters legal in Lisp identifiers?  For
| most languages this would be "alphanumeric" (perhaps with a footnote
| that _ is regarded as a letter in this context), but Lisp includes
| characters like + and - that most languages regard as punctuation.

The type STANDARD-CHAR covers the set of characters from which all
symbols in the standard packages are made.  This simple fact may give
rise to the invalid assumption that there must be a particular
character set from which all symbols must be made.  However, the
functions INTERN and MAKE-SYMBOL take a STRING as the name of the
symbol to be created, and there is no restriction on this /string/ to
be of type BASE-STRING.  Likewise, the value of SYMBOL-NAME is only
specified to be of type STRING, with no mention of the common
observation that it may be a SIMPLE-STRING regardless of whether the
corresponding argument to INTERN or MAKE-SYMBOL was.

Since the symbols are normally created by the Common Lisp reader, your
question is therefore really which characters the reader is able to
build into a string that it will pass to INTERN.  There is no upper
bound on this character set in the standard, but an actual
implementation will necessarily place restrictions on this set.  In the
worst case, the Common Lisp reader does not understand which character
it has just read the encoding of, and may produce symbols with garbage
bytes that nevertheless reproduce the character in your editor or other
character display equipment.

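The point about INTERN, MAKE-SYMBOL and SYMBOL-NAME can be tried without involving the reader at all. The following is only a minimal sketch: the names *ODD-NAME*, *UNINTERNED*, *INTERNED*, NAME-ROUND-TRIPS-P and CL-SYMBOLS-USE-ONLY-STANDARD-CHARS-P are made up for illustration, and the last function assumes a conforming implementation whose COMMON-LISP package exports only the standard symbols.

```lisp
;; A name that no reader would produce without escapes: lowercase
;; letters, spaces, parentheses and a semicolon.
(defparameter *odd-name* "two words, (parens) and; a semicolon")

;; MAKE-SYMBOL creates a fresh, uninterned symbol with this name.
(defparameter *uninterned* (make-symbol *odd-name*))

;; INTERN finds or creates a symbol with this name in a package.
(defparameter *interned* (intern *odd-name* :cl-user))

(defun name-round-trips-p (symbol name)
  "True if SYMBOL's name is a string with the same characters as NAME."
  (and (stringp (symbol-name symbol))
       (string= (symbol-name symbol) name)))

(defun cl-symbols-use-only-standard-chars-p ()
  "True if every external symbol of COMMON-LISP is named using only
characters of type STANDARD-CHAR."
  (do-external-symbols (s :common-lisp t)
    (unless (every (lambda (c) (typep c 'standard-char))
                   (symbol-name s))
      (return nil))))
```

Both symbols keep the name exactly as given, and interning the same string a second time returns the same symbol, which is all the reader ultimately relies on.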
Pessimistically, therefore, your question is whether you will find any
mention in the standard of any invalid characters in symbols, but you
find quite the opposite: After a single-escape character, normally \,
any following character will be a constituent character in the symbol
name being read, and between the multiple-escape characters, normally
|, all characters will be constituent.  The best you can hope for is
thus that whatever reads the byte stream that is your source file will
reject unacceptable encodings.  As long as you use an encoded character
set that includes the standard characters, there is no restriction on
what you can do, and if you use an encoding that does not confuse
standard characters and one of your other characters even in the least
capable decoders, you will find that there is not even any useful
restriction on the /length/ of Common Lisp symbol names.

Optimistically, however, the answer to your question is that the set of
characters that are legal in identifiers is the system class CHARACTER,
but you may not be able to produce all of them in any given source
file.

I am particularly fond of using the non-breaking space in symbol names,
just as I use it in filenames under operating systems that believe that
ordinary spaces are separators regardless of how much effort one puts
into convincing their various programs otherwise.  I know people who
think there ought to be laws against this practice, but sadly, the
Common Lisp standard does not come to their aid.

-- 
Erik Naggum | Oslo, Norway

Yes, I survived 2003.

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
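
The escape rules described in the post can be checked at a REPL. This is a hedged sketch, not part of the original message: READ-SYMBOL-NAME, *NBSP* and NBSP-EXAMPLE are hypothetical helper names, and the non-breaking-space part assumes an implementation in which (CODE-CHAR 160) returns that character, as Unicode- or Latin-1-based implementations generally do.

```lisp
(defun read-symbol-name (text)
  "Read TEXT with the standard readtable and return the name of the
symbol the reader produced."
  (with-standard-io-syntax
    (symbol-name (read-from-string text))))

;; Assumption: character code 160 is the non-breaking space here.
(defparameter *nbsp* (code-char 160))

(defun nbsp-example ()
  "Source text for a symbol whose name contains a non-breaking space,
written between multiple-escape characters."
  (concatenate 'string "|non" (string *nbsp*) "breaking|"))

;; Single escape: the space after \ becomes part of the name, while
;; the unescaped letters are upcased as usual.
;; (read-symbol-name "a\\ b")        ; => "A B"

;; Multiple escape: everything between the bars is taken literally.
;; (read-symbol-name "|two words|")  ; => "two words"
;; (read-symbol-name "|(;')|")       ; => "(;')"
```

Note that "a\\ b" is a Lisp string literal for the four characters a\ b, so the reader sees one backslash.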