Kent M Pitman <pitman@nhplace.com> wrote:
+---------------
| Geoff Wozniak <geoff.wozniak@gmail.com> writes:
| > There's no interface function for readtables that returns
| > the syntax type of characters (that I know of -- did I miss it?)
|
| No interface provided that I know of either. But I think you can
| create one that's reasonably portable from whole cloth if you work
| at it ... it's just that the implementation technique is not even
| remotely straightforward. ...
|
| > ...nor is there a
| > reliable way to compare syntax types to the standard readtable.
|
| Once you have a way of getting the syntax type, that's easy.
+---------------
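Kent's second point is easy to make concrete. Given *any* function that reports a character's syntax type, comparing against the standard readtable is a short loop, since (COPY-READTABLE NIL) returns a fresh copy of the standard readtable. Here's a minimal sketch. SYNTAX-DIFFERENCES and MACRO-SYNTAX are made-up names, and MACRO-SYNTAX is only a *partial* classifier built on the standard GET-MACRO-CHARACTER. It can tell terminating from non-terminating macro characters, but it lumps whitespace, escapes, and constituents together as :OTHER. A full classifier is exactly the hard part Kent describes.

```lisp
;;; Sketch only: MACRO-SYNTAX and SYNTAX-DIFFERENCES are hypothetical helpers.

(defun macro-syntax (char readtable)
  "Partial, portable syntax probe: distinguishes the two kinds of macro
character from everything else (returned as :OTHER)."
  (multiple-value-bind (fn non-terminating-p)
      (get-macro-character char readtable)
    (cond ((null fn) :other)
          (non-terminating-p :non-terminating-macro)
          (t :terminating-macro))))

(defun syntax-differences (syntax-type-fn readtable &key (limit 128))
  "List (CHAR CURRENT STANDARD) for each character with code below LIMIT
whose syntax, as reported by SYNTAX-TYPE-FN, differs between READTABLE
and the standard readtable."
  (let ((standard (copy-readtable nil)))  ; fresh copy of the standard readtable
    (loop for code from 0 below limit
          for char = (code-char code)     ; CODE-CHAR may return NIL
          for here = (and char (funcall syntax-type-fn char readtable))
          for there = (and char (funcall syntax-type-fn char standard))
          when (and char (not (eql here there)))
            collect (list char here there))))

;; Example: make #\! a terminating macro in a private readtable.
(let ((rt (copy-readtable nil)))
  (set-macro-character #\! (lambda (stream char)
                             (declare (ignore stream char))
                             :bang)
                       nil rt)
  (syntax-differences #'macro-syntax rt))
;; => ((#\! :TERMINATING-MACRO :OTHER))
```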
Having dug into the guts of CMUCL once upon a time [when trying
(successfully, I think) to build a CL reader in C], I know that
CMUCL doesn't store "the syntax type" as a single "thing", and
that SET-SYNTAX-FROM-CHAR *doesn't* always copy all of the "minor"
attributes of a character:
(defun set-syntax-from-char (to-char from-char &optional
                             (to-readtable *readtable*)
                             (from-readtable ()))
  "Causes the syntax of to-char to be the same as from-char in the
  optional readtable (defaults to the current readtable). The
  from-table defaults the standard lisp readtable by being nil."
  (let ((from-readtable (or from-readtable std-lisp-readtable)))
    ;;copy from-char entries to to-char entries, but make sure that if
    ;;from char is a constituent you don't copy non-movable secondary
    ;;attributes (constituent types), and that said attributes magically
    ;;appear if you transform a non-constituent to a constituent.
    (let ((att (get-cat-entry from-char from-readtable)))
      (if (constituentp from-char from-readtable)
          (setq att (get-secondary-attribute to-char)))
      (set-cat-entry to-char att to-readtable)
      (set-cmt-entry to-char
                     (get-cmt-entry from-char from-readtable)
                     to-readtable)))
  t)
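To see what that attribute-copying logic actually does, it can be modeled with two plain vectors indexed by CHAR-CODE, standing in for the attribute table and the fixed secondary table. This is a simplified sketch, not CMUCL code: it ignores the macro table, the helper names are made up, and it *assumes* CONSTITUENTP is true for every attribute code except 0, 1, 2, and 10 (whitespace, terminating macro, escape, multiple escape). The real predicate may draw that line differently. The toy tables fill in only #\Space and #\( with the values CMUCL's two tables give them (shown below); everything else defaults to 3 (CONSTITUENT).

```lisp
;;; Simplified model of the attribute copy in SET-SYNTAX-FROM-CHAR.
;;; All names here are hypothetical.

(defun constituent-code-p (code)
  ;; ASSUMPTION: every code except whitespace (0), terminating-macro (1),
  ;; escape (2) and multiple-escape (10) counts as a constituent.
  (not (member code '(0 1 2 10))))

(defun model-copied-attribute (to-ch from-ch from-table secondary-table)
  "Attribute code the quoted SET-SYNTAX-FROM-CHAR would store for TO-CH.
Both tables are vectors indexed by CHAR-CODE."
  (let ((att (aref from-table (char-code from-ch))))
    (if (constituent-code-p att)
        ;; FROM-CH is a constituent: TO-CH gets its *own* fixed
        ;; secondary attribute, not FROM-CH's attribute.
        (aref secondary-table (char-code to-ch))
        att)))

(defun toy-table (&rest overrides)
  "A 128-entry table of 3s (CONSTITUENT) with (CHAR . CODE) OVERRIDES."
  (let ((table (make-array 128 :initial-element 3)))
    (loop for (ch . code) in overrides
          do (setf (aref table (char-code ch)) code))
    table))

(defparameter *toy-attributes* (toy-table '(#\Space . 0) '(#\( . 1)))
(defparameter *toy-secondary*  (toy-table '(#\Space . 15) '(#\( . 3)))

(model-copied-attribute #\Space #\a *toy-attributes* *toy-secondary*) ; => 15
(model-copied-attribute #\(     #\a *toy-attributes* *toy-secondary*) ; => 3
(model-copied-attribute #\a     #\( *toy-attributes* *toy-secondary*) ; => 1
```

So in this model, giving Space the syntax of #\a yields 15 (CONSTITUENT-INVALID) rather than 3, because the target's secondary entry wins whenever the source is a constituent.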
Some explanation: In CMUCL, readtables contain a CHARACTER-MACRO-TABLE
and a separate CHARACTER-ATTRIBUTE-TABLE. The latter encodes a few
"primary" character attributes (WHITESPACE, TERMINATING-MACRO, ESCAPE,
and CONSTITUENT) as well as a whole bunch of "secondary" attributes,
which are related to the primary ones by a numerical ordering.
E.g.:
> (list lisp::whitespace
        lisp::terminating-macro
        lisp::escape
        lisp::constituent
        lisp::constituent-dot
        lisp::constituent-expt
        lisp::constituent-slash
        lisp::constituent-digit
        lisp::constituent-sign
        lisp::multiple-escape
        lisp::package-delimiter
        lisp::delimiter
        lisp::constituent-decimal-digit
        lisp::constituent-digit-or-expt
        lisp::constituent-invalid)
(0 1 2 3 4 5 6 7 8 10 11 12 13 14 15)
> (lisp::character-attribute-table *readtable*)
#(3 3 3 3 3 3 3 3 15 0 0 3 0 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
0 3 1 3 3 3 3 1 1 1 3 8 1 8 4 6 7 7 7 7 7 7 7 7 7 7 11 1 3 3 3 3
3 3 3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 2 3 3 3 1
3 3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 10 3 3 15 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3)
>
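Since the attribute table is just a vector indexed by CHAR-CODE, fetching one character's entry is an AREF. Notice, though, that #\# (code 35) shows up as plain 3 (CONSTITUENT). A non-terminating macro character apparently looks like a constituent here, with its macro-ness recorded only in the separate CHARACTER-MACRO-TABLE. So a sketch that turns these codes into standard syntax-type names also has to ask GET-MACRO-CHARACTER. The code values below are copied from the transcript above and are an assumption about this particular CMUCL build; the helper names are made up, and mapping every other code to some flavor of constituent is also an assumption.

```lisp
;;; Hypothetical helpers; code values copied from the CMUCL transcript.

(defparameter *cmucl-attribute-names*
  '((0 . whitespace) (1 . terminating-macro) (2 . escape)
    (3 . constituent) (4 . constituent-dot) (5 . constituent-expt)
    (6 . constituent-slash) (7 . constituent-digit) (8 . constituent-sign)
    (10 . multiple-escape) (11 . package-delimiter) (12 . delimiter)
    (13 . constituent-decimal-digit) (14 . constituent-digit-or-expt)
    (15 . constituent-invalid)))

(defun attribute-code-name (code)
  "Name of a raw attribute CODE, or NIL if the code is not in the list."
  (cdr (assoc code *cmucl-attribute-names*)))

(defun standard-syntax-type (code char readtable)
  "Map a raw attribute CODE for CHAR to a standard syntax-type keyword,
using GET-MACRO-CHARACTER to spot non-terminating macro characters."
  (case code
    (0 :whitespace)
    (1 :terminating-macro)
    (2 :single-escape)
    (10 :multiple-escape)
    (otherwise                       ; ASSUMPTION: all remaining codes
     (if (get-macro-character char readtable)  ; are constituent flavors
         :non-terminating-macro      ; e.g. #\# is stored as plain 3
         :constituent))))

(standard-syntax-type 3 #\# (copy-readtable nil)) ; => :NON-TERMINATING-MACRO
(standard-syntax-type 3 #\a (copy-readtable nil)) ; => :CONSTITUENT
```

On CMUCL the CODE argument would come from something like (aref (lisp::character-attribute-table *readtable*) (char-code char)), as in the transcript above.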
There's also a static [initialized at startup] SECONDARY-ATTRIBUTE-TABLE
that contains those "secondary" attributes -- *which cannot be changed* --
but which, as you can see in the SET-SYNTAX-FROM-CHAR code above, are
used *instead* of the current attributes whenever the "from" character
is a constituent. In that case the "to" character gets its *own* entry
from this table, not a copy of the "from" character's attribute:
> lisp::secondary-attribute-table
#(3 3 3 3 3 3 3 3 15 15 15 3 15 15 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 15 3 3 3 3 3 3 3 3 3 3 8 3 8 4 6 7 7 7 7 7 7 7 7 7 7 11 3 3 3 3 3
3 3 3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 5 5 5 3 3 3 3 3 5 3 3 3 3 3 3 5 3 3 3 3 3 3 3 3 10 3 3 15 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3)
>
These two differ in the following places:
> (loop for cat across (lisp::character-attribute-table *readtable*)
        and sat across lisp::secondary-attribute-table
        and i from 0
        when (/= cat sat)
          collect (list i (code-char i) cat sat))
((9 #\Tab 0 15) (10 #\Newline 0 15) (12 #\Page 0 15) (13 #\Return 0 15)
(32 #\Space 0 15) (34 #\" 1 3) (39 #\' 1 3) (40 #\( 1 3) (41 #\) 1 3)
(44 #\, 1 3) (59 #\; 1 3) (92 #\\ 2 3) (96 #\` 1 3))
>
E.g., the characters that are normally whitespace (Tab, Newline,
Page, Return, and Space) are considered "constituent-invalid" if
their syntax is changed to constituent, and likewise the characters
that are normally terminating macro characters (plus the single-escape
character, backslash) are considered plain constituents if their
syntax is changed to constituent. [At least, I *think* that's what
it means... I'm not entirely sure!!]
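One way to poke at that interpretation without touching implementation internals is to turn a normally-terminating macro character into a constituent with the standard SET-SYNTAX-FROM-CHAR and see what the reader does with it. If #\( really becomes an ordinary constituent, it should just end up inside the token. This is a small sketch; READ-WITH-CONSTITUENT is a made-up name. For what it's worth, the standard's table of constituent traits (CLHS 2.1.4.2) says much the same thing: the usual whitespace characters get the "invalid" trait and the usual macro and escape characters get "alphabetic", with those entries applying only once the character's syntax type is changed to constituent.

```lisp
(defun read-with-constituent (char string)
  "Read STRING using a fresh copy of the standard readtable in which CHAR
has been given the syntax of #\\a, i.e. made a plain constituent."
  (let ((rt (copy-readtable nil)))
    (set-syntax-from-char char #\a rt)  ; to-char, from-char, to-readtable
    (let ((*readtable* rt))
      (read-from-string string))))

(symbol-name (read-with-constituent #\( "a(b"))  ; => "A(B"
(symbol-name (read-with-constituent #\; "a;b"))  ; => "A;B"
```

By the same table, doing this with #\Space should make "a b" signal a READER-ERROR in a conforming implementation rather than read as one symbol.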
Anyway, this is just meant to show that extracting "the" syntax of
a character from a given implementation is sometimes... complicated.
-Rob
p.s. There's also a DISPATCH-TABLES element, "which is an alist from
dispatch characters to vectors of CHAR-CODE-LIMIT functions, for use
in defining dispatching macros", but I'm going to ignore that here.
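The portable window onto that structure is GET-DISPATCH-MACRO-CHARACTER. Here's a sketch that lists which sub-characters of a dispatching macro character have a definition in a readtable. It looks only at printing ASCII and skips decimal digits, which are reserved for the numeric argument. DEFINED-DISPATCH-SUBCHARS is a made-up name, and the exact list it returns for # will vary by implementation.

```lisp
(defun defined-dispatch-subchars (&optional (disp-char #\#)
                                            (readtable (copy-readtable nil)))
  "Return the printing ASCII sub-characters of DISP-CHAR that have a
dispatch function in READTABLE (default: a copy of the standard readtable)."
  (loop for code from 33 below 127         ; #\! through #\~
        for sub = (code-char code)
        when (and (not (digit-char-p sub)) ; digits are the numeric argument
                  (not (lower-case-p sub)) ; lowercase is looked up as uppercase
                  (get-dispatch-macro-character disp-char sub readtable))
          collect sub))

;; e.g. (member #\( (defined-dispatch-subchars)) is true, since #( reads a vector.
```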
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607