Subject: Re: Core ideas behind SGML and XML
From: rpw3@rpw3.org (Rob Warnock)
Date: Tue, 01 Oct 2002 15:03:55 -0000
Newsgroups: comp.lang.lisp
Message-ID: <upjear6h62a288@corp.supernews.com>

Erik Naggum  <erik@naggum.no> wrote:
+---------------
| * Rob Warnock
| | "attributes" could just as easily be expressed as sub-elements with
| | restrictions on order, much like declarations and documentation strings
| | in CL, yes?
| 
| There is a crucial difference between attributes and subelements today
| that needs to be taken care of in an attribute-free SGML.  Attributes are
| local to their element and therefore can be the same name as an element.
| They could also differ in type from element to element, which translates
| to a different content model or notation as an element.
+---------------

Good point. And also, as Tim Bradshaw pointed out in a parallel
reply <URL:news:ey3u1k6uyv1.fsf@cley.com>, attributes are not
permitted to have sub-structure[1], while sub-elements may.

[1] Well, not *standard*, DTD-defined sub-structure, that is, though
    as Tim noted, one can always hack up an ad-hoc serialization of
    the sub-structure into the string value of an attribute...  (*ugh!*)
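
To make that hack concrete, here is a minimal sketch, assuming the
sub-structure is a plain Lisp list and the attribute value is just its
printed representation. ENCODE-ATTRIBUTE and DECODE-ATTRIBUTE are names
made up for illustration, not part of any SGML or XML tool:

```lisp
;; Hypothetical helpers: squeeze a nested list into one attribute string
;; and recover it again. Nothing in a DTD can describe what is inside.
(defun encode-attribute (value)
  "Return VALUE printed readably as a single string."
  (with-standard-io-syntax
    (prin1-to-string value)))

(defun decode-attribute (string)
  "Read the object back out of STRING, with #. disabled for safety."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (values (read-from-string string)))))

;; Round trip: the decoded value is EQUAL to the original list.
(defparameter *size*
  (decode-attribute (encode-attribute '(width 3 (units "cm")))))
```

The parser sees only an opaque string, so validating the nested part is
left entirely to the application, which is the "ugh" part.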

+---------------
| | | Using ID and IDREF is exactly analogous to using #n= and #n#.
| | 
| | Aha! That's what I get for not having read the actual SGML specs, I guess.
| 
| But you may need to know #n= and #n# and that they are analogous to see
| this because the application gets to maintain its own table of IDs and
| map the IDREFs back to them, which is kind of unfortunate.
+---------------

Do you mean that the application is allowed to merge (remapping as needed)
IDs/IDREFs from *multiple* documents? ...or *across* multiple docs?
[Otherwise I don't see where any "table" or "mapping" is needed, other
than just the one necessary for #n=/#n# processing during the READ.]
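
For reference, the single-document "table" I have in mind can be
sketched as below. This assumes a made-up s-expression encoding in which
(:id NAME ...) marks a labelled element and (:idref NAME) points at one;
those two markers and all three function names are my own inventions for
illustration, not anything taken from the SGML specs:

```lisp
(defun collect-ids (tree table)
  "Pass 1: record every (:id NAME ...) element of TREE in TABLE by NAME."
  (when (consp tree)
    (when (eq (first tree) :id)
      (when (nth-value 1 (gethash (second tree) table))
        (error "Multiply defined ID: ~S" (second tree)))
      (setf (gethash (second tree) table) tree))
    (dolist (child tree)
      (collect-ids child table)))
  table)

(defun resolve-idrefs (tree table)
  "Pass 2: destructively replace each (:idref NAME) child with its target."
  (when (consp tree)
    (do ((cell tree (cdr cell)))
        ((atom cell))
      (let ((child (car cell)))
        (if (and (consp child) (eq (first child) :idref))
            (multiple-value-bind (target found)
                (gethash (second child) table)
              (unless found
                (error "Undefined ID: ~S" (second child)))
              ;; Share the target; do not descend into it again.
              (setf (car cell) target))
            (resolve-idrefs child table)))))
  tree)

(defun resolve-document (tree)
  "Build the ID table for TREE, then patch in all the references."
  (resolve-idrefs tree (collect-ids tree (make-hash-table :test #'eq))))

;; COPY-TREE because the resolver modifies the list it is given.
(defparameter *doc*
  (resolve-document
   (copy-tree '(doc (:id fig1 (figure "x"))
                    (para "See" (:idref fig1))))))
```

After this, (third (third *doc*)) is EQ to (second *doc*), just as #1#
is EQ to whatever #1= labelled. An element that refers to its own ID
comes out circular, so bind *PRINT-CIRCLE* to T before printing one.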

Hmmm... The corresponding thing in CL might be trying to reference the
same #n=/#n# numbers across multiple occurrences of READ. Normally,
that wouldn't be a problem (because it's not possible), but my twisted
little brain just started wondering about how #n= & #n# might interact
with #. in pathological cases, as in:

	(foo #1=bar #.(read-from-string "(a #1=b c #1# d)") gorp #1# blah)

Well, maybe that's not a good example -- both CLISP & CMUCL handle it,
yielding (FOO BAR (A B C B D) GORP BAR BLAH) -- but *this* one, maybe:

	#1=(foo bar #.(read-from-string "#1=(a b c #1# d)") gorp #1# blah)

CLISP handles it this way [assuming (setq *print-circle* t), else it
core-dumps]:

	> '#1=(foo bar #.(read-from-string "#1=(a b c #1# d)") gorp #1# blah)
	#1=(FOO BAR #2=(A B C #2# D) GORP #1# BLAH)
	>

While CMUCL complains:

	Reader error at 3 on #<String-Input Stream>:
	Multiply defined label: #1=

O.k., CLHS "2.4.8.15 Sharpsign Equal-Sign" says:

	The scope of the label is the expression being read by the
	outermost call to read; within this expression, the same label
	may not appear twice.

So perhaps in CLISP's case the READ-FROM-STRING instance causes a
completely different instance of READ (and thus the two #1= aren't
"the same") and in CMUCL's case READ-FROM-STRING "knows" it's inside
an active READ and (legitimately) tosses an error.

But then why does CMUCL *accept* the first form at all?!? They both
have the same "outermost call to read"...
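
For anyone who wants to try this on another implementation, here is a
small probe. TRY-READ and *NESTED-LABEL-RESULT* are names I made up; the
sketch assumes *READ-EVAL* is at its default of T, and it deliberately
avoids printing the result, since a successful read may be circular.
What you get back is implementation-dependent, which is the whole point:

```lisp
(defun try-read (string)
  "Read one object from STRING; return it, or the condition if reading fails."
  (handler-case (values (read-from-string string))
    (serious-condition (c) c)))

;; The second test form from above, with the inner string quotes escaped.
(defparameter *nested-label-result*
  (try-read
   "#1=(foo bar #.(read-from-string \"#1=(a b c #1# d)\") gorp #1# blah)"))

;; Report the outcome without ever printing the list itself.
(if (typep *nested-label-result* 'condition)
    (format t "reader signalled: ~A~%" *nested-label-result*)
    (let ((inner (third *nested-label-result*)))
      (format t "read succeeded; inner #1# is the inner list: ~A~%"
              (and (consp inner) (eq inner (fourth inner))))))
```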

+---------------
| Grasping that this can be used for circular structures is apparently
| hard for SGMLers when they have only learned to think of them in
| "See Figure 1" terms.
+---------------

*ROTFLMAO!*  Yes, I know what you mean, but... Does "See Figure 1"
have the same alternate meaning for you guys over there as it does
around here?!?  ;-}  ;-}

	<URL:http://www.tuxedo.org/~esr/jargon/html/entry/See-figure-1.html>
	<URL:http://www.things.org/~jym/fun/see-figure-1.html>

+---------------
| | I need to think about that one a bit more (and maybe go read some more).
| | At the moment, it's certainly plausible to me that IDREF might need
| | special syntax, but it seems like one might be able to provide ID
| | (albeit rather awkwardly!) with attribute-free elements [as above].
| 
| One curious effect of moving from attributes to elements is that quite
| often, the best design is to let the attribute become a superelement
| instead of a subelement.
+---------------

Yeah, I was starting to think in that direction, since it reduces the
number of levels of nesting you need in the common case of only one
attribute. It's much like CL WITH-XXX macros, in that way. (And why
am I suddenly thinking of CLOS "around" methods, too? ;-} )
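
A rough sketch of the WITH-XXX analogy, assuming a toy representation in
which the "attribute" is a special variable bound around the elements it
applies to. *LANG*, WITH-LANG and PARA are all hypothetical names:

```lisp
;; Attribute as subelement:   (para (lang "en") "Hello")
;; Attribute as superelement: (lang "en" (para "Hello") (para "World"))

(defvar *lang* nil
  "Hypothetical attribute value in effect for the enclosed elements.")

(defmacro with-lang ((lang) &body body)
  "Evaluate BODY with *LANG* bound to LANG, like a wrapping superelement."
  `(let ((*lang* ,lang))
     ,@body))

(defun para (text)
  "Build a paragraph element that picks up the enclosing *LANG*."
  (list :para :lang *lang* :text text))

;; One wrapper covers every element inside it.
(defparameter *paras*
  (with-lang ("en")
    (list (para "Hello") (para "World"))))
```

Both paragraphs in *PARAS* carry :LANG "en" without either one naming
it, and a PARA built outside any WITH-LANG gets :LANG NIL.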


-Rob

-----
Rob Warnock, PP-ASEL-IA		<rpw3@rpw3.org>
627 26th Avenue			<URL:http://www.rpw3.org/>
San Mateo, CA 94403		(650)572-2607