Subject: Re: XML and lisp
From: Erik Naggum <erik@naggum.net>
Date: Fri, 24 Aug 2001 20:03:19 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3207672197075433@naggum.net>

* Kent M Pitman <pitman@world.std.com>
> Certainly what you say is undeniably true in terms of practice, and I'd even
> give you that the notational distinction is not worth the mechanism, but
> is there somewhere that the language actually forces this "role" relationship?

No, there is nothing that requires there to be element attributes as a distinct concept from element contents. There are, however, a number of practical things that follow from making that arbitrary distinction which can look like rationales, but if you ask yourself "why can it not be a subelement", there are no real answers, only appeals to the idea that there somehow _has_ to be a distinction.

It took me years to figure out that the whole attribute idea is completely vacuous, and I worked with the creator of SGML himself for several years on several SGML-related standards and projects. I started writing "A conceptual introduction to SGML" back in 1994, but as I had pained my way through five chapters, I had to realize that it was all wrong. There was a basic design mistake in the whole language framework. That mistake is, simply put: "what is good enough for the users of the language is not good enough for its creators".

Each and every level of "containership" in SGML has its own syntax, optimized for the task. Each and every level has a different syntax for "the writing on the box" as opposed to "the contents of the box". This follows from a very simple, yet amazingly elusive principle in its design: Meta-data is conceptually incompatible with data. This is in fact wrong. Meta-data is only data viewed from a different angle, and vice versa. SGML forces you to remain loyal to your chosen angle of view.
> I wrote a package in Java at a prior employer which automatically
> generated XML representations for classes as elements based on Java
> metadata, and the tack I took was not that the XML attributes contain
> meta-data and the contents data but rather that the XML attributes
> contain atomic data and the contents contain compound data, since this is
> IN FACT what the real distinction is.

The key to understanding this is that there is no _one_ real distinction. There are in fact any number of "real distinctions". You just found one way to wrap your world in the attribute/contents dichotomy because it was there. What would you do if it was not? What would you do if you had only sub-elements? Would you have _invented_ attributes? I do not think anyone would have, because using sub-elements exacts no higher cost than using attributes.

> In effect, what I got out of this was a description that allowed two
> syntaxes: an easy syntax for easy things, and a hard syntax for hard
> things.

I propose an easier syntax for the harder things and a slightly harder syntax for the easier things so they do not impose any easy-vs-hard misconceptions on the user and designer. By making both things cost the same, the decision to use an attribute or a sub-element becomes a very different choice.

> But what I'm really wondering is whether SGML has some "intended use"
> spec that tells you that you have to put meta-info in the "car" of the
> "form", and info in the "cdr". I thought the use of these containers was
> semantics-free.

The intended use has less to do with it than the notion that you can define what is meta-information and what is information at the time you want to decide whether something goes in an attribute or a sub-element. My argument is that this is impossible. Whether it is meta-information or information is a reflection of the actual use, not the intended use.
However, given that the mechanism was created, and I will argue that it was not so much created as it was never thought possible to be any other way, it was used to define several language properties. "Now that we have this, would it not also be nice to have that." This means that several of the attribute types grew very far apart from the contents of sub-elements and you sort of "had" to use them as attributes, but only sort of, because the application can and does define the semantics of everything, and if you want ID and IDREF, you can make the same choice as you would in Common Lisp to use symbols or a hash table of strings.

> > I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools
> > use and produce, because it makes XML just as evil in Lisp as it was in
> > XML to begin with, and we have gained absolutely nothing in either power
> > of processing or in abstraction, which is so very un-Lisp-like.
> >
> >   <foo bar="zot">quux</foo>
> >
> > should be read as
> >
> >   (foo (bar "zot") "quux")
> >
>
> Maybe. Macsyma used a similar notation for years (though without the restriction
> on container-ness). I don't think the answer is to change to do the rewrite
> you suggest.

I cannot follow you here. I am not suggesting a rewrite. I suggest that there is _no_ distinction between attribute and sub-element contents. What I am trying to communicate is so emphatically _NOT_ syntax that we will have a severe communications problem if this is not understood. The syntax has a function, and I am challenging the _function_ of the syntax that is believed by many people to support a concept I _also_ challenge.

What do you gain from the attribute-vs-contents dichotomy? Why do you need it? What does it do for you? What would you have done if it were not there? What choices and design decisions went into attributes that would go into contents if you did not have attributes?
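To make the reading of <foo bar="zot">quux</foo> as (foo (bar "zot") "quux") concrete, here is a minimal sketch. It assumes Python and its standard xml.etree.ElementTree parser, neither of which appears in this thread, and the helper name to_sexp is hypothetical. Attributes are read as ordinary sub-elements, so the attribute form and the sub-element form of the same data come out identical. Surrounding whitespace is stripped for brevity, which a real reader might not want to do.

```python
import xml.etree.ElementTree as ET


def to_sexp(element):
    """Read an element as a nested tuple, treating attributes as sub-elements.

    Attributes come first, each as a (name, value) pair, followed by the
    text and child elements in document order.
    """
    items = [element.tag]
    # Each attribute becomes a sub-element with a single string: (name, value).
    items.extend((name, value) for name, value in element.attrib.items())
    # Text that precedes the first child element.
    if element.text and element.text.strip():
        items.append(element.text.strip())
    for child in element:
        items.append(to_sexp(child))
        # Text that follows this child but still belongs to the parent.
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return tuple(items)


with_attribute = ET.fromstring('<foo bar="zot">quux</foo>')
with_subelement = ET.fromstring('<foo><bar>zot</bar>quux</foo>')

print(to_sexp(with_attribute))
print(to_sexp(with_subelement))
```

Both calls print ('foo', ('bar', 'zot'), 'quux'). Once the structure has been read this way, nothing in it records whether "bar" was written as an attribute or as a sub-element.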
> I don't understand why it's not natural to add the
> following as legal syntaxes:
>
>   <foo bar=<zot/>>
>
> or
>
>   <foo bar=<string>zot</string>>quux</foo>

Imagine that all attributes are in fact sub-elements, and this problem just goes away. Please, discard the concept of attributes. They no longer exist. What used to be called "attributes" are only sub-elements with special treatment and a whole bunch of arbitrary restrictions, one of which is lack of internal structure (except insofar as defined by the NOTATION attribute of attributes in SGML).

> This would keep people from feeling the attribute list was a shorthand
> area and would also allow the storing of complex meta-data.

But that is not my goal. My goal is to get rid of the idea that there is a distinction that can be made once and for all, and prematurely at that, that some information is meta-data and some information is data. The core philosophical mistake in SGML is that you can specify these things before you know them. SGML is great for after-the-fact description of structures you already know how to deal with perfectly. It absolutely sucks for structures that are in any way yet to be defined. This is _because_ it is impossible to define what is considered meta-information and what is considered information before you actually have a full-blown software application that is hard to change your mind about. SGML was supposedly designed to free data from the vagaries of software, but when it adopted the attribute-content dichotomy, it dove right into dependency on the software design process instead of the information design process.

> Do you know what the reason was that recursive structures were not
> allowed in this position in XML?

Yes, as a matter of fact, I do. Recursive structures are in fact allowed in attribute values, provided that your application processes them and not the SGML/XML parser.
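A small sketch of that division of labor, assuming Python as the illustration language. The document below is hypothetical, and so is the choice of JSON as the notation inside the attribute value; any notation the application understands would serve. The XML parser delivers only a string of characters, and the structure appears only when the application applies its own reader to that string.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document: the "bar" attribute carries a nested structure
# written in JSON, a notation chosen here only for illustration.
document = """<foo bar='{"zot": ["a", ["b", "c"]]}'>quux</foo>"""

element = ET.fromstring(document)

# The XML parser hands back nothing but a string of characters.
raw = element.get("bar")

# Only the application, applying its own notation, turns it into structure.
structure = json.loads(raw)

print(type(raw).__name__)
print(structure["zot"][1])
```

The first print shows that the parser's result is a plain str; the second reaches into the nested list that exists only after the application has processed the attribute value.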
Back in the SGML days, the NOTATION attribute of both elements and attribute values was designed as an "escape" to the application to let some other syntax processor deal with the string of characters. (Please understand that everything SGML/XML is a string of characters. There are no _values_. Imposing valuedom on strings is the kind of semantics that SGML/XML specifically does _not_ support.)

> Or perhaps it was the fact that the "real world" substitutes for "parsed
> structure" things like that weird assembly code like notation which looks
> like
>
>   (A
>   AHREF=foo.html
>   -Text
>   )A
>
> Perhaps someone was just being uncreative about how a compound-structure
> could be offered as an attribute.

No, they never actually thought of it that way. You have to understand and appreciate that the design process for SGML was such that some people had a very clear picture of the meta-information-vs-information dichotomy and that it never occurred to anyone that meta-information had exactly the same properties as information.

Whoever first decided to define HTML in such a way that unknown elements should be displayed suffered from exactly the same problem. As a sorry consequence, we have elements that have to contain _comments_ that are the real contents because that somebody did not foresee the need to have meta-information in contents. I argue that this is a result of "getting" the invalid meta-information/information dichotomy. If that person had not been bitten by the false idea that meta-information is fundamentally different from information, he would have realized that there would be a need to use element contents for meta-information, as well.

> Good. I'd hate for it to be "lost" as merely a post here, though I think
> it's fun that you felt comfortable in sharing your thoughts.
Well, it took ten years of discomfort with the "attribute" concept before I went back to examine the genesis of the various forms of attributes and persisted in asking the question "could it not have been done with sub-elements", and finally found that the reason it could not was that somebody did not _want_ it to be done with sub-elements, and that the root cause of this was a fundamental misunderstanding of the relationship between information and meta-information.

Just like Plato and Aristotle agreed that ideas and concepts were somehow "inherent" in the things we saw and not a property of the person who observed and organized them in his own mind, SGML embodies the false premise that structuring has some inherent qualities and processing that structure should reflect its inherent qualities. The result is that the processing defines the structure. If there is a mismatch between the two, the result is very painful and elaborate processing, and it can be solved very simply by removing the attribute/sub-contents dichotomy. Once we do that, we return to first principles and can move forward with the same knowledge and experience that created the attributes, but now we can do it with sub-elements instead, and I can promise you that once you start off on that road, the least of your worries will be recursive structure in attribute values.