Subject: Re: XML and lisp
From: Erik Naggum <erik@naggum.net>
Date: Mon, 27 Aug 2001 05:14:29 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3207878053204561@naggum.net>

* Kent M Pitman <pitman@world.std.com>
> You've made allusions to alists as a way of understanding this, but as a
> sense of intuition, of course, that doesn't help a Lisp programmer a lot
> since plainly an alist is about the leftmost of each named thing, and
> people are uneasy about accessing the next-leftmost element behind
> it--that usually violates some sense of a-list/stack discipline.

Well, this is why association lists work as a metaphor -- attributes in SGML/XML cannot be repeated. If there are more keys in the remainder of the contents, they are not attributes.

> You haven't offered an operator whose goal is to be like
> destructuring-bind and so to get around this, so the burden seems, to
> those looking on, to be on the programmer to pick apart this structure
> manually and the set of tools seems light. That's probably only an
> artifact of not seeing your tools, rather than anyone's belief that you
> have no such tools.

Which tools are available for the contents? Why are they _not_ usable directly for the attributes? I fail to grasp what you want to _do_ with the attributes that you cannot do with them if they are sub-elements. You imply that people are unable to deal with sub-elements and need special tools to deal with attributes. This _must_ be wrong.

> I think it would help if you posted the NML which helps you manipulate
> these, and perhaps a small code fragment that showed an end-to-end use of
> constructing an expression in Lisp and having it appear in the XML with
> this notation Boris suggests, and the reverse. Then people would be
> talking concrete still.

I assume that people who voice their concerns in this discussion know SGML. I have no inclination to write tutorials for people who do not. It is a waste of my time, and I know that I will hate it.
I have about 500 pages of a book entitled "A Conceptual Introduction to SGML" that I swear to whichever deity is on duty today will _never_ be published, because the design flaws of SGML are so pervasive that the only thing I want to do with them is get rid of them. Accept the fact that I deal with a history of personal pain in this regard. I invested 6 years of my life on SGML and related standards, and the more I worked with it, the more I found that SGML actively destroyed any hope of achieving what it had set out to do, because it is introducing several poisons into the conceptual processes of structuring information. Taking a look at what people do with SGML and XML today has not shown _one_ case of anyone waking up and smelling the coffee, and it has been _burning_ in the coffee machine for a decade.

This is my view: You were told that you needed attributes in addition to sub-element contents. Why did you ever _agree_ to that? The onus of proof is normally on he who asserts the positive, and I challenge you to explain to me why you _need_ attributes rather than accepting any challenge to explain why you do _not_ need them when what I say is that _you_ already know perfectly well how to deal with sub-elements.

If you have worked with SGML at all, you _know_ that people screw up attributes and sub-elements, and you _have_ had to deal with one that should have been the other in your processing. It is _impossible_ to get them "right" because the notion that there is a "right" solution depends on information that is not available at the time the distinction is made.

Over the years, I have thought of _many_ different ways to deal with the colossal braindamage that is attributes in SGML. One might think of them as (keyword) arguments to functions, but which other information should influence a "function" that deals with an element? Well, first and foremost, its _parentage_.
That means that I have already had to get rid of the notion that <foo bar="x" zot="y"> is "really" a function call like (foo :bar "x" :zot "y"). It has to know _so_ much more to do _anything_ right that it is completely useless to cast one's thinking in such terms.

SGML must be _questioned_, not accepted as gospel or natural science reporting on some findings. Somebody made a decision to add attributes, and I know for a fact that that was back in the days of typesetting and document production when the idea was that you should be able to "remove" the "tags" and end up with the readable text of the document as it would be printed. That was the _real_ rationale for attributes. I happen to think that was a brilliant idea at the time -- competing markup languages have a serious problem in using notations that destroy the ability to figure out easily what is intended for humans and what is intended for the machine. (In particular, TeX is a monster.) I tended towards explaining to people that they should not let stuff that should not be displayed be in sub-elements. What a crock of shit that advice is!

As soon as GML became more general than producing print documents, for which it was well suited and still is, the attribute concept had become a mill-stone around its neck and it dragged it down fast. It was _wrong_ to keep attributes around when their rationale had been completely eradicated from its set of operating conditions. It made everything incredibly complex.

I was one of very few people on this planet to really _study_ the standard, and my brain works in such a way that I still _know_ with immediate certainty whether something is or is not supported by the standard language and how to express it. (It works the exact same way with Common Lisp, Ada (1983, unfortunately :), C (1991), and any number of things I have really sat down to study and understand, and it is so efficient that I even get an emotional response to violations before I see the logic of them.)
I love the way my brain works, but it also has serious drawbacks: Overriding and updating old information is something I have to work really hard at. The end result of the way I think and the way the standard is defined is that I immediately saw these massively complex ways to do things that "nobody" understood.

Take HyTime and what it calls "architectural forms" -- I vividly remember a long walk around a quiet Tallahassee one summer night with the creator of this concept, when I questioned some of the designs and how it would be implemented, and he was quiet for the longest time before he said that I was probably the first person to have understood what he was _really_ trying to accomplish. That would have been _such_ a great thing if it had been, say, rocket science, but it was not. It was a man-made complexity so great that it had required _months_ of brain-wracking to really get my intuition working.

That was the first time I had really serious doubts about the wisdom of SGML's structuring process, because the massive complexity of it all is _completely_ pointless and a result of spreading the semantics so thin that you had to keep mental track of an enormous number of relationships to end up with an idea of what something should do or mean. It does not have to be that way.

It was _profoundly_ disappointing to discover that at the end of this long process of grasping something that looked intellectually challenging lay only a complexity that resulted from _rejecting_ simplicity of design at a few crucial points. Hell, it still took me years to figure out what alternatives they _should_ have picked up, and by then it was too late.

Now you are probably thinking "how F hard can it be?" and looking half condescending on a retarded monkey who cannot figure out the purposes of the mathematical relationships in calculus. But it is the same problem we find in C++.
The question to be asked of massive complexity like that is not "what wonderful things did you find out that made this necessary", but "whatever did you _miss_ that made this so horribly complex"?

You can sometimes see people who are really, really dumb go about some simple tasks in a way that tells you that they have arrived at their ways of performing it through an incredibly painful process that they are loath to reopen or examine at all no matter how hard it is to get it right for them. Some people will construct ways of performing their job so that they utilize all available brainpower, simply because that is indeed a very satisfying feeling. However, when it comes to grasping someone else's _wrong_ ideas, there is no upper bound on complexity. Some people have the most bizarrely convoluted thinking processes and they completely fail to monitor their thinking so they traipse off into oblivion and may or may not come back, but if they do, it is with these spectacularly irrational ideas that they _love_ before they discard them. This is the kind of complexity that befell the SGML community. That I could figure this mess out and think about it and have something dramatic to say about it to the creators, frankly scares me.

In any case, I think the core problem is that a request for a rationale for _removing_ a complexifying misfeature is completely bogus. We should not look at what we wound up with, we should look at _how_ we wound up where we are. I have explained how attributes got invented in the first place and it _was_ a good idea at the time.
However, as soon as elements got more abstract and elements could contain _no_ information that would wind up on the printed page, but instead other elements that would, and those "abstract" elements would influence the way their sub-elements' contents would wind up on the printed page, it should have been clear that the attribute concept should be scheduled for extinction because some of its roles had now been moved into a different realm where _all_ of its roles could be moved without sacrificing anything.

The core idea that went horribly wrong with SGML _because_ of the very sad lack of re-examination of the rationale for attributes is almost so fundamental that removing it will tear down everything that SGML has built with it. This is likely why people resist thinking about it, because it was so painful to learn SGML, it is better to keep out any risk of having to re-experience that pain from another angle.

I shall probably have to repeat this core idea forever because so few people really grasp it: SGML claims that some things you want to say about something are meta-information and some things are "normal" information. Like the historical baggage from the characters in the file that wound up in print and the characters that vanished in processing, SGML's view on meta-information and information is that they are inherently different and thus not only distinguishable, but in need of being kept apart, so much so that there are two wildly different languages to describe them. This core mistake leads to an inability to move between views of your own information and conceptualization of its structure, and that is just the way to kill your information.

As a result of this dichotomy, SGML imposes an incredibly hard structure on the information. If the information wants to break out of it, the whole structure breaks. (XML is really _nothing_ better, but has all the appeal of tooth decay the way it touts its caries as "extensibility".)
There are so many rules in the SGML standard that effectively prohibit a rational way to "flex" its design that people do not refrain from it because they do not consider it useful to be able to, but because any change to a document type definition is associated with an unknowable increase in complexity of processing, especially in the area of bringing legacy documents in line with the change.

The extreme _brittleness_ of the SGML structure is a direct result of the core mistake to strike a dichotomy between meta-information and information, because in real life, the two are in fact _exactly_ the same thing, it is just a matter of who looks at it for which purpose. If you do not believe that, it is because you still think that there _has_ to be a difference. Of course there is, but it is not _inherent_ or _intrinsic_ to the information, it is highly pragmatically determined which is which at any given time.

Structuring information is one of the _easiest_ tasks we humans do. All the time, we add meta-information to information and we do not even mark it up as we go. Human languages are chock full of meta-information: "I did not know darkness could be so illuminating", he said, expectingly. We _have_ no desire to mark meta-information as such and directly because it is part and parcel of how we interpret what other people tell us.

If I say "yesterday" today, I probably mean "2001-08-26", so I could write <date <formal 2001-08-26> yesterday>, but I could also talk about the past in some general term like <date <formal past> yesterday>, and so on and so forth. What we really grasp about the information we receive _is_ invariably meta-information. The problem is then entirely artificial, since we do this almost automatically. What we really need are means to make the meta-information explicit.
I used to believe that this would be a good idea, but until we find ways to "intuit" meta-information from a human context, I believe it is a waste of effort and it could well be counterproductive. What we need is a very limited and very practical approach to obtain a minimal level of meta-information. The more we specify the more we exclude, because as soon as we aim for a certain "depth" of representation, the alternative representations at the same level grow exponentially in number.

///
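
For those who want something concrete to look at, here is a minimal sketch of the claim that whatever works on sub-elements works just as well on what used to be attributes. It is _not_ NML. The list representation and the names SUB-ELEMENT, TEXT-OF and *FOO* are assumptions made up for this illustration only. The <foo bar="x" zot="y"> element from above is written with bar and zot folded into its contents as ordinary sub-elements, and the lookup is plain ASSOC, which is where the association list metaphor comes from.

```lisp
;; Hypothetical representation, not NML: an element is a list whose
;; head is the element name and whose tail is its contents.  Each
;; content item is either a string or another element.

(defun sub-element (name element)
  "Return the leftmost sub-element of ELEMENT named NAME, or NIL."
  ;; Strings are filtered out first, so what remains is an alist
  ;; keyed on element names and ASSOC does the rest.
  (assoc name (remove-if-not #'consp (rest element))))

(defun text-of (element)
  "Concatenate the strings found directly inside ELEMENT."
  (apply #'concatenate 'string
         (remove-if-not #'stringp (rest element))))

;; <foo bar="x" zot="y">some text</foo>, with the former attributes
;; expressed as sub-elements of FOO.
(defparameter *foo*
  '(foo (bar "x") (zot "y") "some text"))

(text-of (sub-element 'bar *foo*))   ; => "x"
(text-of (sub-element 'zot *foo*))   ; => "y"
(text-of *foo*)                      ; => "some text"
```

Nothing in this sketch needs to know whether bar was "meta-information" or not. Whether it ends up on the printed page is decided by whoever walks the structure, not by the notation.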