Subject: Re: data structure for markup text
From: Erik Naggum <erik@naggum.no>
Date: 1999/06/24
Newsgroups: comp.lang.scheme,comp.text.sgml,comp.text.xml,comp.lang.lisp
Message-ID: <3139226293475107@naggum.no>

* William F. Hammond
| Many of the "problems" with SGML arise from the fact that so few good
| examples of SGML processors ... are easily accessible.

the problems with SGML are conceptual.  practical problems have
practical solutions and are uninteresting from a language design point
of view.  some conceptual problems have practical solutions, and while
interesting for an implementor, are also uninteresting insofar as they
don't have unreasonably high costs.  the rest of the conceptual problems
must have conceptual solutions, too -- and _those_ are the interesting
ones from a language design point of view.  my rule of thumb: if "more
money" is the answer, you have an uninteresting practical problem --
conversely, the measure of success of a conceptually good solution is
that solving the problem consumes fewer resources from then on.

| It captures *structural content*.

it's unclear what you're trying to tell me that you think I don't know.

| SGML is only a framework.

indeed it was intended as such, but it is its capacity as a framework
that I have been lamenting.

| There is almost always more than one way to proceed at the design stage.
| The absence of a canonical way to proceed can create the appearance of
| complexity.  Such a human perception is only psychological.

while I dismiss practical problems out of hand -- smart people will
solve them sooner or later -- respecting human nature is what good
design is all about, and the more you respect it, the better the design
is.

| If one only wants a single presentation, forget SGML.

this is the most dangerous statement you can make if you want to destroy
SGML completely.
it's like saying a programming language is only good for the really
complex problems -- it will lose the competition for the people who
believe their problems are simple, or who need simple steps from problem
to solution.  (hint: such people abound.)

| The lisp-like structure of the earlier posting is very interesting and
| worthwhile.  In fact, I prefer that markup style to SGML tagging, which
| my eyes do not like to look at.

I'm glad to hear it was so intuitively appealing.

| (Hence, my GELLMU project that involves still another markup style that
| is LaTeX-like.)

one of the problems that come with making text the primary syntactic
element is that you have to invent so much black magic to keep markup
distinct from text.  I prefer a much simpler way to deal with this, the
one used in programming languages: delimit the data, not the code.  in
particular, Common Lisp's very simple syntax: a string is delimited by
double quotes; a backslash precedes a literal character inside a string.
no black magic as in C, and absolute predictability both when reading
and when writing the strings.  (note that one of the uninteresting
practical problems of SGML is that SGML's syntax differs according to
the SGML declaration, which makes some character sequences magic and
others not -- the only way you can _really_ be safe is using character
entities for every character.)

* Erik Naggum
| SGML is just as bad as any other static structure in that latter regard.

* William F. Hammond
| But also just as good.

hello?  the whole point of my article was that static structures are
insufficient for any publishing problem worth solving.

| As the years go by, the test of a markup language created today will be
| its amenability to the automatic processing of legacy documents into the
| formats of the future.

I actually agree.  SGML is uniquely slated to flunk this test.  if you
don't agree, I expect to see your solution to the problem of updating a
document automatically when its DTD changes.
if this is "impractical", take a look at SQL and the tools created for
it: the unique strength of that language is that you can dynamically
improve your database without having to dump and reload it, which is
what people had to do prior to SQL and its quiet revolution.  (yes, that
phrase was first used about SQL.)

the more complex structures become, the more people need to be able to
change them as they learn more about them.  SGML is the worst possible
language in which to do just that.

because SGML/*ML does not support structure rewriting, it cannot survive
any serious amount of change.  all "macro languages" have such rewriting
-- it's what "macro" is all about -- but SGML decided to discard this
aspect of being a programming language.  without it, people will have to
write tons and tons of code to deal with special cases, build front-ends
that deal with stuff SGML doesn't, etc, etc.  it is no coincidence that
there are lots of "scripting languages" that produce HTML out there,
just as it is no coincidence that tools that process SGML come with
their very own languages, way more arcane than anything programming
language people could dream up.

wouldn't you just _love_ to have a Turing complete markup language, with
a nice syntax that both humans and machines could read with ease, which
allowed you to do structure rewriting _in_ the language?  the only way
you can do that is to fully realize that data and code are the _same_.
in so doing, you realize that creating short-term convenience barriers
between parts of an inextricably linked whole is counterproductive in
non-immediate terms: the net effect can only be to force people on both
sides of each barrier to reinvent the rest of the whole on their own,
which is a phenomenal waste of time, regardless of what they manage to
do that is productive and useful, _unless_ your only concern is the
short term, in which case such waste has no bearing on the evaluation.
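
An illustrative sketch of two ideas from the post, written in Python rather than in the Lisp notation the post has in mind: text delimited as data (double quotes, with a backslash before a literal character), and a structure rewrite applied automatically when the document type changes. Everything named here is hypothetical and not from the original article: the helpers `write_string`, `read_string`, `to_sexp`, and `rewrite`, the element names, and the rule that `para` becomes `p` and `note` must wrap its content in a `p`. It only shows markup handled as ordinary data; it is not a markup language that rewrites itself.

```python
# Markup as plain data: an element is a list [tag, child, ...]; text is a str.
# All names and rules below are hypothetical, chosen only for illustration.

def write_string(text):
    """Delimit text with double quotes; only backslash and quote are escaped."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'

def read_string(source):
    """Inverse of write_string: a backslash makes the next character literal."""
    if len(source) < 2 or source[0] != '"' or source[-1] != '"':
        raise ValueError("not a delimited string")
    out = []
    chars = iter(source[1:-1])
    for ch in chars:
        if ch == '"':
            raise ValueError("unescaped double quote inside string")
        if ch == '\\':
            ch = next(chars, None)
            if ch is None:
                raise ValueError("dangling backslash")
        out.append(ch)
    return ''.join(out)

def to_sexp(node):
    """Print a tree in a parenthesized notation with delimited text."""
    if isinstance(node, str):
        return write_string(node)
    tag, *children = node
    return '(' + ' '.join([tag] + [to_sexp(c) for c in children]) + ')'

def rewrite(node, rules):
    """Rewrite the children first, then apply the rule for this tag, if any."""
    if isinstance(node, str):
        return node
    tag, *children = node
    rebuilt = [tag] + [rewrite(c, rules) for c in children]
    rule = rules.get(tag)
    return rule(rebuilt) if rule else rebuilt

# Assumed change of document type: 'para' is renamed to 'p', and 'note'
# must now wrap its content in a 'p' element.
RULES = {
    "para": lambda n: ["p", *n[1:]],
    "note": lambda n: ["note", ["p", *n[1:]]],
}

old = ["doc",
       ["para", 'He said "hi"', ["emph", "twice"]],
       ["note", "see \\ above"]]
new = rewrite(old, RULES)
print(to_sexp(new))
```

Because the document is ordinary data, the rules that migrate it are ordinary functions over that data, and the quoting rule never has to guess which characters are markup.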

#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century