Subject: Re: data structure for markup text
From: Erik Naggum <erik@naggum.no>
Date: 1999/06/21
Newsgroups: comp.lang.scheme,comp.text.sgml,comp.text.xml,comp.lang.lisp
Message-ID: <3138960936337701@naggum.no>

* pomije@my-deja.com
| You are probably uniquely qualified to explain the inherent problems with
| SGML and XML.

well, thank you.

| It would be a real public service if you were to write them up.

it might be, but I have already donated several thousand hours of work to the SGML community and found it very hard to get anything in return on the investment despite the fact that other people benefited commercially and significantly so from my work, so I'm through with "public services".

| I know that I would like to read this and to be able to point others to
| this information to avoid costly future mistakes.

I can tell you this: I started to write a book on SGML (working title: "A Conceptual Introduction to SGML") and received very serious interest from Prentice-Hall, but my work on that book brought to the surface all the problems I had found with SGML, such as the sorry fact that as a syntax, it is incredibly complex for its simple task and its semantic powers are way too simplistic to be worth the straight-jacket. not to mention the then (1994) obvious future impact of HTML on publishing, but HTML is so amazingly stupidly designed that whatever was left of SGML's goals and of the representation of information for longevity completely evaporated after HTML 2.0 became obsolete, both in theory and in practice.
I spent a year or so agonizing over the fact that it would be really expensive for me to cut the losses and move elsewhere, but that's what I had to do, because I would only get even more unhappy if I didn't get out of it: SGML grew in the back-end of the production line, but it really has no value except insofar as it is able to capture meta-information, and that means capturing the _intent_ of some particular forms of expression, but what intent can you possibly discern when you "convert" a poor user's struggle to get something that just looks OK out of Microsoft Word into some DTD that was designed to be easily mapped to Word in the first place? such uses of SGML are hugely expensive wastes of time. if SGML (or a similarly capable system) isn't with you from the start, you should be happy to get something that looks OK in print. and if you are clever enough to use SGML on the front-end, chances are you will develop a system that far surpasses SGML, anyway, and you're back to using SGML as a formatting back-end if that's the kind of tools you have available.

in this process, all that SGML can offer is a syntax for delimiting tags on elements of contents, and some important restrictions on structuring that mean you have to be needlessly clever in the machine-generated output, such as using HyTime architectural forms, but by so doing, you severely limit the application independence of the data, and you discover that it's way easier to generate different SGML depending on need rather than use the same SGML for different uses.

what good SGML does is in the document management arena, and sometimes, a straight-jacket is just what you need to keep insane people in check, such as those who base an entire company's document base on Microsoft's products and secret and unintelligible document "formats".
this aspect of SGML _is_ very important, but once the structuring process is in the works, document types are defined, etc, and the applications are written, where did _SGML_ go, and how much did using SGML cost on top of the very necessary cleanup process? if you can't do this process without SGML, by all means, go for it, but if you can, SGML represents additional cost and no benefits that cannot be obtained cheaper and better by other means.

a succinct summary of the lessons I have learned is a pun on the old advice: "know SGML, forget SGML". or in other words, transcend syntax and look at the concepts that SGML affords expression of, then push forward and desire the concepts that SGML does not afford expression of and see that you're better off without SGML, but needed to understand it in order to move further. if you stick with SGML, you hit your head on the glass ceiling and spend all your effort being clever within very restricted bounds -- that's the waste of time you should avoid.

however, that said, if you find yourself comfortable within the bounds of what SGML makes relatively easy and desire no more, I'm not going to ask you to choose a better approach to a problem you don't have. if you want to use the quality tools that eat SGML, design simple DTDs that capture structure in a natural way relative to the end result, and don't try to be clever in the DTD design phase. if it's hard to do in SGML, use some other language or tool, such as a real programming language and database support.

the core problem is that neither SGML, nor HTML, nor XML actually _scale_ or help you _evolve_, and human endeavors are wont to grow and evolve. longevity and stability are not to be found in solid structure, but in the ability to turn around effortlessly yet _without_ losing the past. SGML is just as bad as any other static structure in that latter regard.
anyway, this discussion started (as far as I got into it, anyway) with a desire to represent SGML-like structure in a dynamic programming language such as Common Lisp, and that's the next step: when your data can easily be interpreted as function calls, you can write programs that produce the desired output as you "execute" the document. the syntax you use for this is not particularly important and it might as well be SGML if you can trivially read it into the program and process it, but SGML is such a pain to read and process that you should think twice about using it.

put another way: whether you write <foo bar=1>zot</foo> or ((foo :bar 1) "zot") is not important; that you think in terms of function calls, programming languages, and dynamic semantics is. but why then accept all the bother with a truly arcane syntax? _that's_ the question I couldn't answer. watching other people struggle like mad with the syntax and not even _getting_ the semantic relationships and purposes was what tipped me off: SGML had a constructive goal, but is counter-productive in practice. the productive approach is to understand the constructive goal and do it in some much less involved way. now, if you want to do that, I'm all ears.

#:Erik
--
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century
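
The idea of "executing" a document as function calls can be sketched briefly. The following is only an illustration, written in Python rather than Common Lisp, and everything in it is an assumption made for the example: the tuple layout mirroring ((foo :bar 1) "zot"), the names `element`, `execute` and `HANDLERS`, and the toy meanings given to `foo` and `emph`. None of it comes from the post itself.

```python
from typing import Callable, Dict, Union

# An element mirrors ((foo :bar 1) "zot"): a head of (name, attributes)
# followed by children, each either a string or another element.
# This layout is an assumption made for illustration.
Node = Union[str, tuple]

# Hypothetical registry mapping element names to the functions that
# give them meaning.
HANDLERS: Dict[str, Callable[..., str]] = {}


def element(name: str):
    """Register a function as the meaning of the element called `name`."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


def execute(node: Node) -> str:
    """Evaluate a document tree by calling the function each element names."""
    if isinstance(node, str):
        return node  # character data evaluates to itself
    (name, attrs), *children = node
    # Children are evaluated first, like arguments to a function call.
    body = "".join(execute(child) for child in children)
    # Attributes become keyword arguments of the element's function.
    return HANDLERS[name](body, **attrs)


@element("foo")
def foo(body: str, bar: int = 0) -> str:
    # Toy semantics, invented for the example: repeat the content `bar` times.
    return body * bar


@element("emph")
def emph(body: str) -> str:
    # Toy semantics: wrap the content in asterisks.
    return "*" + body + "*"


# Roughly <foo bar=2>zot<emph>!</emph></foo>
doc = (("foo", {"bar": 2}), "zot", (("emph", {}), "!"))
print(execute(doc))  # prints zot*!*zot*!*
```

Changing what the document produces is then a matter of registering different functions under the same names, with the data left untouched, which is one way to read the point about dynamic semantics.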