Subject: Re: Best way of reading data IN the same file of a scheme program
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 18 Feb 2002 05:43:36 GMT
Newsgroups: comp.lang.scheme
Message-ID: <a4q4a8$ckdmg$1@fido.engr.sgi.com>
A few days ago, Thomas Baruchel <name@provider> wrote:
+---------------
| Hi, I have the idea of writing a very simple text formatter that would
| parse AFM files and then format a source. What is original is that the
| source would be written in Scheme. I mean that formatting instructions
| would be a Scheme code using high-level macros. But I would like to
| integrate in the source the text and the instructions (like in TeX or
| in groff). What is the best way to obtain:
| 
| ;;; Preamble
| (define my_macro1 ...)
| (define my_macro2 ...)
| (define my_macro3 ...)
| (define Paragraph ...)
| ;;; Document
| (Paragraph Hello, world! )
| (set! global-something 42)
| (Paragraph This is text and I don't know how to include it in the file. )
| 
| Of course it should be very easy for the user to type text.
| Thus I don't think it would be a good idea of using strings for a whole
| paragraph, because some authors have very long paragraphs, and I don't
| like the idea of a too long string (but I may be wrong?); besides the
| user would have to put "" everywhere. What would you do ?
+---------------

You should definitely take a look at Dorai Sitaram's "Mistie"
<URL:http://www.cs.rice.edu/~dorai/mistie/mistie.html> and grok it
thoroughly. Yes, I know it's not *exactly* what you're asking for,
but it can almost trivially be made to be. Hint: You'll probably
want to make #\\ [and probably #\( and #\) as well] be magic Mistie
formatting characters [see "mistie-def-char" & "mistie-def-ctl-seq"].

Or to say it another way, I strongly suggest you *not* try to blindly
mix S-exprs and users' plain text, but sit down and design an explicit
*language* (well, an explicit syntax at least) for your input files.
Then use a Mistie-style bottom-up parser (which is *very* easy to customize)
to implement the desired semantics.

For example, suppose you said that a backslash followed by a left-paren
started an S-expr that was to be eval'd in Scheme, whereas a backslash
before an identifier [something that starts with an alphabetic character]
invoked a Scheme procedure [what you're calling a "macro"]. Mistie can easily
handle this. Then your input file might look like this:

	\(define-tag 'Paragraph ...)
	...
	\Paragraph
	This is text and there's no problem whatsoever including it,
	even the quotes such as "these", in the file.
	\end
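
To make the backslash convention above concrete, here is a minimal sketch
in plain Scheme of how such input could be split into ordinary text, tag
invocations, and embedded S-exprs. It is *not* Mistie code and uses none of
Mistie's procedures: the names "tokenize-markup" and "read-identifier" are
made up for illustration, and it assumes string ports ("open-input-string",
as in SRFI 6 and R7RS) are available.

```scheme
;; Hypothetical sketch; none of these names come from Mistie.
;; Splits a string into (text "..."), (tag name) and (sexp datum) items.

(define (read-identifier port)
  ;; Collect consecutive alphabetic characters and return them as a symbol.
  (let loop ((chars '()))
    (let ((c (peek-char port)))
      (if (and (char? c) (char-alphabetic? c))
          (begin (read-char port) (loop (cons c chars)))
          (string->symbol (list->string (reverse chars)))))))

(define (tokenize-markup str)
  (let ((port (open-input-string str)))
    ;; items: finished items, newest first.
    ;; text:  pending plain-text characters, newest first.
    (let loop ((items '()) (text '()))
      (define (flush)
        ;; Return items with any pending text added as a (text "...") item.
        (if (null? text)
            items
            (cons (list 'text (list->string (reverse text))) items)))
      (let ((c (read-char port)))
        (cond ((eof-object? c) (reverse (flush)))
              ((char=? c #\\)
               (let ((next (peek-char port)))
                 (cond ((eof-object? next) (loop items (cons c text)))
                       ;; "\(" starts an S-expr: let the Scheme reader take it.
                       ((char=? next #\()
                        (let ((datum (read port)))
                          (loop (cons (list 'sexp datum) (flush)) '())))
                       ;; "\name" invokes a tag.
                       ((char-alphabetic? next)
                        (let ((name (read-identifier port)))
                          (loop (cons (list 'tag name) (flush)) '())))
                       ;; Any other backslash is kept as plain text.
                       (else (loop items (cons c text))))))
              (else (loop items (cons c text))))))))
```

With this sketch, (tokenize-markup "\\(define x 1) hi \\end") returns
((sexp (define x 1)) (text " hi ") (tag end)). A real formatter would
presumably eval each S-expr and call the procedure bound to each tag as it
goes, rather than collecting them in a list.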

Or you could use parens everywhere, but again, *NOT* try to read the
file with Scheme's "read" procedure, but use Mistie to read it, with
"(" as the main magic character [and probably also let "\" be magic
so you can escape parens with a "\(" or "\)" sequence]:

	(define-tag 'Paragraph ...)
	...
	(Paragraph This is text and there's no problem whatsoever
	including it, even the quotes such as "these", in the file.)

In Mistie, assuming you've set up everything ahead of time with the
proper "style sheet" (Scheme program), what happens when the file is
read (character-by-character) is that first the "(" is seen, which calls
the handler for "(" installed by "mistie-def-char". That handler would
suck up a tag and look it up in a hash table (or a-list) and call the
procedure associated with that tag. That procedure will push some context
on a stack, and set future characters to be accumulated into a buffer,
and then set right-paren [with "mistie-def-char"] to process the
accumulated characters and pop the stack. [That could even include,
for certain tags, passing the string to the Scheme reader and then
to "eval", if you wanted to.]
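
The flow just described (see "(", read a tag, look it up, push a frame,
buffer characters, and let ")" process the buffer and pop) can be sketched
in plain Scheme without Mistie at all. Everything below is hypothetical:
"process-document", "read-tag", "push-char", "push-string" and the sample
"tag-table" are illustrative names, the "<p>" output is just a stand-in for
real formatting, and string ports (SRFI 6 / R7RS) are assumed. Tags are
looked up as strings so the result doesn't depend on symbol case folding.

```scheme
;; Hypothetical sketch of the "(" / ")" flow; not Mistie's implementation.
;; A frame is (handler . chars), with chars held newest first.

(define (read-tag port)
  ;; Read characters up to whitespace, a paren, or end of input.
  (let loop ((chars '()))
    (let ((c (peek-char port)))
      (if (or (eof-object? c) (char-whitespace? c)
              (char=? c #\() (char=? c #\)))
          (list->string (reverse chars))
          (begin (read-char port) (loop (cons c chars)))))))

(define (push-char c stack)
  ;; Add one character to the buffer of the top frame.
  (cons (cons (car (car stack)) (cons c (cdr (car stack))))
        (cdr stack)))

(define (push-string s stack)
  ;; Add every character of s to the buffer of the top frame.
  (let loop ((chars (string->list s)) (stack stack))
    (if (null? chars)
        stack
        (loop (cdr chars) (push-char (car chars) stack)))))

(define (process-document str tag-table)
  (let ((port (open-input-string str)))
    ;; The bottom frame collects top-level output unchanged.
    (let loop ((stack (list (cons (lambda (s) s) '()))))
      (let ((c (read-char port)))
        (cond ((eof-object? c)
               ;; Assumes balanced parens, so only the bottom frame is left.
               (list->string (reverse (cdr (car stack)))))
              ((char=? c #\\)                ; "\(" and "\)" are literal
               (let ((next (read-char port)))
                 (loop (if (eof-object? next) stack (push-char next stack)))))
              ((char=? c #\()                ; open: look up tag, push frame
               (let* ((tag (read-tag port))
                      (entry (assoc tag tag-table)))
                 (if (and (char? (peek-char port))
                          (char-whitespace? (peek-char port)))
                     (read-char port))       ; drop the separator after the tag
                 (loop (cons (cons (if entry (cdr entry) (lambda (s) s)) '())
                             stack))))
              ((and (char=? c #\)) (pair? (cdr stack)))
               ;; close: run the handler on the buffer, pop, pass result down
               (let* ((frame (car stack))
                      (body (list->string (reverse (cdr frame)))))
                 (loop (push-string ((car frame) body) (cdr stack)))))
              (else (loop (push-char c stack))))))))

(define tag-table
  (list (cons "Paragraph" (lambda (body) (string-append "<p>" body "</p>")))
        (cons "Shout" (lambda (body) (string-append body "!")))))
```

For instance, (process-document "(Paragraph a (Shout b) c)" tag-table)
returns "<p>a b! c</p>", and quotes inside the text need no escaping at
all. Unknown tags fall back to the identity handler here; that is just one
possible choice.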

If you know Emacs internals, Mistie is sort of in the style of the
keyboard input mapping portion of Emacs. That is, each possible input
character is bound to a Scheme procedure that performs some action
when that character is read. Those actions can include temporarily
(or permanently) re-binding the character or *other* characters to
other actions.  So once you've seen "(Paragraph", the ")" character
is bound to "finish-processing-the-buffered-chars-from-a-paragraph",
and when you hit the ")" and perform the action, it's rebound back to its
previous meaning (which was probably "finish-processing-something-else").
Very elegant and quite powerful, yet it's easy to get it to do simple stuff.
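
The rebinding trick can be shown in a few lines of plain Scheme. Again,
this is only a sketch of the idea and not how Mistie itself is written:
"bind-char!", "lookup-char", "open-paragraph" and "process-chars" are
invented names, string ports (SRFI 6 / R7RS) are assumed, and to keep it
short every "(" opens a paragraph directly instead of reading a tag first.

```scheme
;; Hypothetical sketch of Emacs-keymap-style character bindings.

(define bindings '())       ; a-list from characters to thunks (or #f)
(define out '())            ; finished output strings, newest first
(define buffer '())         ; accumulated characters, newest first

(define (bind-char! c handler)
  ;; The newest binding shadows older ones; #f means "ordinary character".
  (set! bindings (cons (cons c handler) bindings)))

(define (lookup-char c)
  (let ((entry (assv c bindings)))
    (and entry (cdr entry))))

(define (flush-buffer!)
  ;; Return the buffered text as a string and empty the buffer.
  (let ((s (list->string (reverse buffer))))
    (set! buffer '())
    s))

(define (emit! s)
  (set! out (cons s out)))

(define (open-paragraph)
  ;; Remember what ")" meant, then rebind it to finish this paragraph.
  (let ((previous (lookup-char #\))))
    (emit! (flush-buffer!))                  ; text seen before the "("
    (bind-char! #\)
      (lambda ()
        (emit! (string-append "<p>" (flush-buffer!) "</p>"))
        (bind-char! #\) previous)))))        ; restore the old meaning

(define (process-chars str)
  (set! out '())
  (set! buffer '())
  (let ((port (open-input-string str)))
    (let loop ()
      (let ((c (read-char port)))
        (if (eof-object? c)
            (begin (emit! (flush-buffer!))
                   (apply string-append (reverse out)))
            (let ((handler (lookup-char c)))
              (if handler
                  (handler)                  ; bound character: run its action
                  (set! buffer (cons c buffer)))
              (loop)))))))

(bind-char! #\( open-paragraph)
```

Here (process-chars "a (b) c)") returns "a <p>b</p> c)": the first ")"
runs the paragraph-closing action and then restores the earlier (unbound)
meaning, so the second ")" is just an ordinary character again.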

I'm probably not explaining it very well, sorry. See the above URL
for a better explanation and examples...


-Rob

-----
Rob Warnock, 30-3-510		<rpw3@sgi.com>
SGI Network Engineering		<http://www.meer.net/~rpw3/>
1600 Amphitheatre Pkwy.		Phone: 650-933-1673
Mountain View, CA  94043	PP-ASEL-IA

[Note: aaanalyst@sgi.com and zedwatch@sgi.com aren't for humans ]