Subject: Re: Java discovers map
From: rpw3@rpw3.org (Rob Warnock)
Date: Sun, 05 Aug 2007 00:56:08 -0500
Newsgroups: comp.lang.lisp
Message-ID: <nrmdnScHN97l-yjbnZ2dneKdnZydnZ2d@speakeasy.net>
Steven E. Harris <seh@panix.com> wrote:
+---------------
| Alan Crowe <alan@cawtech.freeserve.co.uk> writes:
| > 2)External iterator
| >   There is a setup function that creates an object with
| >   exactly two methods, one that tests it for empty, and one
| >   that pops the next item.
| 
| The Standard C++ Library's external iterators form a larger family than
| the "Forward Traversal Iterator" you describe above. Actually, per the
| descriptions here�, your description maps to a "Readable Single Pass"
| iterator, but note that many other combinations are possible.
| 
| Different algorithms demand different characteristics of iterators upon
| which they operate.
+---------------

Indeed. See below.

+---------------
| Footnotes: 
|   http://www.boost.org/libs/iterator/doc/new-iter-concepts.html#design
+---------------

Interesting, in that they failed to mention one of the types of
iterators that I've found most useful, which one might call
"Resettable/Multi-Traversal" [though perhaps there is a more
traditional or standard name for it?], which provides:

  - An initial setup function [in CLOS terms, an :AFTER method
    on INITIALIZE-INSTANCE].
  - An empty test [as with Alan Crowe's list].
  - A get-next function [as with Alan Crowe's list, except...].
  - A reset function that restores the state of the object *AND ALL
    OF ITS ITERATOR CHILDREN* to the state which existed immediately
    after the execution of the "initial setup" function.

This is useful for a style of tree-oriented generators which,
in addition to providing traditional depth-first walks of the
(abstract) leaves [I say "abstract leaves", since the node just
above a collection of "leaves" might really be a generator/iterator
of the leaves which produce them on demand -- and indeed *any*
interior node of the tree may be such a generator], also provides
the ability to walk some subtrees "in parallel", deterministically,
with error detection if the parallel walks don't generate the
same number of leaves.

REAL-LIFE EXAMPLE: "WIRLEX"

I've been carrying around a program -- or perhaps I should say
a "pattern" or a "meme", since I tend to re-write it from scratch
every time I learns a new language [or go to work for a new
employer!] -- which illustrates the need for both "resettable"
iterators and also the "walk some subtrees in parallel" requirement.
I call it "wirlex", which originally meant "a WIRe-wrapping netlist
LEXical pre-processor", which made it easier to create input
netlists of signal/pin pairs for a separate wire-listing program
[which sorted by signal name then generated a quasi-optimal route
and sequence to wrap the wires onto a backplane].

I wrote the original version was written in 1971 in "FASTBOL",
a compiled SNOBOL-4 from The Stevens Institute. This version
used the "expand it all in memory first, then print" approach,
and was used to generate wirelists for the DCA Smart/MUX product
line as well as several hardware design projects at Emory University.
John C. Alderman (DCA) re-wrote it in the "8BAL" macro preprocessor
for the DEC PDP-8 in 1972; that version (and all subsequent ones
that I know of) output the expansion lines as they were calculated,
due to the small memory of the PDP-8. It also ran more than an
order of magnitude faster than the FASTBOL version. Since then,
Bakul Shah wrote a version in C in 1981 at Fortune Systems, which
added the extremely-useful "cross-product" syntax [see below],
and I've since coded versions with that syntax in C [1985],
Scheme [~2000], and CL/CLOS [August 2007 -- actually, currently
in progress as direct a result of this very thread!! ;-} ].
I've also found that discussing the ideas needed for for this
application makes a great interview question for software engineers,
especially when you forbid the "all in memory" approach!!  ;-}

The basic meme is just a *tiny* extension of something almost
all of us use every day: the "{string1,string2,string3}"-expander
syntax in most Unix/Linux shells that "generate" multiple
argument strings from a single typed argument, e.g.:

    $ echo foo{1,2,3}bar
    foo1bar foo2bar foo3bar
    $ 

[Note: This is distinctly different from file-matching pattern
operations such as "[...]", "*", and "?", which don't generate any
expansion unless there is a matching filename in the filesystem.]

Shells, however, perform the expansion "word" at a time, so that
instances of "{...}" on the same line are unrelated, e.g.:

    $ echo foo{1,2,3}bar baz{5,6,7,8}gorp
    foo1bar foo2bar foo3bar baz5gorp baz6gorp baz7gorp baz8gorp
    $ 

The "wirlex" syntax [approximate grammar attached below], however,
expands such expressions *in parallel*, and also insists that the
parallel expansions produce the same number of items, e.g.:

    $ echo 'foo{1,2,3}bar baz{5,6,7}gorp' | ./wirlex -
    foo1bar baz5gorp
    foo2bar baz6gorp
    foo3bar baz7gorp
    $ echo 'foo{1,2,3}bar baz{5,6,7,8}gorp' | ./wirlex -
    foo1bar baz5gorp
    foo2bar baz6gorp
    foo3bar baz7gorp
    wirlex: Line 1, items in {} {} don't match
    $ 

Moreover, "wirlex" permits iterator element lists to contain
either ascending or descending colon-delimited "ranges" of
either integers or letters, e.g.:

    $  echo '{a:i}-{1:9}-{Omega,Z:X,others...,C:A,Alpha}' | ./wirlex -
    a-1-Omega
    b-2-Z
    c-3-Y
    d-4-X
    e-5-others...
    f-6-C
    g-7-B
    h-8-A
    i-9-Alpha
    $ 

It also allows iterators to be specified with angle brackets,
so the input "<a:i>-<1:9>-<Omega,Z:X,others...,C:A,Alpha>"
would generate exactly the same output as the above.

But the really fun part is that if you *mix* "<>" & "{}" on the
same input line, you get the Cartesian cross-product of the "<>"
expansions with the "{}" expansions, with the "{}" expansions
iterating "fastest", e.g.:

    $ echo '<1:3>.{1:4} {foo,bar,baz,gorp} said <hello,there,world>' \
       | ./wirlex -
    1.1 foo said hello
    1.2 bar said hello
    1.3 baz said hello
    1.4 gorp said hello
    2.1 foo said there
    2.2 bar said there
    2.3 baz said there
    2.4 gorp said there
    3.1 foo said world
    3.2 bar said world
    3.3 baz said world
    3.4 gorp said world
    $ 

Note that all the "{}" iterate together, and all the "<>" iterate
together, no matter where on the line they are.

And of course any list element can contain sub-elements which can
be ranges or lists, including more cross-products, e.g.:

    $ echo '<1:9> <{green,red,yellow} <apples,houses,socks>>' | ./wirlex -
    1 green apples
    2 red apples
    3 yellow apples
    4 green houses
    5 red houses
    6 yellow houses
    7 green socks
    8 red socks
    9 yellow socks
    $ 

A more complex example, more typical of what "wirlex" was actually
used for, is in a  postscript [below].

Sorry to have nattered on for so long, but this hopefully illustrates
the need for "resettable" iterators, specifically, when one or more
"<>" and one or more "{}" are generating a cross-product, when the
"{}"s come to an end but the "<>"s still have more to generate, the
entire set of "{}" AT THE SAME LEVEL (but not elsewhere above) must
be reset to their initial states, *recursively* all the way down
[which also requires resetting any embedded "<>" that are in
cross-products].

Is there any language whose standard library provides
such resettable iterators?

[And is there a more commonly-known name for them than
"resettable iterators"?]


-Rob

p.s. Here's an approximate grammar for "wirlex" input (primitive
tokens in caps), which doesn't show the constraint on parallel
expansion lengths matching:

        goal = cat

        cat = cat_elem [ cat ]

        cat_elem = angle | curly | kstring | EMPTY

        kstring = <a sequence of characters not containing "{}<>,:" >

        angle = "<" list ">"

        curly = "{" list "}"

        list = list_elem [ "," list ]

        list_elem = cat | rep

        rep = LETTER ":" LETTER  |  NUMBER ":" NUMBER

p.p.s. Here's a somewhat longer example, typical of its actual use
in generating wire-wrapping netlists. It may be helpful to know that
in circuit design individual chips are often designated "U<number>"
or (archaic) "E<number>", with an individual pin on the chip being
the chip designator, a dash, and a ping number, e.g., "U31-17".
In the following example, there are four chips with nine outputs each
[as might have been the case with a 22-pin MSI package circa 1980,
say, a 9-bit "D" flip-flop register with 9 inputs, 9 outputs, power,
ground, a clock, and a tri-state output enable], driving a 32-bit
data bus with byte-parity [so there are four parity bits, P<3:0>].
Notice that the last chip is used for both five data bits and all
four parity bits [probably not such a good idea, actually -- one
would want the parity bits in different chips than the data bits
they protect]:

    $ echo 'BUS DATA <<31:0>,P<3:0>> = U<<23,17,09,12>-{18,3:7,13:11}>' \
       | ./wirlex -
    BUS DATA 31 = U23-18
    BUS DATA 30 = U23-3
    BUS DATA 29 = U23-4
    BUS DATA 28 = U23-5
    BUS DATA 27 = U23-6
    BUS DATA 26 = U23-7
    BUS DATA 25 = U23-13
    BUS DATA 24 = U23-12
    BUS DATA 23 = U23-11
    BUS DATA 22 = U17-18
    BUS DATA 21 = U17-3
    BUS DATA 20 = U17-4
    BUS DATA 19 = U17-5
    BUS DATA 18 = U17-6
    BUS DATA 17 = U17-7
    BUS DATA 16 = U17-13
    BUS DATA 15 = U17-12
    BUS DATA 14 = U17-11
    BUS DATA 13 = U09-18
    BUS DATA 12 = U09-3
    BUS DATA 11 = U09-4
    BUS DATA 10 = U09-5
    BUS DATA 9 = U09-6
    BUS DATA 8 = U09-7
    BUS DATA 7 = U09-13
    BUS DATA 6 = U09-12
    BUS DATA 5 = U09-11
    BUS DATA 4 = U12-18
    BUS DATA 3 = U12-3
    BUS DATA 2 = U12-4
    BUS DATA 1 = U12-5
    BUS DATA 0 = U12-6
    BUS DATA P3 = U12-7
    BUS DATA P2 = U12-13
    BUS DATA P1 = U12-12
    BUS DATA P0 = U12-11
    $ 

p.p.s. If one wrote a parser for "wirlex" syntax in CL (which I'm
in the middle of doing), one reasonable intermediate form of the
above "BUS DATA" pattern might be this s-expr:

    ;;; BUS DATA <<31:0>,P<3:0>> = U<<23,17,09,12>-{18,3:7,13:11}>
    (concat (konst "BUS DATA ")
	    (angle (concat (range 31 0))
		   (concat (konst "P")
			   (angle (range 3 0))))
	    (konst " = U")
	    (angle (concat (angle (concat (konst "23")
					  (konst "17")
					  (konst "09")
					  (konst "12")))
	                   (konst "-")cat (konst "23")
			   (curly (concat (konst "18"))
				  (range 3 7)
				  (range 13 11)))))

Ob. disclosure: I confess that I didn't bother to figure out the
above s-expr manually; I just transliterated it from the debug output
of my C version when run with "wirlex -t", which prints the parse tree
for each input line!  ;-}

    $ echo 'BUS DATA <<31:0>,P<3:0>> = U<<23,17,09,12>-{18,3:7,13:11}>' \
       | ./wirlex -t -
       C -+- K: 'BUS DATA '
          |
          +- < -+- C --- < --- R: 31,0
          |     |
          |     +- C -+- K: 'P'
          |           |
          |           +- < --- R: 3,0
          |
          +- K: ' = U'
          |
          +- < --- C -+- < -+- C --- K: '23'
                      |     |
                      |     +- C --- K: '17'
                      |     |
                      |     +- C --- K: '09'
                      |     |
                      |     +- C --- K: '12'
                      |
                      +- K: '-'
                      |
                      +- { -+- C --- K: '18'
                            |
                            +- R: 3,7
                            |
                            +- R: 13,11
    $ 

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607