Subject: Re: efficiently accumulating values
From: rpw3@rpw3.org (Rob Warnock)
Date: Sun, 09 Jul 2006 06:27:47 -0500
Newsgroups: comp.lang.lisp
Message-ID: <5tqdnaadANOudS3ZnZ2dnUVZ_tadnZ2d@speakeasy.net>
*Oops!* Some hours ago, I wrote:
+---------------
| One approach that *might* buy a little performance is to (after
| copying the initial "word" *once*, if needed for immutability)
| make the prefixes be 1-, 2-, and 3-element arrays displaced to
| the "word". Though note that with *most* implementations'
| representations of displaced arrays, consing up a simple 1-, 2-,
| or 3-char string with SUBSEQ is likely to be *far* less expensive
| than consing up a displaced arrays. (*sigh*)
+---------------

Looking back, I see I had misunderstood the problem: the *minimum*
abbreviation is 3 characters, right? In that case, modify what I said
above to be this:

One approach that *might* buy a little performance is to (after
copying the initial "word" *once*, if needed for immutability)
create prefixes above a certain size [which will depend on the
CL implementation you're using] by using arrays displaced to
"word", e.g.:

    > (let ((word (copy-seq "abcdefghijklmnopqrstuvwxyz"))
	    (copy-vs-displace-threshold 10))
	(loop for i from 3 to (1- (length word)) collect
	  (if (> i copy-vs-displace-threshold)
	    (make-array i :element-type 'base-char :displaced-to word)
	    (subseq word 0 i))))

    ("abc" "abcd" "abcde" "abcdef" "abcdefg" "abcdefgh" "abcdefghi"
     "abcdefghij" "abcdefghijk" "abcdefghijkl" "abcdefghijklm"
     "abcdefghijklmn" "abcdefghijklmno" "abcdefghijklmnop"
     "abcdefghijklmnopq" "abcdefghijklmnopqr" "abcdefghijklmnopqrs"
     "abcdefghijklmnopqrst" "abcdefghijklmnopqrstu"
     "abcdefghijklmnopqrstuv" "abcdefghijklmnopqrstuvw"
     "abcdefghijklmnopqrstuvwx" "abcdefghijklmnopqrstuvwxy")
    > (mapcar 'type-of *)

    ((SIMPLE-BASE-STRING 3) (SIMPLE-BASE-STRING 4) (SIMPLE-BASE-STRING 5)
     (SIMPLE-BASE-STRING 6) (SIMPLE-BASE-STRING 7) (SIMPLE-BASE-STRING 8)
     (SIMPLE-BASE-STRING 9) (SIMPLE-BASE-STRING 10) (BASE-STRING 11)
     (BASE-STRING 12) (BASE-STRING 13) (BASE-STRING 14) (BASE-STRING 15)
     (BASE-STRING 16) (BASE-STRING 17) (BASE-STRING 18) (BASE-STRING 19)
     (BASE-STRING 20) (BASE-STRING 21) (BASE-STRING 22) (BASE-STRING 23)
     (BASE-STRING 24) (BASE-STRING 25))
    > (mapcar 'array-displacement **)

    (NIL NIL NIL NIL NIL NIL NIL NIL "abcdefghijklmnopqrstuvwxyz"
     "abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
     "abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
     "abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
     "abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
     "abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
     "abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
     "abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz")
    > (mapcar (lambda (x) (eq x (nth 8 *))) *)

    (NIL NIL NIL NIL NIL NIL NIL NIL T T T T T T T T T T T T T T T)
    > 

If you set COPY-VS-DISPLACE-THRESHOLD more-or-less correctly,
this will use much less space for *large* words than using
SUBSEQ for all the prefixes.


-Rob

p.s. Hmmm... Doing a little peeking under the covers of CMUCL,
it seems like an appropriate value for COPY-VS-DISPLACE-THRESHOLD
there would be 21-24 characters. For lengths smaller than 21, a
SIMPLE-STRING will take less space that a displaced-array header.
Other implementations will vary, of course...

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607