*Oops!* Some hours ago, I wrote:
+---------------
| One approach that *might* buy a little performance is to (after
| copying the initial "word" *once*, if needed for immutability)
| make the prefixes be 1-, 2-, and 3-element arrays displaced to
| the "word". Though note that with *most* implementations'
| representations of displaced arrays, consing up a simple 1-, 2-,
| or 3-char string with SUBSEQ is likely to be *far* less expensive
| than consing up a displaced arrays. (*sigh*)
+---------------
Looking back, I see I had misunderstood the problem: the *minimum*
abbreviation is 3 characters, right? In that case, modify what I said
above to be this:
One approach that *might* buy a little performance is to (after
copying the initial "word" *once*, if needed for immutability)
create prefixes above a certain size [which will depend on the
CL implementation you're using] by using arrays displaced to
"word", e.g.:
> (let ((word (copy-seq "abcdefghijklmnopqrstuvwxyz"))
(copy-vs-displace-threshold 10))
(loop for i from 3 to (1- (length word)) collect
(if (> i copy-vs-displace-threshold)
(make-array i :element-type 'base-char :displaced-to word)
(subseq word 0 i))))
("abc" "abcd" "abcde" "abcdef" "abcdefg" "abcdefgh" "abcdefghi"
"abcdefghij" "abcdefghijk" "abcdefghijkl" "abcdefghijklm"
"abcdefghijklmn" "abcdefghijklmno" "abcdefghijklmnop"
"abcdefghijklmnopq" "abcdefghijklmnopqr" "abcdefghijklmnopqrs"
"abcdefghijklmnopqrst" "abcdefghijklmnopqrstu"
"abcdefghijklmnopqrstuv" "abcdefghijklmnopqrstuvw"
"abcdefghijklmnopqrstuvwx" "abcdefghijklmnopqrstuvwxy")
> (mapcar 'type-of *)
((SIMPLE-BASE-STRING 3) (SIMPLE-BASE-STRING 4) (SIMPLE-BASE-STRING 5)
(SIMPLE-BASE-STRING 6) (SIMPLE-BASE-STRING 7) (SIMPLE-BASE-STRING 8)
(SIMPLE-BASE-STRING 9) (SIMPLE-BASE-STRING 10) (BASE-STRING 11)
(BASE-STRING 12) (BASE-STRING 13) (BASE-STRING 14) (BASE-STRING 15)
(BASE-STRING 16) (BASE-STRING 17) (BASE-STRING 18) (BASE-STRING 19)
(BASE-STRING 20) (BASE-STRING 21) (BASE-STRING 22) (BASE-STRING 23)
(BASE-STRING 24) (BASE-STRING 25))
> (mapcar 'array-displacement **)
(NIL NIL NIL NIL NIL NIL NIL NIL "abcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyz")
> (mapcar (lambda (x) (eq x (nth 8 *))) *)
(NIL NIL NIL NIL NIL NIL NIL NIL T T T T T T T T T T T T T T T)
>
If you set COPY-VS-DISPLACE-THRESHOLD more-or-less correctly,
this will use much less space for *large* words than using
SUBSEQ for all the prefixes.
-Rob
p.s. Hmmm... Doing a little peeking under the covers of CMUCL,
it seems like an appropriate value for COPY-VS-DISPLACE-THRESHOLD
there would be 21-24 characters. For lengths smaller than 21, a
SIMPLE-STRING will take less space that a displaced-array header.
Other implementations will vary, of course...
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607