Subject: Re: checking for duplicate elements
From: rpw3@rpw3.org (Rob Warnock)
Date: Thu, 30 Apr 2009 22:28:54 -0500
Newsgroups: comp.lang.lisp
Message-ID: <TdWdnc9CgtLr8WfUnZ2dnUVZ_rCdnZ2d@speakeasy.net>
Oops! I just wrote:
+---------------
| Madhu  <enometh@meer.net> wrote:
| +---------------
| |     (let ((n (mismatch seq1 (remove-duplicates seq1))))
| ...
| | The use of mismatch idea was suggested by Anon.C.Lisper elsewhere in
| | this thread.
| +---------------
| 
| IIUIC, you don't need the overhead of full MISMATCH here, (NOT (EQUAL ...))
| should work just as well, e.g.:
| 
|     (let ((n (not (equal seq1 (remove-duplicates seq1)))))
|       ...)
+---------------

While EQUAL works fine for lists and strings [and bit vectors],
it *doesn't* work correctly for general arrays;

    > (let* ((seq1 "foobar")
	     (seq2 (map 'vector #'identity seq1)))
	(list (mismatch seq1 (remove-duplicates seq1))
	      (not (equal seq1 (remove-duplicates seq1)))
	      (mismatch seq2 (remove-duplicates seq2))
	      (not (equal seq2 (remove-duplicates seq2)))))

    (2 T 2 T)
    > (let* ((seq1 "fubar")
	     (seq2 (map 'vector #'identity seq1)))
	(list (mismatch seq1 (remove-duplicates seq1))
	      (not (equal seq1 (remove-duplicates seq1)))
	      (mismatch seq2 (remove-duplicates seq2))
	      (not (equal seq2 (remove-duplicates seq2)))))

    (NIL NIL NIL T)
    > 

Oops! But I think there's a fix. Ordinarily, you wouldn't want to use
EQUALP for this sort of thing, since that could get false-positives
on the duplicate detection [#\a va. #\a], but in this case I think
it's o.k., since the REMOVE-DUPLICATES is still using its default EQL
test, so using EQUALP should be safe here:

    > (let* ((seq1 "foObar")
	     (seq2 (map 'vector #'identity seq1)))
	(list (mismatch seq1 (remove-duplicates seq1))
	      (not (equalp seq1 (remove-duplicates seq1)))
	      (mismatch seq2 (remove-duplicates seq2))
	      (not (equalp seq2 (remove-duplicates seq2)))))

    (NIL NIL NIL NIL)
    > 

But you know, if you're going for speed instead of succinctness,
all you *really* need to check is the length:  ;-}

    (/= (length seq1) (length (remove-duplicates seq1)))


-Rob

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607