Subject: Re: Strings and lists
From: Erik Naggum <erik@naggum.no>
Date: 1999/06/20
Newsgroups: comp.lang.lisp
Message-ID: <3138871289438905@naggum.no>

* Josef Eschgfaeller <esg@felix.unife.it>
| I'm trying to improve
|     
|     (defun first-word (a)
|     (let ((pos (position #\space a))) (if (not pos) a
|     (subseq a 0 pos))))

  it is hard to read your code the way you (don't) indent it.

(defun first-word (a)
  (let ((pos (position #\space a)))
    (if (not pos)
      a
      (subseq a 0 pos))))

| In such algorithms for strings it is often not clear to me in which cases
| it is convenient to introduce lists and when it has sense to mix strings
| with lists.

  you would have to have extraordinary needs for lists to make sense, but
  instead of writing code for strings, write your code for sequences --
  that way, you won't have to decide on types until you have a particular
  need.  note that POSITION, for instance, works on sequences, not only
  strings.  that is, (position #\Space (coerce "foo bar zot" 'list)) works
  just as well as (position #\Space "foo bar zot").  likewise for SUBSEQ,
  which is short for SUBSEQUENCE.  many new Lisp programmers are thrown by
  the fact that most usual string operations aren't string operations in
  Lisp, but sequence operations, like SEARCH, POSITION, and SUBSEQ.

  speaking generally, I think you should consider a less specific function,
  since hacking up sequences is a very common task, and you also often need
  to know whether there are further elements in the sequence you hack up.
  in other words, return both halves.

(defun get-token (sequence delimiter &key (key #'identity) (test #'eql))
  "Return the non-empty subsequences of SEQUENCES before and after the first
occurrence of the DELIMITER as two values, or NIL if it would be empty."
  (let* ((start (or (position delimiter sequence
 			      :key key
			      :test (complement test))
		    (return-from get-token nil)))
	 (end (or (position delimiter sequence
			    :key key
			    :test test
			    :start start)
		  (return-from get-token (subseq sequence start))))
	 (next (or (position delimiter sequence
			     :key key
			     :test (complement test)
			     :start (1+ end))
		   (return-from get-token (subseq sequence start)))))
    (values (subseq sequence start end) (subseq sequence next))))

  when hacking up fairly large strings, it makes sense to be cons-sensitive
  and replace the last form with

    (values (subseq sequence start end)
	    (if (vectorp sequence)
	      (multiple-value-bind (base offset)
		  (array-displacement sequence)
		(if base
		  (adjust-array sequence (- (length sequence) next)
				:displaced-to base
				:displaced-index-offset (+ next offset))
		  (make-array (- (length sequence) next)
		    :element-type (array-element-type sequence)
		    :displaced-to sequence
		    :displaced-index-offset next)))
	      (subseq sequence next)))

  which will not cause the tail of the string to needlessly copied.

  the above is somewhat expanded from production code in one of my systems.
  I use MAKE-INDIRECT-VECTOR and MOVE-INDIRECT-VECTOR to manipulate
  displaced strings, so the consequent of the final conditional is actually
  only (move-indirect-vector sequence :start next).  (this is why there are
  no :START and :END keywords.)

#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century