Joe Marshall <jrm@ccs.neu.edu> wrote:
+---------------
| As it turned out, a list of strings wasn't the best representation
| anyway. A single string was better. Since this was a bottleneck, it
| was worthwhile to do a serious hack. Using the FFI, I called mmap to
| map the file into the address space, then I faked up an array header
| so it looked like a lisp object and returned a displaced array to it.
| (The GC was ok with this, but it really depended on the details of how
| the GC was coded.) This was six orders of magnitude faster than the
| original program.
+---------------
In some implementations [e.g., CMUCL], READ-SEQUENCE will call the
underlying operating system's "read()" routine more-or-less directly,
which is why I use the following for this kind of thing:
(defun file-string (path)
  "Sucks up an entire file from PATH into a freshly-allocated string,
  returning two values: the string and the number of bytes read."
  (with-open-file (s path)
    (let* ((len (file-length s))
           (data (make-string len)))
      (values data (read-sequence data s)))))
It's... uh... FAST:
> (defvar *data* nil) ; avoid printing many megabytes
*DATA*
> (time (multiple-value-bind (data len)
            (file-string "MAIL.today")
          (setf *data* data)
          len))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.05 seconds of real time
; 0.00165 seconds of user run time
; 0.047575 seconds of system run time
; 101,130,862 CPU cycles
; 0 page faults and
; 11,638,960 bytes consed.
;
11637220
> (subseq *data* 0 100) ; prove that it worked
"From root@CENSORED.org Sun Jan 2 01:02:57 2005
Return-Path: <root@CENSORED.org>
X-Original-To: lis"
>
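One caveat: the second value is whatever READ-SEQUENCE returns, i.e., the
index of the first element of the string that was not filled in. When the
external format is one byte per character this equals the file length, as
in the run above. If the implementation decodes a multi-byte encoding (or
translates line endings), fewer characters than FILE-LENGTH reported may
come back, and the tail of the string is left unfilled. A minimal sketch
of a variant that trims the result, assuming FILE-LENGTH on a character
stream reports bytes (typical, but not promised by the standard); the
name FILE-STRING-TRIMMED is made up for illustration:

```lisp
;; FILE-STRING-TRIMMED is a hypothetical name, not from any library.
;; Assumption: FILE-LENGTH is an upper bound on the number of characters
;; READ-SEQUENCE will deliver from the same stream.
(defun file-string-trimmed (path)
  "Like FILE-STRING, but returns a single string exactly as long as
  the number of characters actually read from PATH."
  (with-open-file (s path)
    (let* ((data (make-string (file-length s)))
           ;; READ-SEQUENCE returns the index of the first element
           ;; of DATA that it did not update.
           (end (read-sequence data s)))
      (if (< end (length data))
          (subseq data 0 end)   ; copies, dropping the unfilled tail
          data))))
```

The SUBSEQ copy only happens when the read came up short, so the
single-byte case costs the same as the original.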
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607