drewc <drewc@rift.com> wrote:
+---------------
| Rob Warnock wrote:
| > (defun file-string (path)
| >   "Sucks up an entire file from PATH into a freshly-allocated string,
| >    returning two values: the string and the number of bytes read."
| >   (with-open-file (s path)
| >     (let* ((len (file-length s))
| >            (data (make-string len)))
| >       (values data (read-sequence data s)))))
|
| According to [ <http://www.tfeb.org/lisp/obscurities.html> ] ...
+---------------
Thanks for the ref!
+---------------
| ...this function is not portable :
|
| "But this almost certainly will not work reliably. file-length will
| almost certainly tell you the length of the file in octets, not
| characters...
+---------------
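One way to sidestep the FILE-LENGTH question entirely is not to trust it at all. You can read fixed-size chunks until READ-SEQUENCE returns 0 and accumulate them in a string stream. The following is a minimal sketch under that approach. The name FILE-STRING/PORTABLE, the EXTERNAL-FORMAT pass-through, and the 4096-character chunk size are illustrative choices, not anything from the original post.

```lisp
;;; Hypothetical FILE-STRING variant that never consults FILE-LENGTH.
;;; The name and the 4096-character chunk size are illustrative only.
(defun file-string/portable (path &key (external-format :default))
  "Read the whole file at PATH into a fresh string.  Return two values:
the string and its length in characters.  Reading in chunks keeps the
result correct even when FILE-LENGTH counts octets, not characters."
  (with-open-file (s path :external-format external-format)
    (let* ((buffer (make-string 4096))
           (result (with-output-to-string (out)
                     ;; READ-SEQUENCE returns the index just past the last
                     ;; character stored; 0 means end of file.
                     (loop for end = (read-sequence buffer s)
                           while (plusp end)
                           do (write-string buffer out :end end)))))
      (values result (length result)))))
```

Compared with the original, this sketch makes one extra copy. In exchange, the returned length is always the number of characters actually delivered, and the string has no unfilled tail.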
Hmmm... O.k., I'll agree with the non-portability in general, but
it *might* be slightly more portable than Tim's page suggests. ;-}
According to the CLHS:
    FILE-LENGTH returns the length of stream, or NIL if the length
    cannot be determined.

    For a binary file, the length is measured in units of the
    element type of the stream.
and refers one to OPEN, which says:
    element-type---a type specifier for recognizable subtype of
    CHARACTER; or a type specifier for a finite recognizable subtype
    of INTEGER; or one of the symbols SIGNED-BYTE, UNSIGNED-BYTE, or
    :DEFAULT. The default is CHARACTER.
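The binary case is the one the CLHS text pins down. Here is a minimal sketch, assuming the implementation supports an (UNSIGNED-BYTE 8) element type for the file. The helper name FILE-OCTETS is hypothetical and not from the post. For a binary stream, FILE-LENGTH and the count returned by READ-SEQUENCE are both measured in the stream's element type, so for an unchanging regular file they should agree.

```lisp
;;; Hypothetical helper: the FILE-STRING idea, but on an octet stream,
;;; where FILE-LENGTH is specified in units of the element type.
(defun file-octets (path)
  "Return a fresh (UNSIGNED-BYTE 8) vector holding the octets of PATH,
and the number of octets READ-SEQUENCE actually stored."
  (with-open-file (s path :element-type '(unsigned-byte 8))
    (let* ((len (file-length s))          ; octets, per the CLHS
           (data (make-array len :element-type '(unsigned-byte 8))))
      (values data (read-sequence data s)))))
```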
And 13.1.4.1 "Graphic Characters" says that:
    #\Backspace, #\Tab, #\Rubout, #\Linefeed, #\Return, and #\Page,
    if they are supported by the implementation, are non-graphic.
But 2.1.3 "Standard Characters" only requires that #\Newline (the one
non-graphic standard character) and #\Space (which the standard classes
as graphic) be supported.
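You can probe at run time whether the semi-standard characters listed in 13.1.4.1 exist at all, using NAME-CHAR, which returns NIL for a name the implementation does not recognize. This is a sketch only. PROBE-SEMI-STANDARD-CHARS is a hypothetical name, and the codes it reports are whatever CHAR-CODE returns in the implementation at hand.

```lisp
;;; Hypothetical probe: for each semi-standard name from 13.1.4.1,
;;; report (name char-code-or-NIL graphic-p).
(defun probe-semi-standard-chars ()
  (loop for name in '("Backspace" "Tab" "Rubout" "Linefeed" "Return" "Page")
        for ch = (name-char name)          ; NIL if the name is unsupported
        collect (list name
                      (and ch (char-code ch))
                      (and ch (graphic-char-p ch)))))
```

If the entry for "Return" comes back with a NIL code, then #\Return is not a CHARACTER in that implementation at all.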
So I guess it really boils down to whether in a given implementation
#\Return exists as a CHARACTER, and what happens when you READ-CHAR
a stream containing one, since READ-SEQUENCE is defined in terms of
reading one element at a time:
    READ-SEQUENCE is identical in effect to iterating over the
    indicated subsequence and reading one element at a time from
    stream and storing it into sequence, but may be more efficient
    than the equivalent loop. An efficient implementation is more
    likely to exist for the case where the sequence is a vector with
    the same element type as the stream.
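For a string and a character stream, the loop that this passage calls "identical in effect" looks roughly like the following. This is a sketch only. READ-STRING-BY-CHARS is a hypothetical name, and, as the quoted text says, real implementations may use faster internal paths.

```lisp
;;; Hypothetical element-at-a-time equivalent of READ-SEQUENCE for strings.
(defun read-string-by-chars (string stream &key (start 0) end)
  "Fill STRING from index START up to END (default: its length) by
calling READ-CHAR repeatedly.  Return the index of the first element
not updated, as READ-SEQUENCE does."
  (let ((end (or end (length string)))
        (i start))
    (loop while (< i end)
          do (let ((ch (read-char stream nil nil)))
               (unless ch (return))          ; end of file: stop early
               (setf (char string i) ch)
               (incf i)))
    i))
```

Whatever READ-CHAR does with a <CR> on a given platform is therefore exactly what READ-SEQUENCE does with it.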
Note that this is *not* the same as asking whether:
    (= (length (file-string "foo"))
       (with-open-file (s "foo")
         (loop for line = (read-line s nil nil)
               while line
               sum (1+ (length line)))))
    ==> T
This clearly might be false on platforms where #\Newline is externally
represented as <CR><LF>, but if #\Return is a (non-graphic) CHARACTER
on those machines, then the following might still be true even if the
above is false:
    (= (length (file-string "foo"))
       (with-open-file (s "foo")
         (loop for char = (read-char s nil nil)
               while char
               count t)))
Note that the former returns NIL on CMUCL under Unix when given a file
containing ASCII NULs (a .tar.gz! ;-} ) but the latter still returns T.
It would be interesting to know whether the latter also returns T on
MS-DOS or Windows platforms, and for which CL implementations.
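To run that experiment on MS-DOS/Windows, or anywhere else, a small harness can report all three numbers for one file side by side. This is a sketch. COMPARE-CHAR-COUNTS is a hypothetical name, and the code assumes the file is a regular file that FILE-POSITION can rewind.

```lisp
;;; Hypothetical harness: FILE-LENGTH vs. READ-CHAR count vs. the
;;; READ-LINE-based sum, all on the same character stream.
(defun compare-char-counts (path)
  "Return three values for PATH: FILE-LENGTH, the number of characters
READ-CHAR delivers, and the sum of (1+ (length line)) over READ-LINE."
  (with-open-file (s path)
    (let ((file-len (file-length s))
          (char-count (loop for ch = (read-char s nil nil)
                            while ch
                            count t)))
      (file-position s 0)                  ; rewind for the second pass
      (let ((line-sum (loop for line = (read-line s nil nil)
                            while line
                            sum (1+ (length line)))))
        (values file-len char-count line-sum)))))
```

The READ-LINE figure also adds one for a final line that has no trailing newline. So on a file that does not end in #\Newline, it comes out one higher than the READ-CHAR count on any platform.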
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607