Subject: Re: read-byte and *standard-input*
From: rpw3@rpw3.org (Rob Warnock)
Date: Fri, 02 May 2003 04:32:26 -0500
Newsgroups: comp.lang.lisp
Message-ID: <WKycnbrGnNy3pi-jXTWc-g@speakeasy.net>

Adam Warner <usenet@consulting.net.nz> wrote:
+---------------
| Hi Thibault Langlois,
| > The program should read binary data from standard input but read-byte
| > gives an error if the stream is not open with :element-type set to
| > unsigned-byte.
| 
| First you should always tell people your implementation. If it's CLISP the
| developers made a decision that one is not allowed to read binary data from
| STDIN.
+---------------

It's not just CLISP; CMUCL complains as well:

	> (read-byte *standard-input*)

	#<Stream for Standard Input> is not a binary input stream.
	Restarts:
	  0: [ABORT] Return to Top-Level.

+---------------
| I think the decision is unfortunate and it affects the efficiency
| of CLISP for CGI programming. The workaround is to read the data using
| a faithfully reproducing character set with Unix end-of-lines. This will
| result in a character stream that is byte identical to the binary stream.
| ISO-8859-1 is a good choice for the faithfully reproducing character set.
| 
| Details of how to get this working are set out in the CL Cookbook:
| <http://cl-cookbook.sourceforge.net/io.html>
+---------------

Well, maybe, though AFAICT that URL says nothing about "faithful input";
it only talks about "faithful output".
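
For what it's worth, the character-stream workaround Adam describes might
look roughly like the sketch below. It is portable CL, but it only gives
byte-identical results under the assumption that the stream already decodes
ISO-8859-1 with no end-of-line translation; how you arrange that is
implementation-specific and not shown here. READ-OCTETS-VIA-CHARS is a
name I made up for illustration, not something from the Cookbook:

```lisp
;; Hypothetical helper, not from the CL Cookbook or any library.
;; ASSUMPTION: STREAM decodes ISO-8859-1 with no line-ending translation,
;; so the CHAR-CODE of each character equals the original byte (0..255).
(defun read-octets-via-chars (stream count)
  "Read up to COUNT characters from STREAM and return their codes as a
vector of octets. Stops early at end of file."
  (let ((buf (make-array count :element-type '(unsigned-byte 8))))
    (dotimes (i count buf)
      (let ((ch (read-char stream nil nil)))
        ;; At EOF, return only the part of the buffer that was filled.
        (unless ch
          (return (subseq buf 0 i)))
        (setf (aref buf i) (char-code ch))))))
```

You can try it without touching stdin at all, e.g. with
(with-input-from-string (s "hello!") (read-octets-via-chars s 10)).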

But Adam's main point is correct: You need to specify your implementation.
For example, in CMUCL-18e there are (at least) two ways to work around this:

1. Use the CMUCL-specific function SYSTEM:READ-N-BYTES with *STANDARD-INPUT*
   as the stream argument [lines tagged "T:" are typed input]:

	> (defvar *buf* (make-array 10 :element-type '(unsigned-byte 8)))
	*BUF*
	> *buf*
	#(0 0 0 0 0 0 0 0 0 0)
	> (system:read-n-bytes *standard-input* *buf* 0 6 nil)
   T:	hello!
	6
	> *buf*
	#(104 101 108 108 111 33 0 0 0 0)
	>

2. Use the CMUCL-specific function SYSTEM:MAKE-FD-STREAM, since Unix
   standard input is always file descriptor #0:

	> (with-open-stream (s (system:make-fd-stream
				(unix:unix-dup 0)
				:element-type '(unsigned-byte 8)))
	    (loop for i = (read-byte s)
                  collect i
                  until (= i 10)))
   T:	hello, there!
	(104 101 108 108 111 44 32 116 104 101 114 101 33 10)
	> 

Or some combination of both...
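
A combination might look something like the following sketch, which wraps
a dup'd fd 0 in a binary fd-stream and then fills a buffer from it with
SYSTEM:READ-N-BYTES. It is CMUCL-only (hence the #+cmu), READ-STDIN-OCTETS
is a made-up name, and I'm assuming the same argument order for
READ-N-BYTES as in example 1:

```lisp
;; CMUCL-specific sketch; READ-STDIN-OCTETS is a hypothetical helper name.
#+cmu
(defun read-stdin-octets (count)
  "Read up to COUNT octets from a duplicate of fd 0 and return them
as a fresh vector."
  (let ((buf (make-array count :element-type '(unsigned-byte 8))))
    (with-open-stream (s (system:make-fd-stream
                          (unix:unix-dup 0) ; dup, so closing S leaves fd 0 open
                          :element-type '(unsigned-byte 8)))
      ;; With EOF-ERROR-P = NIL, READ-N-BYTES returns the number of bytes
      ;; actually read, which may be fewer than COUNT.
      (let ((n (system:read-n-bytes s buf 0 count nil)))
        (subseq buf 0 n)))))
```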


-Rob

p.s. Why did I use (UNIX:UNIX-DUP 0) as the fd instead of just 0?
Well... When you close an fd-stream [which WITH-OPEN-STREAM will do,
of course], CMUCL will also close the underlying Unix file descriptor,
which, if it were the "real" fd #0, would cause subsequent reads from
CMUCL's own *STANDARD-INPUT* to fail, and then you'd be in a world of hurt.
[About the only way out at that point is to call (UNIX:UNIX-EXIT).]

-----
Rob Warnock, PP-ASEL-IA		<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607