Subject: Re: Scheme implementation in C++
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 2000/10/20
Newsgroups: comp.lang.scheme
Message-ID: <8sp8fq$93otl$1@fido.engr.sgi.com>

felix <felix@anu.ie> wrote:
+---------------
| Rob Warnock wrote:
| >+---------------
| >| I still might want to replace stdio with iostream, and c strings
| >| with std::strings.
| >+---------------
| >
| >Remember that Scheme strings are "counted", not terminated by nulls:
| >
| > > (define str (apply string (map integer->char '(97 98 0 99 100))))
| > > str
| > "abcd"
| > > (string-length str)
| > 5
| > > (string->list str)
| > (#\a #\b #\nul #\c #\d)
| > >
| 
| What prevents you from using the first byte(s) for a count?
+---------------

What do you mean "the first byte(s)"? In Scheme, *all*
of the bytes of a string between (string-ref str 0) and
(string-ref str (- (string-length str) 1)) are significant
user data. The length is *already* "hidden" in the representation
of a string object. (That's what I meant by "counted strings".)
[If you're talking about, say, using bytes *before* the start
of the C-string for a count, then sure, why not. That's just
a representation issue. It doesn't affect the user semantics.]
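
[To make the "count before the bytes" idea concrete, here is a minimal
C sketch of one possible layout. The names "counted_string" and "cs_make"
are made up for illustration, and the extra trailing '\0' is only an
assumed courtesy for C callers; no particular Scheme implementation is
being described.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the count sits just before the character data. */
typedef struct {
    size_t len;     /* number of significant bytes, NULs included */
    char   data[];  /* the bytes themselves (C99 flexible array member) */
} counted_string;

/* Build a counted string from an arbitrary byte buffer. */
static counted_string *cs_make(const char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);

    counted_string *cs = malloc(sizeof *cs + len + 1);
    if (cs == NULL)
        return NULL;

    cs->len = len;
    if (len > 0)
        memcpy(cs->data, bytes, len);   /* copies embedded NULs too */
    cs->data[len] = '\0';   /* courtesy terminator, not part of the string */
    return cs;
}
```

With the five bytes 'a' 'b' NUL 'c' 'd' as input, cs->len stays 5 while
strlen(cs->data) stops at the embedded NUL. The user-visible semantics
come from the count, not from the terminator.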

My point was that Scheme strings, unlike C strings, *allow*
ASCII "NUL" characters (C '\0' characters) in the string, and
thus there is no way in general to export full Scheme strings
as C strings. Likewise, many C tricks don't do the expected
thing to Scheme strings. For example, how many times have you
seen this standard C idiom [a subset of "strtok()"]? Lots, right?

	char *s = strdup(some_string);
	char *p = strchr(s, ':');

	if (p)
	  *p = '\0';	/* trim the ':' and everything after it */

But the corresponding code in Scheme [assuming SRFI-13 "string-index"]
*doesn't* do what a C programmer would expect:

	> (define s (string-copy some-string))
	> s
	"hello:there"
	> (string-length s)
	11
	> (define i (string-index s #\:))
	> i
	5
	> (string-set! s i #\nul)
	> s
	"hellothere"
	> (string-length s)
	11
	> (string->list s)
	(#\h #\e #\l #\l #\o #\nul #\t #\h #\e #\r #\e)
	> 
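
Seen from the C side, the same effect can be sketched roughly as follows.
This assumes a made-up "exported_string" view (a byte pointer plus the
length Scheme keeps) and a hypothetical helper "c_style_trim"; it is an
illustration of the mismatch, not any real foreign-function interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exported view of a Scheme string: bytes plus a count. */
typedef struct {
    char  *data;  /* bytes, assumed to be followed by one hidden '\0' */
    size_t len;   /* the length Scheme believes in */
} exported_string;

/* The C idiom from above: cut the buffer at the first occurrence of ch. */
static void c_style_trim(exported_string *s, char ch)
{
    assert(s != NULL && s->data != NULL);

    char *p = strchr(s->data, ch);
    if (p)
        *p = '\0';  /* changes what strlen() reports, but not s->len */
}
```

After c_style_trim() on "hello:there", strlen(s.data) reports 5 but
s.len is still 11, and the bytes "there" are still sitting after the NUL.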

Note that stuffing a null character into the middle of the string
*didn't* shorten the string!!  So even if you try to be "C-friendly"
(say, by adding a hidden null byte off the end of each Scheme string),
if you "export" a Scheme string to some C code and it pulls some stunt
like the above, it *won't* work as expected (by a C programmer, at least).
Instead, you have to do something like this:

	> (define s "hello:there")
	> (string-length s)
	11
	> (define i (string-index s #\:))
	> i
	5
	> (set! s (if i (substring s 0 i) s))
	> s
	"hello"
	> (string-length s)
	5
	>

And doing that in C is *much* messier than the idiomatic hack above:

	char *s;
	char *p = strchr(some_string, ':');

	if (p) {
	    size_t i = (size_t)(p - some_string);  /* length before the ':' */

	    s = (char *)malloc(i + 1);
	    if (s) {                               /* malloc can fail */
	        memcpy(s, some_string, i);
	        s[i] = '\0';
	    }
	} else
	    s = strdup(some_string);
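
If the C code is handed the length along with the bytes, a version that
also survives embedded NULs might look something like the sketch below.
The function name "prefix_before" and its signature are invented for this
example; the only library calls used are memchr(), memcpy() and malloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the part of a counted buffer that precedes the first occurrence
 * of ch, or the whole buffer if ch is absent.  The new length is stored
 * in *out_len.  Returns NULL if allocation fails; caller frees the result.
 */
static char *prefix_before(const char *data, size_t len, char ch,
                           size_t *out_len)
{
    assert(data != NULL && out_len != NULL);

    const char *p = memchr(data, ch, len);  /* honours len, not '\0' */
    size_t n = p ? (size_t)(p - data) : len;

    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, data, n);
    copy[n] = '\0';     /* convenience terminator only */
    *out_len = n;
    return copy;
}
```

The design choice here is to search with memchr() rather than strchr(),
so the scan is bounded by the count and a NUL in the data is just another
byte. The result is a fresh copy with its own length, which is what
"substring" gives you on the Scheme side.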


-Rob

p.s. If you don't have SRFI-13 handy, here's an overly simplified
version of "string-index" (just enough for the above examples):

	(define (string-index str ch)
	  (let ((end (string-length str)))
	    (let loop ((i 0))
	      (and (< i end)
		   (if (char=? ch (string-ref str i))
		     i
		     (loop (+ i 1)))))))

-----
Rob Warnock, 31-2-510		rpw3@sgi.com
Network Engineering		http://reality.sgi.com/rpw3/
Silicon Graphics, Inc.		Phone: 650-933-1673
1600 Amphitheatre Pkwy.		PP-ASEL-IA
Mountain View, CA  94043