Subject: Re: Summary: Thoughts on implementing Scheme in C
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 10 Nov 2000 05:31:11 GMT
Newsgroups: comp.lang.scheme
Message-ID: <8ug16v$dv09j$1@fido.engr.sgi.com>
Matthias Blume  <see@my.sig> wrote:
+---------------
| ...I am going to summarize my thoughts on C as the implementation
| language for Scheme.
+---------------

After only a quick reading, IMHO it's a useful contribution. A few random
comments and/or questions (again, based on only a brief reading)...

+---------------
| Since most if not all of these things are well-known, anyone interested
| in finding out more, better, more detailed information on it should
| consult the scientific literature.
+---------------

The one article that *immediately* came to mind was David Gudeman's
"Representing Type Information in Dynamically Typed Languages"
<URL:ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/typeinfo.ps.gz>,
which covers a wealth of representational issues.

+---------------
| Ordinary linked closures (= SICP's "environment model") are not
| safe-for-space...
+---------------

This is not obvious to me -- could you say more?  Since the environment
of the current closure is abandoned when making a tail call, I fail to
see the necessity of space-unsafety in the simple linked model.

I certainly see that you *can* write space-unsafe tail-call code using
closures, e.g., write CPS code in Scheme and pass a "new" closure that
always captures some portion of the current environment (and thus captures
the *entire* environment, since it's linked), but that's equally true
of tail-call code that conses data onto a list that it passes explicitly.
Both are extending a data structure (albeit the environment is an implicit
one) per call. However, AFAICT not *all* tail-call sequences -- even
quasi-infinite ones (such as, say, finite-state machines implemented
as a tail-call per state transtition) -- need involve such growth.
So what am I missing?

+---------------
| 2.3. Tail-recursiveness
| Informally speaking, "proper tail-recursiveness" requires that a
| function call in tail position does not grow the stack.  An infinite
| recursion through tail calls that does not consume memory in any other
| way should run in constant space.
+---------------

Minor quibble: I suspect we Schemers (and Lispers) tend to confuse
other people when we say "proper tail-recursiveness", because those
not deeply familiar with the concept assume there's no "recursion"
going on in some cases when "proper tail-recursion" is indeed still
required (e.g., quasi-infinite sequences of freshly-created closures
which tail-call freshly-created closures which tail-call... yet with
no closure ever calling "itself", even indirectly).

Also, many people who are not Scheme implementers come away with the
false belief that optimizing a self-tail-call ("immediate" recursion)
into a simple loop [such as GCC does] is somehow "all you really need"
for proper tail-recursiveness. (It isn't, as you of course know.)

To completely side-step this confusion, I personally now prefer to use
only the term "proper tail-call optimization", which, as R5RS says in
sect. 3.5, "supports an unbounded number of active tail calls". [And I
think it's unfortunate that R5RS didn't shift to using the term "proper
tail-call optimization", since that's really what they're talking about.]

The relevance to your article is that getting self-tail-recursion right
in C (by most C people's notion of "right") isn't even *close* to being
good enough for full Scheme...

+---------------
| 2.5. Dynamic type system
| ... [since] arbitrary types can be passed in any variable and
| as argument to any function, there must be a uniform ("boxed")
| representation for all data.
+---------------

Minor quibbles:

1. Some people reserve the term "boxed" to imply "heap-allocated",
   so to avoid this ambiguity Gudeman refers to data being "wrapped"
   with the type information, which may or may not (depending on the
   type) also require heap space (a "box").

2. While your statement is of course true, some readers may not
   immediately understand that in a "uniform representation" not all
   types need be wrapped in the same way -- decoding the type can be
   based on a possibly-complex decision tree. Only the entire representation
   *algorithm* need be "uniform" within a given implementation. Again,
   Gudeman's paper is a great reference for the tradeoffs between "flat"
   and "multilevel-decoded" wrappings.

+---------------
| For certain simple values such as small integers, one most certainly
| (for efficiency reasons) would want to represent them as immediate
| values and not heap-allocate boxes for them.
+---------------

Aha! Here you yourself clearly separate "wrapping" from "boxing".
(So I probably *wasn't* being too picky...?)

+---------------
| However, to be able to tell that they are values and not pointers
| to heap objects, there must be some kind of type tagging _within_
| the value's immediate representation.  There are various bit-fiddling
| tricks that one can play to achieve this effect; all these tricks
| depend in some way on the underlying machine architecture.
+---------------

While this is probably true, there *are*, I believe, "portable" (in
the sense of your initial requirement) ways in ANSI or ISO C to access
the required machine-dependencies in the local environment. In some
cases this may require defining the generic "Scheme" data type to be
a union and/or a structure (or a union of structures, etc.).

I believe it *is* a C requirement that "malloc()" return a block of
storage which is aligned to the most stringent requirements of any
basic C type on the underlying machine architecture, and, on current
machines at least, almost always requires at least a few low-order
bits of the addresses of all malloc'd objects will be zeroes.

One such bit is enough to distinguish (say) fixnums from boxed/heap;
two is enough for fixnums, other-immediate (#t, #f, chars, etc.),
boxed, and "broken heart" (for copying GCs); three [a common case]
is a *wealth* of riches!! [CMUCL uses 3 bits, doesn't it?]

+---------------
| 2.6.1. Efficient small integers
+------- [lots of good stuff elided] --------

One cute technique you didn't mention is the one used by SIOD Scheme,
which boxes *everything* ("everything's a pointer"), but which quite
nicely optimizes small integers by preallocating a bunch of them at
startup (configurable, but the range -100 to +1000 is not unreasonable),
and having all arithmetic operators check their results against that range,
returning a preallocated one where possible instead of consing up a new one.

This can also *significantly* speed up checking for known small values
by using an "eq?" (pointer) test for specific values instead of unwrapping
(or in this case, unboxing) the value and comparing with native arithmetic.

+---------------
| In a language where the compiler knows at compile-time that a number
| is a small integer, the typical operation requires just one single
| machine instruction.
+---------------

If SIOD-style small-integer preallocation is done at compile time,
an "eq?" test might even be possible in a C "switch()". (...albeit
the casts you'd have to use would be pretty ugly, since "switch()"
only supports integral types IIRC.)

+---------------
| 3.2. Efficient representation of runtime types
| As long as the "descriptor word" technique is being used, things are
| not difficult.  But as I explained earlier, for efficiency one would
| like to represent certain values as immediate values (small integers,
| perhaps booleans, nil, characters, ...).
+---------------

The SIOD-style preallocation trick works for these types, too.
And by preallocating all the values for a type (say, characters)
in a single array, one can use simple C pointer arithmetic --
*without* ugly casts -- to perform things like "char->integer"
or "integer->char".

+---------------
| As far as Scheme and mutability is concerned, one might argue that
| data structures in Scheme aren't immutable.
+---------------

Well, an implementation is *allowed* (though not required!) to make
certain data immutable. See R5RS "3.4 Storage model":

	In many systems it is desirable for constants (i.e. the values
	of literal expressions) to reside in read-only-memory. To express
	this, it is convenient to imagine that every object that denotes
	locations is associated with a flag telling whether that object
	is mutable or immutable. In such systems literal constants and
	the strings returned by symbol->string are immutable objects,
	while all objects created by the other procedures listed in this
	report are mutable. It is an error to attempt to store a new value
	into a location that is denoted by an immutable object. 

+---------------
| There is one data structure that can be (and by most Scheme compilers
| will be forced to be) immutable: the closure.
+---------------

Though note that closures contain their environments, and portions
of those captured environments *can* be mutable! The only thing that
helps one out of this thorny question is that there are no mutation
operators defined on closures themselves! [In fact, AFAICT the *only*
mutable things required by R5RS (again, "3.4 Storage model") are variables
(which aren't "objects" per se, so the notion of "mutation" itself is
a bit weird -- what is mutated is the location the variable is bound to)
and "objects such as pairs, vectors, and strings" (indeed, *only* pairs,
vectors, and strings are defined with any mutation operators).


-Rob

-----
Rob Warnock, 31-2-510		rpw3@sgi.com
Network Engineering		http://reality.sgi.com/rpw3/
Silicon Graphics, Inc.		Phone: 650-933-1673
1600 Amphitheatre Pkwy.		PP-ASEL-IA
Mountain View, CA  94043