Paul Wallich <pw@panix.com> wrote:
+---------------
| Gene wrote:
| > As a practical matter, it may be easier to bootstrap a language other
| > than CL, e.g. try ANSI C to get a compiler that will compile gcc, then
| > use this to build one of the standard C-based CL systems.
|
| How well specified is clisp's virtual machine, and how big is the set of
| primitives/subprimitives? It seems to me that for the purpose you're
| sort of thinking about, being able to implement a simple VM would
| probably suffice. (ABCL might be another useful candidate, since the
| specification of the Java virtual machines will likely be around for a
| while.)
+---------------
CMUCL's byte codes might also be a good choice. They run in the
same VM as normal compiled-to-machine-code CMUCL functions, and
byte-compiled functions can call & be called by native-compiled
functions. If I'm reading the CMUCL source correctly, there are
only ~23 byte codes you need to implement [though some of them
have fairly hairy semantics under the covers].
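For concreteness, getting byte-compiled output from CMUCL is just a
matter of passing the :BYTE-COMPILE extension keyword to COMPILE-FILE.
[A minimal sketch; the file names are made up for illustration:]

    ;; Byte-compile a source file; the output is a byte-code FASL
    ;; rather than native machine code.
    (compile-file "my-stuff.lisp" :byte-compile t)

    ;; The byte-compiled FASL then loads like any other:
    (load "my-stuff.bytef")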
To implement the byte-code VM on bare metal, I would write a
cross-assembler in CL, then write the interpreter for the
byte-code VM in a DSL consisting of CL macros that expanded
to assembler source [well, "source" here being one s-expr per
instruction], much like the "VOPs" in the CMUCL native compiler
itself. [You can use CMUCL's own byte-code interpreter as a
model.] Add some start-up & driver code (in the same assembler),
cross-assemble, cross-link, & copy over, and you have your "VM".
Then just use the normal CMUCL compiler to byte-compile any code
you want [such as the full CMUCL compiler itself!!], and copy the
byte-compiled FASLs across to run in your VM. [Yes, this means the
VM will also have to be taught how to read CMUCL ".bytef" FASLs.]
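Just to make the macro-DSL idea a little more concrete, here is the
general flavor I have in mind. [Everything below -- DEFINE-BYTE-OP,
EMIT, the operand names -- is hypothetical scaffolding, *not* actual
CMUCL code:]

    (defvar *asm-output* '()
      "Assembler source accumulated so far, one s-expr per
      instruction, in reverse order.")

    (defvar *byte-op-table* (make-hash-table)
      "Maps opcode numbers to the labels of their handler routines.")

    (defun emit (&rest instructions)
      "Add assembler-source s-exprs to the output."
      (dolist (insn instructions)
        (push insn *asm-output*)))

    (defmacro define-byte-op ((name opcode) &body body)
      "Emit the assembler for one byte-code handler.  BODY is ordinary
      CL that calls EMIT, so handlers can share helper functions and
      compute offsets at cross-assembly time."
      `(progn
         (setf (gethash ,opcode *byte-op-table*) ',name)
         (emit '(:label ,name))
         ,@body
         (emit '(:jump dispatch-next))))  ; back to the fetch/dispatch loop

    ;; A made-up example: opcode 3 pushes a constant from the current
    ;; code object's constant vector onto the operand stack.
    (define-byte-op (push-constant 3)
      (emit '(:load temp (constant-ref code-obj operand))
            '(:push temp)))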
It's interesting that this topic is coming up at this point.
For some time now, I've actually been discussing this [building
a standalone CMUCL byte-code VM, only in C, not assembler] with
some friends as a possible way to simplify the bootstrapping of
CMUCL itself, with the goal being the same one as achieved by the SBCL
project -- that is, to be able to compile the system "from scratch"
with only a C compiler for the target architecture and only "some"
running CL implementation available (on the same or a connected
machine) -- but using a rather different approach (the byte-code VM)
to get to that goal.
+---------------
| On the one hand, compiling to some kind of byte code will be
| slower (execution-wise) than compiling to bare metal...
+---------------
Yes, significantly slower. The following function:
    (defun spin (x)
      (dotimes (i x)
        (declare (fixnum i x))))
when run in CMUCL's interpreted, byte-compiled, and native-compiled
modes on an x86 architecture takes about 7400, 700, and 2 cycles per
iteration, respectively.[1] That is, for this particular trivial
micro-benchmark, byte-compiled code is about 350 times slower than
native machine code, and raw interpreted code[2] is another 10+ times
slower than that.
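Reproducing that sort of number needs nothing CMUCL-specific, by the
way -- just time a large N and convert using an assumed clock rate.
[A rough sketch; the function name and the 2 GHz default are purely
illustrative:]

    (defun cycles-per-iteration (n &optional (cpu-hz 2.0d9))
      "Time (SPIN N) and convert the wall-clock time to CPU cycles
      per loop iteration, given an *assumed* clock rate CPU-HZ."
      (let ((start (get-internal-real-time)))
        (spin n)
        (let ((seconds (/ (- (get-internal-real-time) start)
                          internal-time-units-per-second)))
          (/ (* seconds cpu-hz) n))))

    (cycles-per-iteration 100000000)  ; should land near the figures above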
But, hey, what's a mere factor of 350x amongst friends during
bootstrapping?!? ;-} ;-}
-Rob
[1] Note: On x86, the native-compiled inner loop for SPIN is
these four instructions [and, yes, the "MOV ECX, EBX" could
be hoisted outside the loop!]:
    L0:   ADD   EAX, 4
    L1:   MOV   ECX, EBX
          CMP   EAX, ECX
          JL    L0
On Athlon CPUs [including Athlon-64 in 32-bit mode], the speed
of this four-instruction loop depends on the memory alignment
of the compiled code. I haven't done an extensive analysis, but
it appears that some alignments produce the 2 CPU-cycle/iteration
speed I report above, while some alignments yield 3 cycles/iteration.
I have seen the speed of SPIN change from one to the other (or back)
as a result of a GC.
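[If you want to look at the generated code for yourself, the standard
DISASSEMBLE function will print it; the listing above is the sort of
thing CMUCL shows on x86, modulo version and alignment:]

    (compile 'spin)        ; make sure SPIN is native-compiled
    (disassemble 'spin)    ; prints the generated x86 code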
[2] Well, even "interpreted" code in CMUCL is "IR1-converted" before
being run, so that macros are only expanded once.
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607