Subject: Re: Interpreters
From: rpw3@rpw3.org (Rob Warnock)
Date: Sun, 15 Aug 2004 07:05:21 -0500
Newsgroups: comp.lang.lisp
Message-ID: <bpCdndK-YagczILcRVn-gg@speakeasy.net>
Gisle Sølensminde  <gisle@kaktus.ii.uib.no> wrote:
+---------------
| Alexander Burger wrote:
| > Rob Warnock <rpw3@rpw3.org> wrote:
| >> Alexander Burger  <abu@software-lab.de> wrote:
| >> +---------------
| >> | They keep on talking about the efficiency of lexical binding, which is
| >> | true for compiled code, but which is not given for an interpreter.
| >> +---------------
| > 
| >> You're wrong. See Christian Queinnec's "Lisp in Small Pieces",
| >> especially the chapter on "Fast Interpretation". This is all
| > 
| > I don't have the book. In the Web I find
| >   * Chapter 6: Fast interpretation Precompile expressions to speed up
| 
| There is no conflict in having a compilation step in an interpreter.
+---------------

Agreed, though I would say it slightly differently: There is no conflict
in having a *preprocessing* step in an interpreter, which is precisely
what Chapter 6 of L.i.S.P. is all about. It is *NOT* about "compiling";
for that see Chapter 7 "ByteCode compilation" or Chapter 10 "Compilation
towards C".

The *preprocessing* steps mentioned in Chapter 6 still leave you with
S-expressions to be interpreted as usual; it's just that some of the
S-expressions have been re-arranged a little bit. For example, in a
system in which lexical environments are represented by vectors of
vectors of values [Queinnec offers other alternative implementations,
but vectors-of-vectors is one of the classic ones], during the
preprocessing step lexical variables are allocated offsets in the
current vector of values, and symbols which are in the scope of some
lexical variable of the same name are transformed into compact LEXVAR
objects which contain the "level" (the index in the vector-of-vectors
of the desired value vector) and "offset" (the index within that vector).
Thus at run time, when the interpreter sees a LEXVAR, it does *NOT* have
to look up anything in a symbol table -- it just double-indexes through
the vector-of-vectors.
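The scheme above can be sketched in a few lines. This is an illustrative
Python sketch (the post is about Lisp, but the technique is
language-independent); the names `LexVar`, `preprocess_var`, and `lookup`
are mine, not from L.i.S.P.:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexVar:
    level: int    # how many frames out in the vector-of-vectors
    offset: int   # index within that frame's value vector

def preprocess_var(name, scopes):
    """Preprocessing time: resolve NAME against a compile-time list
    of scopes (innermost first); return a LexVar, or the symbol
    unchanged if no lexical binding is in scope."""
    for level, frame in enumerate(scopes):
        if name in frame:
            return LexVar(level, frame.index(name))
    return name  # not lexical; left for global-variable handling

def lookup(lexvar, env):
    """Run time: ENV is the vector-of-vectors of values, so a LexVar
    reference is just a double index -- no symbol-table search."""
    return env[lexvar.level][lexvar.offset]

# For (lambda (x y) (lambda (z) x)), the inner reference to X is
# rewritten at preprocessing time:
scopes = [("z",), ("x", "y")]          # innermost scope first
ref = preprocess_var("x", scopes)
assert ref == LexVar(level=1, offset=0)

# At run time, with Z bound to 3 and X, Y bound to 1, 2:
env = [[3], [1, 2]]
assert lookup(ref, env) == 1
```

Note that `preprocess_var` runs once, before interpretation, while
`lookup` is all that happens per variable reference at run time.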

The same can be done to global variables as well, in a manner that
can easily be extended to threading. That is, global variables can be
allocated an index in the "global variable value table" when first seen,
and during the preprocessing step symbols which are in the scope of
some global variable of the same name are transformed into compact
GLOVAR objects which contain the index of that global variable's value
in the global value table. Thus at run time, when the interpreter sees
a GLOVAR, it does *NOT* have to look up anything in a symbol table --
it just indexes into the global value table.
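The GLOVAR side is even simpler; a minimal sketch (again illustrative
Python, with the class and method names my own invention):

```python
class Globals:
    def __init__(self):
        self.indices = {}   # symbol name -> slot index (preprocessing-time data)
        self.values = []    # the global value table (run-time data)

    def intern(self, name):
        """Preprocessing: allocate a slot when a global is first seen;
        the symbol in the code is then replaced by a GLOVAR holding
        this index."""
        if name not in self.indices:
            self.indices[name] = len(self.values)
            self.values.append(None)  # unbound until assigned
        return self.indices[name]

    def ref(self, index):
        """Run time: a GLOVAR reference is one vector index."""
        return self.values[index]

g = Globals()
slot = g.intern("*print-base*")   # the preprocessor sees the symbol once
g.values[slot] = 10
assert g.ref(slot) == 10          # run time never touches the name table
```

The `indices` dictionary is consulted only during preprocessing; once
the code has been transformed, only the `values` vector is touched.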

[The extension of GLOVAR handling to threads is left as an exercise for
the reader (hint from David Wheeler: "Any problem in computer science
can be solved with another level of indirection"), but suffice it to
say that both Corman Lisp and Franz's Allegro do something similar for
their global/special variables.]
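One possible reading of Wheeler's hint, sketched below: keep the GLOVAR
index, but route each reference through a per-thread binding table
before falling back to the shared global table. This is my own sketch
of the idea, not necessarily how Corman or Allegro actually do it:

```python
import threading

GLOBAL_VALUES = []      # shared global value table: slot -> value
_tls = threading.local()

def thread_slots():
    """Per-thread overlay: slot index -> thread-local value."""
    if not hasattr(_tls, "slots"):
        _tls.slots = {}
    return _tls.slots

def glovar_ref(index):
    """GLOVAR lookup: thread-local binding if one exists, else global."""
    return thread_slots().get(index, GLOBAL_VALUES[index])

def glovar_bind(index, value):
    """Establish a thread-local (special-variable style) binding."""
    thread_slots()[index] = value

GLOBAL_VALUES.append(16)           # slot 0, say *print-base*
assert glovar_ref(0) == 16         # this thread sees the global value...
glovar_bind(0, 2)
assert glovar_ref(0) == 2          # ...until it binds the slot locally

results = []
t = threading.Thread(target=lambda: results.append(glovar_ref(0)))
t.start(); t.join()
assert results == [16]             # other threads still see the global
```

The "extra level of indirection" is the per-thread overlay consulted
before the shared table; real implementations typically use a flat
per-thread vector rather than a dictionary, but the shape is the same.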

Transformation of symbols into LEXVARs & GLOVARs can be done in a single
pass over the source S-expression, along with other transformations such
as macro expansion. While not providing as much performance improvement
as (say) byte-compiling, such preprocessing is *itself* fast, and the
resulting code can be interpreted much faster than un-preprocessed
S-expressions.

To hammer the point in: The result of such preprocessing is *still*
just an S-expression, *not* "compiled" code. The run-time process is
still "interpretation" of the internal S-expression representation.

+---------------
| Many systems work this way: perl, python, the CLISP Common Lisp
| interpreter, etc., as well as non-JIT java interpreters. In those cases
| the code is first compiled into bytecode, and this bytecode is interpreted.
+---------------

As noted above, there's even value to a preprocessing step which is
*less* than byte-code compilation.

+---------------
| It then seems like there are three common ways of executing a program
+---------------

Four ways. (See my added #1.5, below.)

+---------------
| 1)  source -> parsing into datastructure -> interpretation by traversing
| datastructure
+---------------

1.5) source -> parsing into data structure -> some preprocessing of the
     data structure -> interpretation by traversing the data structure.

+---------------
| 2)  source -> compilation to byte code -> interpretation by simulating a
| virtual machine.
| 
| 3)  source -> parsing -> machine code -> execution of native code
| 
| 1) is clearly an interpreter, while 3) is clearly a compiler. For me it
| seems like you and some of the other posters here disagree on whether
| 2) is an interpreter.
+---------------

Regardless of one's opinion of #2 [I personally consider it "compiling"],
clearly both #1 and #1.5 are "interpretation".


-Rob

p.s. And to be complete, we should mention #3.3 optimizing compilers,
#3.7 interprocedural optimizing compilers, #3.9 interprocedural optimizing
compilers tuned with feedback from actual trace statistics, etc., etc.

And JIT is something that starts with #1.5 or #2 and jumps to #3.x
"on demand"...

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607