Subject: Re: Minimal keywords needed for constructing a full CL system
From: rpw3@rpw3.org (Rob Warnock)
Date: Sat, 08 Jul 2006 22:32:53 -0500
Newsgroups: comp.lang.lisp
Message-ID: <6uSdnfQsKPB45S3ZnZ2dnUVZ_qidnZ2d@speakeasy.net>
Rob Thorpe <robert.thorpe@antenova.com> wrote:
+---------------
| Don Geddis wrote:
| > But what he thinks he's looking for isn't as interesting as he
| > believes it is.
| 
| I think it's of some interest.  I mentioned some reasons it might be
| interesting previously. I thought of another one since.... rather than
| dumping an image you compile an executable, in that executable you need:
| * All the user code put in
| * All the code the user code calls
| * All the code that code calls.
+---------------

There is already considerable literature on this; it's usually
called "tree shaking" -- as in, grab the root(s) of the specific
application's heap and shake it so that all of the branches
and leaves that are *not* called/referenced by that application
"fall out", leaving you with an image which is "minimal" FOR THAT
SPECIFIC APPLICATION. Some implementations provide a tree shaker
as a built-in; most don't.

+---------------
| Solving this problem -or some nearby problem- could be much
| simpler if you knew one of the possible small sets.
+---------------

I want to go back to the problem as originally stated, because I
think the discussion has seriously missed viewing the problem from
one very significant angle: In the case of Common Lisp at least, it
seems to me what's really important is not so much the minimal set
of special forms & functions per se [which, as I & others have noted,
isn't well-determined or unique in any event], but instead, the
question should be: What is the minimal set of *semantics* that a
"Common Lisp" run-time environment must provide before most users
would agree that the implementation was a useful subset [in the
sense of CLHS 1.7 "Language Subsets"].

In a previous comment in this thread, I mentioned that I've been
tinkering with my own "minimal CL subset", and that even though I
had a bunch of stuff working -- (NIL T QUOTE LAMBDA PROGN IF SETQ
DEFUN LET LET* PRINT PRINC TERPRI VECTOR FUNCALL APPLY EVAL CAR CDR
CONS LIST LENGTH ASSOC) plus fixnum arithmetic -- I've realized
that the implementation is a dead end, because it lacks one of
the *defining* characteristics of a "Common Lisp": exceptions!!
Or to be precise, the CL Condition System [CLHS 9.0].

[It also currently lacks a GC, but given today's memory sizes,
that's less important for an implementation that will mainly be 
used for short, throwaway "scripts". Though adding GC later will 
certainly require recoding almost all the "primitives"....]
 
Also, looking at what I use (say) CMUCL for on a daily basis, 
fixnum-only arithmetic is, well, simply lame:  ;-}  ;-}

    qdl> (defun fact (n) (if (< n 2) 1 (* n (fact (1- n)))))
    FACT
    qdl> (fact 12)
    479001600
    qdl> (fact 13)

    *: ERROR: Result will not fit in a fixnum: 6227020800 [0x5cca33000]
    qdl> 

To be a "CL", you need a full numeric stack [even if non-fixnum
arithmetic is "slow", i.e., naively-coded].
 
So I'd like to suggest that a starting set of semantics for a 
"minimal CL" include at least the following:

- At least one of the set of non-local control transfer primitives
  sufficient to implement the rest, i.e. GO, RETURN-FROM, THROW, and
  their establishing forms TAGBODY, BLOCK, & CATCH. [I currently think
  that TAGBODY/GO is "the hardest" for a C-based implementation[1],
  so it feels "more primitive" to me, but the previously-mentioned
  Baker paper suggests they're all equal.]

- The CL exception system, minimally [Kent might suggest a better set]
  MAKE-CONDITION, SIGNAL, HANDLER-BIND, RESTART-BIND, FIND-RESTART, and
  INVOKE-RESTART. [I think ERROR, xxx-CASE, etc., can be defined in terms
  of these plus the non-local transfers.]

- A debugger, or at the very least *DEBUGGER-HOOK* and the ability
  for it to call user code.

- The full numeric tower: Reals [float, rational, integer (fixnum
  and bignum)] and Complex [pairs of those] and the usual related
  functions. [Personally, for the stuff I use CL for, I'd want the
  LOGxxx and BYTE functions as well, but that's just me.]

- Lists (of course!), symbols/packages, characters, vectors (including
  strings), structures, and enough hooks to add fully-general arrays
  later. [I *think* that's all you absolutely need in terms of datatypes
  to later add CLOS, say, using PCL.]

- CLHS 3.2.2.2 "Minimal Compilation", so you can at least have macros
  that only expand once. [If you're going to build up everything from
  some minimal core, you need this to avoid horrible inefficiencies.]

- CLHS 3.3.1 "Minimal Declaration Processing Requirements", which
  [to me] also implies "doing the right thing" with dynamic and
  lexical scoping, including special variables in LAMBDA lists.
  [Some of this can be "faked" with lexical-only LAMBDA lists with
  some parameter name re-writing and wrapping the LAMBDA body in an
  UNWIND-PROTECT and some added SETQs.]

- Places & DEFINE-SETF-EXPANDER & a basic set of SETF/INCF/etc.

- The rest of the expected "minimal" set of special forms/macros
  [what the thread was all about up until now].  ;-}  ;-}

Anything less is just "a" Lisp...

I await serious comments/corrections with great interest.


-Rob

[1] Consider this tiny example:

       > (defun foo (x) (funcall x))

       FOO
       > (tagbody
	   (foo (lambda () (go works))) 
	   (print 'failed)
	   (go out)
	 works
	   (print 'works) 
	 out)

       WORKS 
       NIL
       > 

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607