Subject: Re: What are the domains that lisp doesn't fit int?
From: rpw3@rpw3.org (Rob Warnock)
Date: Thu, 03 May 2007 23:50:03 -0500
Newsgroups: comp.lang.lisp
Message-ID: <BOWdnUoNHermJqfbnZ2dnUVZ_r6vnZ2d@speakeasy.net>
Joe Marshall  <eval.apply@gmail.com> wrote:
+---------------
| Kent M Pitman <pit...@nhplace.com> wrote:
| > Consequently, customizing a dialect of Lisp to a particular concrete
| > machine, while it's not done for you, is not impossible to do; it's
| > probably easier than customizing C to a particular new concrete machine.
| 
| The fallacy is `Proof by Lack of Imagination'.  Just because one
| doesn't know how to write `close to the metal' code in Lisp doesn't
| mean that it is impossible or even impractical.
+---------------

(*sigh*) I find it odd that so far none of the senior Lispers in
this thread have pointed to *THEIR OWN LISPS' COMPILERS* as being
precisely such an example of "customizing a dialect of Lisp to a
particular concrete machine"!!! Come on, guys, Lisp has been doing
this since *DAY ONE*!!!  To wit, here are a couple of small snippets
from "cmucl/src/compilercmucl-19c/src/compiler/x86". The INST forms
emit bare x86 instructions, while other forms in :GENERATOR sections
are other VOPs [virtual operations] that emit sequences of INSTs --
effectively, VOPs provide a "smart" assembler with macros. I think
you'll find this is about as "close to the metal" as you're going to
get in *any* language other than raw binary!!

    ;;;; Type frobbing VOPs

    ...

    (define-vop (get-type)
      (:translate get-type)
      (:policy :fast-safe)
      (:args (object :scs (descriptor-reg)))
      (:temporary (:sc unsigned-reg :offset eax-offset :to (:result 0)) eax)
      (:results (result :scs (unsigned-reg)))
      (:result-types positive-fixnum)
      (:generator 6
	(inst mov eax object)
	(inst and al-tn lowtag-mask)
	(inst cmp al-tn other-pointer-type)
	(inst jmp :e other-ptr)
	(inst cmp al-tn function-pointer-type)
	(inst jmp :e function-ptr)

	;; pick off structures and list pointers
	(inst test al-tn 1)
	(inst jmp :ne done)

	;; pick off fixnums
	(inst and al-tn 3)
	(inst jmp :e done)

	;; must be an other immediate
	(inst mov eax object)
	(inst jmp done)
	
	FUNCTION-PTR
	(load-type al-tn object (- vm:function-pointer-type))
	(inst jmp done)
	
	OTHER-PTR
	(load-type al-tn object (- vm:other-pointer-type))
	
	DONE
	(inst movzx result al-tn)))

    ...

    (define-vop (make-fixnum)
      (:args (ptr :scs (any-reg descriptor-reg) :target res))
      (:results (res :scs (any-reg descriptor-reg)))
      (:generator 1
	;;
	;; Some code (the hash table code) depends on this returning a
	;; positive number so make sure it does.
	(move res ptr)
	(inst shl res 3)
	(inst shr res 1)))

    ...

    ;;;; Other random VOPs.

    ...

    (defknown lisp::%scrub-control-stack () (values))

    ;;; Scrub the control stack.
    ;;;
    ;;; On the x86 port the stack grows downwards, and to support grow on
    ;;; demand stacks the stack must be decreased as it is scrubbed.
    ;;;
    (define-vop (%scrub-control-stack)
      (:policy :fast-safe)
      (:translate lisp::%scrub-control-stack)
      (:args)
      (:results)
      (:temporary (:sc unsigned-reg) count)
      (:temporary (:sc any-reg) stack-save zero)
      (:generator 25
	(inst mov stack-save esp-tn)
	(inst mov zero 0)
	(inst push zero)
	;; Scrub the stack.
	SCRUB
	(inst add esp-tn 4)
	(inst mov count 2048)
	SCRUB-LOOP
	(inst dec count)
	(inst push zero)
	(inst jmp :nz SCRUB-LOOP)
	;; Look for a clear stack unit.
	(inst mov count 2048)
	LOOK-LOOP
	(inst sub esp-tn 4)
	(inst cmp (make-ea :dword :base esp-tn) zero)
	(inst jmp :ne SCRUB)
	(inst dec count)
	(inst jmp :nz LOOK-LOOP)
	;; Done, restore the stack pointer.
	(inst mov esp-tn stack-save)))

    ...

    ;; the RDTSC instruction (present on Pentium processors and
    ;; successors) allows you to access the time-stamp counter, a 64-bit
    ;; model-specific register that counts executed cycles. The
    ;; instruction returns the low cycle count in EAX and high cycle count
    ;; in EDX.
    ;;
    ;; In order to obtain more significant results on out-of-order
    ;; processors (such as the Pentium II and later), we issue a
    ;; serializing CPUID instruction before reading the cycle counter.
    ;; This instruction is used for its side effect of emptying the
    ;; processor pipeline, to ensure that the RDTSC instruction is
    ;; executed once all pending instructions have been completed.
    ;;
    ;; Note that cache effects mean that the cycle count can vary for
    ;; different executions of the same code (it counts cycles, not
    ;; retired instructions). Furthermore, the results are per-processor
    ;; and not per-process, so are unreliable on multiprocessor machines
    ;; where processes can migrate between processors.
    ;;
    ;; This method of obtaining a cycle count has the advantage of being
    ;; very fast (around 20 cycles), and of not requiring a system call.
    ;; However, you need to know your processor's clock speed to translate
    ;; this into real execution time.

    (defknown read-cycle-counter
      ()
      (values (unsigned-byte 32) (unsigned-byte 32))
      ())

    (define-vop (read-cycle-counter)
      (:translate read-cycle-counter)
      (:guard (backend-featurep :pentium))
      (:args )
      (:policy :fast-safe)
      (:results (lo :scs (unsigned-reg))
		(hi :scs (unsigned-reg)))
      (:result-types unsigned-num unsigned-num)
      (:temporary (:sc unsigned-reg :offset eax-offset :target lo) eax)
      (:temporary (:sc unsigned-reg :offset edx-offset :target hi) edx)
      (:generator 1
	 (inst cpuid)
	 (inst rdtsc)
	 (move hi edx)
	 (move lo eax)))

    #+pentium
    (defun read-cycle-counter ()
      (read-cycle-counter))

And it's not just CMUCL, of course. *Any* other mature Lisp compiler
will provide countless more examples of such "customizing ... Lisp to
a particular concrete machine".


-Rob

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607