Subject: Re: PLOT: A non-parenthesized, infix Lisp!
From: rpw3@rpw3.org (Rob Warnock)
Date: Tue, 21 Apr 2009 21:21:39 -0500
Newsgroups: comp.lang.lisp
Message-ID: <kbGdnelfsqau4nPUnZ2dnUVZ_gCdnZ2d@speakeasy.net>
George Neuner  <gneuner2@comcast.net> wrote:
+---------------
| I like Lisp - but let's be honest - Lisp doesn't make bit munging easy.
...
| Then too, operating systems are not the only places where system code
| exists.  A hell of a lot of code is written for bare hardware in
| embedded systems and quite a lot of it falls into the bit munging
| category.
...
| [*] I distinguish "munging" from "banging".  I use "munging" to mean
| interpreting the same bits differently in different contexts and
| "banging" to mean altering data at the bit level.  YMMV and your
| terminology as well.
+---------------

One of the reasons I really like CMUCL is that it *does* let one get
down to the bit level if you really want to. [Most other production
CL compilers also provide for this; I just happen to know CMUCL best.]
Assume for the sake of argument that one has already created the
following aliases, abbreviations, or shortcuts [as I have in my normal
toolbox]:

  - A read-macro for "0" that treats "0x{number}" as "#x{number}".
  - A FORMAT function "0x" for convenient hex printing.
  - make-lisp-obj == kernel:make-lisp-obj
  - lisp-obj      == kernel:get-lisp-obj-address
  - r{8,16,32}    == (lambda (addr)
		       (system:sap-ref-{8,16,32} (system:int-sap addr) 0))
  - w{8,16,32}    == (lambda (addr new-value)
		       (setf (system:sap-ref-{8,16,32} (system:int-sap addr) 0)
			     new-value))
  - d32           == (lambda (addr &optional (len #x40) (print-addr addr))
		       "Does a hex dump from ADDR through (+ ADDR LEN -1),
			labelling locations with PRINT-ADDR. (The latter is
			useful when the object is mmap()'d to hardware.)" 
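As an illustration of the second item in that list, here is a minimal sketch of what a FORMAT function named "0x" might look like. It is a guess that merely reproduces the output shown in the transcripts, not the actual toolbox code. It relies only on the standard ~/name/ directive, which looks the name up (upcased) in COMMON-LISP-USER. The HEX helper is likewise an assumption inferred from the transcripts: print the value in hex, then return it.

```lisp
(in-package :cl-user)

;; ~/0x/ calls the function named by the symbol |0X| in CL-USER with
;; (stream arg colon-p at-sign-p &rest params).
(defun |0X| (stream arg colon-p at-sign-p &rest params)
  (declare (ignore colon-p at-sign-p params))
  ;; 8 hex digits, zero padded, forced to lower case.
  (format stream "0x~(~8,'0x~)" arg))

;; Hypothetical HEX helper: print in hex on its own line, return the value.
(defun hex (n)
  (format t "~/0x/~%" n)
  n)
```

With that loaded, (format nil "~/0x/" 58) yields "0x0000003a".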

Then you can do these sorts of things:

    cmu> (gc :full t)   ; So things won't move around while we play.
    ; [GC threshold exceeded with 10,060,488 bytes in use.  Commencing GC.]
    ; [GC completed with 1,158,608 bytes retained and 8,901,880 bytes freed.]
    ; [GC will next occur when at least 13,158,608 bytes are in use.]

    NIL

    cmu> (deflex foo (vector 1 2 3 4))

    FOO
    cmu> (lisp-obj foo)

    1209122855
    cmu> (hex *)
    0x4811c027
    1209122855
    cmu> 

CMUCL uses 3-bit lowtags; a 7 means "other heap object" (that is,
not cons, function, or CLOS instance), which includes arrays.
Thus the array actually starts in memory at 0x4811c020, and we
*could* dump it this way:

    cmu> (loop for addr from 0x4811c020 by 4 repeat 6
	   collect (r32 addr))

    (58 16 4 8 12 16)
    cmu> (format t "~{~/0x/~^ ~}~%" *)
    0x0000003a 0x00000010 0x00000004 0x00000008 0x0000000c 0x00000010
    NIL
    cmu> 

which would show us that the heap header tag for SIMPLE-VECTOR is
decimal 58 (hex 0x3a), and would suggest that in CMUCL fixnums are
30 bits (using both the 0 & 4 lowtags), and that the second word of
a SIMPLE-VECTOR is the user-visible length of the vector as a fixnum
[all of which is in fact the case].  Or we could just use D32:  ;-}

    cmu> (d32 foo)
    0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
    0x4811c030: 0x0000000c 0x00000010 0x48119217 0x4811c043
    0x4811c040: 0x28f0000b 0x28f0000b 0x0000008c 0x28f0000b
    0x4811c050: 0x4811c04b 0x48119403 0x48119243 0x4811c063
    cmu> 

Yes, the display overran the object. So sue me. ;-}
If I weren't being lazy I could have typed this:

    cmu> (d32 foo 24)
    0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
    0x4811c030: 0x0000000c 0x00000010
    cmu> 

or:

    cmu> (d32 foo (* 4 (+ 2 (length foo))))
    0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
    0x4811c030: 0x0000000c 0x00000010
    cmu> 

or even:

    cmu> (d32 foo (+ 8 (r32 (+ 4 (logandc2 (lisp-obj foo) 7)))))
    0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
    0x4811c030: 0x0000000c 0x00000010
    cmu> 
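The tag and length arithmetic used so far can be checked without touching memory at all. The following is a portable sketch under the assumptions stated above (3-bit lowtags, 32-bit words, fixnums stored shifted left by 2); the function names are made up for illustration and are not CMUCL internals.

```lisp
(defparameter *lowtag-mask* 7)          ; 3-bit lowtags

(defun lowtag (word)
  "The low three tag bits of a tagged object word."
  (logand word *lowtag-mask*))

(defun untagged-address (word)
  "The object's real starting address, with the lowtag stripped."
  (logandc2 word *lowtag-mask*))

(defun fixnum-word-value (word)
  "Decode a raw 32-bit word holding a fixnum (two low bits are zero)."
  (ash word -2))

(defun simple-vector-dump-length (length-word)
  "Bytes to dump: 8 for header + length word, plus 4 per element.
The raw length word is already 4 * length, hence the plain addition."
  (+ 8 length-word))
```

For example, (lowtag #x4811c027) is 7, (untagged-address #x4811c027) is #x4811c020, and (simple-vector-dump-length 16) is 24, which is why all three ways of computing the length agree.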

<ASIDE>
  The 3rd arg to D32 is for when the address is mmap'd to some physical
  hardware, and you want to display the physical address [to match the
  bus and/or chip documentation] instead of the virtual address when dumping:

    cmu> (d32 foo 24 0xcf900000)
    0xcf900000: 0x0000003a 0x00000010 0x00000004 0x00000008
    0xcf900010: 0x0000000c 0x00000010
    cmu> 
</ASIDE>
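For readers without CMUCL at hand, the display format itself is easy to reproduce. This is a portable sketch that formats a list of already-fetched 32-bit words four per line, labelling each line from a caller-supplied address. DUMP-WORDS and its keyword arguments are hypothetical names, and it does no memory access at all; the real D32 reads the words itself.

```lisp
(defun dump-words (words &key (print-addr 0) (stream *standard-output*))
  "Print WORDS (a list of 32-bit integers) four per line, labelling
each line with an address that starts at PRINT-ADDR and steps by 16."
  (loop for line-addr from print-addr by 16
        for tail on words by #'cddddr
        do (format stream "0x~(~8,'0x~):~{ 0x~(~8,'0x~)~}~%"
                   line-addr
                   ;; At most four words on this line.
                   (subseq tail 0 (min 4 (length tail))))))
```

Calling (dump-words '(#x3a #x10 4 8 #xc #x10) :print-addr #xcf900000) prints two lines labelled 0xcf900000 and 0xcf900010, in the same layout as the transcript.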

Now suppose we want to change FOO from #(1 2 3 4) to #(1 2 #\J 4).
Yes, we could (SETF (AREF FOO 2) #\J), but what fun is that?!?  ;-}  ;-}

    cmu> (hex (char-code #\J)) ; An ASCII "J" is 74 (0x4a), in CMUCL 
    0x0000004a
    74
    cmu> (hex (lisp-obj #\J))  ; stored shifted up 8 in immediate type 0xb2.
    0x00004ab2
    19122
    cmu> (w32 (+ 16 (logandc2 (lisp-obj foo) 7)) 19122)  ; *ZAP!!*

    cmu> (d32 foo 24)
    0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
    0x4811c030: 0x00004ab2 0x00000010
    cmu> foo

    #(1 2 #\J 4)
    cmu> 
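The two numbers that made the zap work, the word 19122 and the offset 16, can be derived in portable CL. This sketch assumes the layout read off the transcript (character code shifted up 8 bits over an immediate type byte of 0xb2, and an 8-byte header before the first element); the tag value may differ on other CMUCL versions or platforms, and the function names are illustrative only.

```lisp
(defparameter *char-immediate-tag* #xb2) ; as seen in the transcript

(defun char-immediate-word (char)
  "The raw word for CHAR: code shifted up 8, tag in the low byte."
  (logior (ash (char-code char) 8) *char-immediate-tag*))

(defun immediate-word-char (word)
  "Inverse of CHAR-IMMEDIATE-WORD: drop the tag byte, recover the char."
  (code-char (ash word -8)))

(defun element-offset (index)
  "Byte offset of element INDEX from the untagged start of a
SIMPLE-VECTOR: 8 bytes of header + length, then 4 bytes per element."
  (+ 8 (* 4 index)))
```

So (char-immediate-word #\J) is 19122 (#x4ab2), and (element-offset 2) is 16.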

Is that enough "munging/banging" for you?!?   ;-}  ;-}

If not, I can dig up plenty of examples from hardware bringup and
debugging [or you can just search for lots of previous noise by yours
truly in this group on that subject, search terms: "rpw3 hwtool opfr"].

+---------------
| I used to do embedded programming and I've seen as much
| as 50% of an application devoted to munging data.
+---------------

Indeed. My "hwtool" script [which uses my "peek-poke" library]
is almost *entirely* bit-banging stuff. A few tiny examples:

    ;;; Spray bits out for easy reading of hardware registers
    (defun decode-bits (n)
      (let (mflag)
	(when (minusp n)
	  (setf mflag t n (lognot n)))
	(loop for i downfrom (integer-length n) to 0
	  when mflag
	    collect (if (zerop n) 'all-ones 'all-ones-except)
	    and do (setf mflag nil)
	  when (logbitp i n)
	    collect i)))

    ;;; Same as above, but with textual labels.
    (defun decode-named-bits (value name-vector &key show-negated)
      (loop for i downfrom (1- (length name-vector)) to 0 do
	(cond
	  ((logbitp i value)
	   (format t " ~a" (aref name-vector i)))
	  (show-negated
	   (format t " \\~a" (aref name-vector i)))))
      (force-output))

used thusly:

    cmu> (decode-bits 0x48d02cf3)
    (30 27 23 22 20 13 11 10 7 6 5 4 1 0)
    cmu> 

Useful enough, but what do they *mean*?!?

    cmu> (deflex icr-bit-names '#(	; From an I2C controller chip
	   "Start"
	   "Stop"
	   "Nack"
	   "TransferByte"
	   "MasterAbort"
	   "SCL_En"
	   "I2C_En"
	   "GenCall_Dis" )) ; There were more, but that's enough for now.

    ICR-BIT-NAMES
    cmu> (decode-bits 0x69)

    (6 5 3 0)
    cmu> (decode-named-bits 0x69 icr-bit-names)
     I2C_En SCL_En TransferByte Start
    NIL
    cmu> (decode-named-bits 0x69 icr-bit-names :show-negated t)
     \GenCall_Dis I2C_En SCL_En \MasterAbort TransferByte \Nack \Stop Start
    NIL
    cmu> 


-Rob

p.s. Most O/S kernels have similar internal routines which, in fact, the
above was modelled on. E.g., from "dmesg.boot" on my FreeBSD laptop, we
see how the "Features" register in the CPU is decoded:

    CPU: AMD Athlon(tm) Processor (1836.65-MHz 686-class CPU)
      Origin = "AuthenticAMD"  Id = 0x6a0  Stepping = 0
      Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
      AMD Features=0xc0480000<MP,AMIE,DSP,3DNow!>
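A rough Lisp analogue of that kernel-style display is a few lines of FORMAT. This is only a sketch: FORMAT-REGISTER is a hypothetical name, the bit names below are the I2C ones from earlier rather than the real CPU feature names, and (as in the dmesg output) names are listed from bit 0 upward.

```lisp
(defun format-register (label value names)
  "Return a string like LABEL=0xVALUE<NAME,NAME,...> listing the name
of every set bit in VALUE, low bit first. NAMES is a vector indexed by
bit number; bits beyond its length are ignored."
  (format nil "~a=0x~(~x~)<~{~a~^,~}>"
          label
          value
          (loop for i from 0 below (length names)
                when (logbitp i value)
                  collect (aref names i))))

(defparameter *icr-bit-names*
  #("Start" "Stop" "Nack" "TransferByte"
    "MasterAbort" "SCL_En" "I2C_En" "GenCall_Dis"))
```

Here (format-register "ICR" #x69 *icr-bit-names*) returns "ICR=0x69<Start,TransferByte,SCL_En,I2C_En>".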

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607