George Neuner <gneuner2@comcast.net> wrote:
+---------------
| I like Lisp - but let's be honest - Lisp doesn't make bit munging easy.
...
| Then too, operating systems are not the only places where system code
| exists. A hell of a lot of code is written for bare hardware in
| embedded systems and quite a lot of it falls into the bit munging
| category.
...
| [*] I distinguish "munging" from "banging". I use "munging" to mean
| interpreting the same bits differently in different contexts and
| "banging" to mean altering data at the bit level. YMMV and your
| terminology as well.
+---------------
One of the reasons I really like CMUCL is that it *does* let one get
down to the bit level if you really want to. [Most other production
CL compilers also provide for this; I just happen to know CMUCL best.]
Assume for the sake of argument that one has already created the
following aliases, abbreviations, or shortcuts [as I have in my normal
toolbox]:
- A read-macro for "0" that treats "0x{number}" as "#x{number}".
- A FORMAT function "0x" for convenient hex printing.
- make-lisp-obj == kernel:make-lisp-obj
- lisp-obj == kernel:get-lisp-obj-address
- r{8,16,32} == (lambda (addr)
                  (system:sap-ref-{8,16,32} (system:int-sap addr) 0))
- w{8,16,32} == (lambda (addr new-value)
                  (setf (system:sap-ref-{8,16,32} (system:int-sap addr) 0)
                        new-value))
- d32 == (lambda (addr &optional (len #x40) (print-addr addr))
           "Does a hex dump from ADDR through (+ ADDR LEN -1),
            labelling locations with PRINT-ADDR. (The latter is
            useful when the object is mmap()'d to hardware.)"
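For anyone following along without such a toolbox, here is a minimal, portable sketch of the "0x" FORMAT function plus a HEX printer like the one used in the session below. Both definitions are assumptions reconstructed from the printed output (8 zero-padded lowercase hex digits), not the actual toolbox code; HEX's optional STREAM argument is likewise hypothetical.

```lisp
;;; Hypothetical stand-ins, reconstructed from the output shown in the post.
;;; The ~/0x/ directive upcases the name and looks it up in
;;; COMMON-LISP-USER, so the function has to be named |0X| there.
(defun cl-user::|0X| (stream arg &optional colon-p at-sign-p &rest params)
  (declare (ignore colon-p at-sign-p params))
  ;; ~( ... ~) downcases the digits, since ~X normally prints A-F in
  ;; upper case; 8,'0 pads to eight digits with zeros.
  (format stream "0x~(~8,'0x~)" arg))

(defun hex (n &optional (stream *standard-output*))
  "Print N in 0x-prefixed hex on its own line, then return N unchanged."
  (format stream "~/0x/~%" n)
  n)
```

With these loaded, (format nil "~{~/0x/~^ ~}" '(58 16)) should return "0x0000003a 0x00000010".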
Then you can do these sorts of things:
cmu> (gc :full t) ; So things won't move around while we play.
; [GC threshold exceeded with 10,060,488 bytes in use. Commencing GC.]
; [GC completed with 1,158,608 bytes retained and 8,901,880 bytes freed.]
; [GC will next occur when at least 13,158,608 bytes are in use.]
NIL
cmu> (deflex foo (vector 1 2 3 4))
FOO
cmu> (lisp-obj foo)
1209122855
cmu> (hex *)
0x4811c027
1209122855
cmu>
CMUCL uses 3-bit lowtags; a 7 means "other heap object" (that is,
not cons, function, or CLOS instance), which includes arrays.
Thus the array actually starts in memory at 0x4811c020, and we
*could* dump it this way:
cmu> (loop for addr from 0x4811c020 by 4 repeat 6
       collect (r32 addr))
(58 16 4 8 12 16)
cmu> (format t "~{~/0x/~^ ~}~%" *)
0x0000003a 0x00000010 0x00000004 0x00000008 0x0000000c 0x00000010
NIL
cmu>
which would show us that the heap header tag for SIMPLE-VECTOR is
decimal 58 (hex 0x3a), and would suggest that in CMUCL fixnums are
30 bits (using both the 0 & 4 lowtags), and that the second word of
a SIMPLE-VECTOR is the user-visible length of the vector as a fixnum
[all of which is in fact the case]. Or we could just use D32: ;-}
cmu> (d32 foo)
0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
0x4811c030: 0x0000000c 0x00000010 0x48119217 0x4811c043
0x4811c040: 0x28f0000b 0x28f0000b 0x0000008c 0x28f0000b
0x4811c050: 0x4811c04b 0x48119403 0x48119243 0x4811c063
cmu>
Yes, the display overran the object. So sue me. ;-}
If I weren't being lazy I could have typed this:
cmu> (d32 foo 24)
0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
0x4811c030: 0x0000000c 0x00000010
cmu>
or:
cmu> (d32 foo (* 4 (+ 2 (length foo))))
0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
0x4811c030: 0x0000000c 0x00000010
cmu>
or even:
cmu> (d32 foo (+ 8 (r32 (+ 4 (logandc2 (lisp-obj foo) 7)))))
0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
0x4811c030: 0x0000000c 0x00000010
cmu>
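The word layout inferred above (header word, fixnum length, fixnum elements) can be checked with plain integer arithmetic, no peeking at memory required. This is only a sketch under the assumptions stated in the post: 32-bit words, 3-bit lowtags, and 30-bit fixnums occupying lowtags 0 and 4. The function names are made up for illustration.

```lisp
;;; Hypothetical helpers operating on raw 32-bit words given as integers.
(defun raw-lowtag (word)
  "Low three bits of a raw word."
  (logand word 7))

(defun raw-fixnum-p (word)
  "True if WORD has lowtag 0 or 4, i.e. its low two bits are zero."
  (zerop (logand word 3)))

(defun decode-raw-fixnum (word)
  "Integer value of a raw 32-bit WORD that holds a fixnum."
  ;; Reinterpret the word as signed 32-bit, then drop the two tag bits.
  (let ((signed (if (logbitp 31 word) (- word (ash 1 32)) word)))
    (ash signed -2)))

(defun encode-raw-fixnum (n)
  "Raw 32-bit word representing the integer N as a fixnum."
  (ldb (byte 32 0) (ash n 2)))
```

For the dump above, (raw-lowtag #x4811c027) gives 7 and (mapcar #'decode-raw-fixnum '(16 4 8 12 16)) gives (4 1 2 3 4): the length followed by the four elements.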
<ASIDE>
The 3rd arg to D32 is for when the address is mmap'd to some physical
hardware, and you want to display the physical address [to match the
bus and/or chip documentation] instead of the virtual address when dumping:
cmu> (d32 foo 24 0xcf900000)
0xcf900000: 0x0000003a 0x00000010 0x00000004 0x00000008
0xcf900010: 0x0000000c 0x00000010
cmu>
</ASIDE>
Now suppose we want to change FOO from #(1 2 3 4) to #(1 2 #\J 4).
Yes, we could (SETF (AREF FOO 2) #\J), but what fun is that?!? ;-} ;-}
cmu> (hex (char-code #\J)) ; An ASCII "J" is 74 (0x4a), in CMUCL
0x0000004a
74
cmu> (hex (lisp-obj #\J)) ; stored shifted up 8 in immediate type 0xb2.
0x00004ab2
19122
cmu> (w32 (+ 16 (logandc2 (lisp-obj foo) 7)) 19122) ; *ZAP!!*
cmu> (d32 foo 24)
0x4811c020: 0x0000003a 0x00000010 0x00000004 0x00000008
0x4811c030: 0x00004ab2 0x00000010
cmu> foo
#(1 2 #\J 4)
cmu>
Is that enough "munging/banging" for you?!? ;-} ;-}
If not, I can dig up plenty of examples from hardware bringup and
debugging [or you can just search for lots of previous noise by yours
truly in this group on that subject, search terms: "rpw3 hwtool opfr"].
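The magic number 19122 poked in with W32 above need not be typed in by hand. A small sketch of the arithmetic, assuming (as the session shows) that a character immediate is the char code shifted up 8 bits with type code 0xb2 in the low byte; the function names and the default type code argument are illustrative only.

```lisp
;;; Hypothetical helpers; the 0xb2 type code is taken from the session above.
(defun encode-char-word (char &optional (type-code #xb2))
  "Raw word for CHAR: char code in bits 8 and up, TYPE-CODE in the low byte."
  (logior (ash (char-code char) 8) type-code))

(defun decode-char-word (word)
  "Character stored in a raw character immediate WORD."
  (code-char (ash word -8)))
```

So (encode-char-word #\J) yields 19122 (0x4ab2) on any implementation where #\J has char code 74.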
+---------------
| I used to do embedded programming and I've seen as much
| as 50% of an application devoted to munging data.
+---------------
Indeed. My "hwtools" script [which uses my "peek-poke" library]
is almost *entirely* bit-banging stuff. A few tiny examples:
;;; Spray bits out for easy reading of hardware registers
(defun decode-bits (n)
  (let (mflag)
    (when (minusp n)
      (setf mflag t n (lognot n)))
    (loop for i downfrom (integer-length n) to 0
          when mflag
            collect (if (zerop n) 'all-ones 'all-ones-except)
            and do (setf mflag nil)
          when (logbitp i n)
            collect i)))
;;; Same as above, but with textual labels.
(defun decode-named-bits (value name-vector &key show-negated)
  (loop for i downfrom (1- (length name-vector)) to 0 do
    (cond
      ((logbitp i value)
       (format t " ~a" (aref name-vector i)))
      (show-negated
       (format t " \\~a" (aref name-vector i)))))
  (force-output))
used thusly:
cmu> (decode-bits 0x48d02cf3)
(30 27 23 22 20 13 11 10 7 6 5 4 1 0)
cmu>
Useful enough, but what do they *mean*?!?
cmu> (deflex icr-bit-names '#( ; From an I2C controller chip
       "Start"
       "Stop"
       "Nack"
       "TransferByte"
       "MasterAbort"
       "SCL_En"
       "I2C_En"
       "GenCall_Dis" )) ; There were more, but that's enough for now.
ICR-BIT-NAMES
cmu> (decode-bits 0x69)
(6 5 3 0)
cmu> (decode-named-bits 0x69 icr-bit-names)
I2C_En SCL_En TransferByte Start
NIL
cmu> (decode-named-bits 0x69 icr-bit-names :show-negated t)
\GenCall_Dis I2C_En SCL_En \MasterAbort TransferByte \Nack \Stop Start
NIL
cmu>
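Going the other way, from names back to a register value, is handy when composing a word to poke. The following is a hypothetical companion to DECODE-NAMED-BITS, not something from the hwtools script; it assumes the same name-vector convention (index = bit number) and matches names case-insensitively.

```lisp
;;; Hypothetical inverse of DECODE-NAMED-BITS.
(defun encode-named-bits (names name-vector)
  "Return the integer whose set bits are those named in the list NAMES.
Signals an error for a name not found in NAME-VECTOR."
  (let ((value 0))
    (dolist (name names value)
      (let ((pos (position name name-vector :test #'string-equal)))
        (unless pos
          (error "Unknown bit name: ~a" name))
        ;; LOGIOR rather than + so a repeated name is harmless.
        (setf value (logior value (ash 1 pos)))))))
```

With the ICR names above, (encode-named-bits '("I2C_En" "SCL_En" "TransferByte" "Start") icr-bit-names) gives 105, i.e. 0x69 again.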
-Rob
p.s. Most O/S kernels have similar internal routines which, in fact, the
above was modelled on. E.g., from "dmesg.boot" on my FreeBSD laptop, we
see how the "Features" register in the CPU is decoded:
CPU: AMD Athlon(tm) Processor (1836.65-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x6a0 Stepping = 0
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
AMD Features=0xc0480000<MP,AMIE,DSP,3DNow!>
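For comparison, a rough Lisp sketch that produces the same style of line as the dmesg output above: the value in hex followed by the names of the set bits, lowest bit first, in angle brackets. This is an illustration only (it is not how the kernel does it, and the function name is made up); bits set beyond the end of the name vector are silently ignored.

```lisp
;;; Hypothetical dmesg-style formatter; NAMES is a vector indexed by bit number.
(defun format-named-bits (label value names)
  "Return a string of the form LABEL=0xVALUE<name,name,...>."
  (format nil "~a=0x~(~x~)<~{~a~^,~}>"
          label
          value
          (loop for i from 0 below (length names)
                when (logbitp i value)
                  collect (aref names i))))
```

Given the eight ICR names from earlier, (format-named-bits "ICR" #x69 icr-bit-names) returns "ICR=0x69<Start,TransferByte,SCL_En,I2C_En>".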
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607