Thomas A. Russ <tar@sevak.isi.edu> wrote:
+---------------
| rpw3@rpw3.org (Rob Warnock) writes:
| > [1] Trivia: ... [In CMUCL] fixnums take up both the #b000 and #b100
| > codepoints (for even & odd fixnums, resp.)...
|
| So does that mean EVENP and ODDP are implemented as type tests? ;)
+---------------
Hardly! From "cmucl/src/compiler/srctran.lisp":
(def-source-transform evenp (x) `(zerop (logand ,x 1)))
(def-source-transform oddp (x) `(not (zerop (logand ,x 1))))
where DEF-SOURCE-TRANSFORM is heavy internal compiler macrology... ;-} ;-}
From there on it depends on the compiler to optimize the generic
LOGAND+test if it can, which it can if you declare the arg fixnum:
> (disassemble
(compile
(defun foo (x)
(declare (fixnum x))
(if (evenp x)
123
456))))
...[chatter]...
58946B78: .ENTRY FOO(x) ; (FUNCTION (FIXNUM) (OR # #))
...
...[generic arg-counting type-checking entry code]...
...
42: MOV EAX, EBX ; No-arg-parsing entry point
44: AND EAX, 4 ; See? It knows which bit to test
47: TEST EAX, EAX
49: JEQ L1
4B: MOV EDX, 1824 ; fixnum encoding for 456
50: L0: MOV ECX, [EBP-8] ; code for doing a return
53: MOV EAX, [EBP-4]
56: ADD ECX, 2
59: MOV ESP, EBP
5B: MOV EBP, EAX
5D: JMP ECX
5F: L1: MOV EDX, 492 ; fixnum encoding for 123
64: JMP L0
...
If you replace EVENP with ODDP the code is identical except that
the values of the "MOV EDX,nnn" immediate constants get swapped.
[In terms of branch prediction, I suppose you could say the compiler
is always predicting that fixnum arguments are even.]
+---------------
| I suppose it is a clever way to get 7 type codes while still having 30
| bit integers. Better than the naive solution of using 2 type bits and
| only getting 4 codes.
+---------------
I've seen other Lisps/Schemes in which fixnums use N-1 or N-2 bits,
pointers use N-2 or N-3 bits, and immediates chew up a byte or so
of type, leaving N-8 bits or some irregular sizes for the value.
In their docs you'll see complex type-decoding patterns like this,
from SCM's "code.doc":
IMMEDIATE: B,D,E,F=data bit, C=flag code, P=pointer address bit
................................
pointer PPPPPPPPPPPPPPPPPPPPPPPPPPPPP000
inum BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB10
ichr BBBBBBBBBBBBBBBBBBBBBBBB11110100
iflag CCCCCCC101110100
isym CCCCCCC001110100
IMCAR: only in car of evaluated code, cdr has cell's GC bit
ispcsym 000CCCC00CCCC100
iloc 0DDDDDDDDDDDDDDDEFFFFFFF11111100
gloc PPPPPPPPPPPPPPPPPPPPPPPPPPPPP001
CMUCL's encoding is fairly tame by comparison. From "internal-design.txt"
[out-of-date, so I've updated it to reflect a change to codepoint #b110
which let them go above 32 heap object types, since heap header words were
originally tagged as "other-immediate"s (now "other-immediate-{0,1}")]:
The following is a key of the three bit low-tagging scheme:
000 even fixnum
001 function pointer
010 other-immediate-0 (header-words, etc.)
011 list pointer
100 odd fixnum
101 structure pointer
110 other-immediate-1 (chars, symbol-value trap value, etc.)
111 other-pointer to data-blocks (other than conses, structures,
and functions)
So if the lispobj it's odd, it's a pointer; evens are fixnums and immediates.
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607