Madhu <enometh@meer.net> wrote:
+---------------
| Robert Maas <rem-2008jun12-001@yahoo.com> wrote:
| | (CMU Common Lisp 18b, ... Python 1.0, target Intel x86)
| | The purpose of this test is to learn (by experiment) what size
| | active memory fits within the CPU cache and what size thrashes the
| | CPU cache thereby significantly slowing performace.
| <snip>
| | So what's allocating all that memory? Is it inside RANDOM??
|
| Yes. I believe CMUCL's RANDOM conses if its integer argument is greater
| than kernel::random-fixnum-max, which is (expt 2 22).
| [I Checked this on rand-mt19937.lisp on an old version of CMU Common
| Lisp 2006-09-07 (19D), linux Python 1.1, target Intel x86]
+---------------
Well, when I tried it on CMUCL-19c, when given a literal argument
[but much more about that below!] RANDOM didn't cons at all:
cmu> arrsiz
268435456
cmu> (time (dotimes (i 100000000) (random 268435456)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 2.93f0 seconds of real time
; 2.892589f0 seconds of user run time
; 5.6f-5 seconds of system run time
; 5,428,917,417 CPU cycles
; 0 page faults and
; 0 bytes consed. <== NO CONSING
;
NIL
cmu>
The real problems here are two:
First & foremost, Robert forgot to *compile* ONE-ACCESS!!
cmu> (compile 'one-access)
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
ONE-ACCESS
NIL
NIL
cmu>
Now his example TIME conses *much* less:
cmu> (time (dotimes (k 50000) (one-access)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.12f0 seconds of real time
; 0.069097f0 seconds of user run time
; 0.038177f0 seconds of system run time
; 222,723,659 CPU cycles
; 0 page faults and
; 4,698,872 bytes consed.
;
NIL
cmu>
That's only ~94 bytes/call. [Actually, varies from 80 to 96, see below.]
But there's a more subtle problem, probably CMUCL-specific, which
is that ONE-ACCESS is calling (RANDOM ARRSIZ), which, despite being
a (near-)constant here, seems to cause consing. Compare this:
cmu> (time (dotimes (i 10000) (random 268435456)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.0f0 seconds of real time
; 2.23f-4 seconds of user run time
; 3.f-6 seconds of system run time
; 408,602 CPU cycles
; 0 page faults and
; 0 bytes consed. <== NO CONSING!!
;
NIL
cmu>
versus this:
cmu> arrsiz
268435456
cmu> (time (dotimes (i 10000) (random arrsiz)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.01f0 seconds of real time
; 0.011814f0 seconds of user run time
; 0.0f0 seconds of system run time
; 23,014,591 CPU cycles
; 0 page faults and
; 939,296 bytes consed.
;
NIL
cmu>
Whoa!! *That* was unexpected! [...at least, to me.]
So what's going on here?!? Hmmm...
cmu> (= arrsiz 268435456)
T
cmu> (fixnump arrsiz)
T
cmu> (time (random 268435456))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.0f0 seconds of real time
; 8.f-6 seconds of user run time
; 0.0f0 seconds of system run time
; 6,113 CPU cycles
; 0 page faults and
; 0 bytes consed. <== NO CONSING
;
246292663
cmu> (time (let ((x 268435456)) (random x)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.0f0 seconds of real time
; 9.f-6 seconds of user run time
; 0.0f0 seconds of system run time
; 6,202 CPU cycles
; 0 page faults and
; 0 bytes consed. <== NO CONSING
;
32540891
cmu> (time (let ((x arrsiz)) (random x)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.0f0 seconds of real time
; 2.1f-5 seconds of user run time
; 0.0f0 seconds of system run time
; 30,686 CPU cycles
; 0 page faults and
; 96 bytes consed. <== **WTF?!?**
;
174987352
cmu>
(*rustle*) (*rustle*)... (*grep*) (*grep*)...
Hmmm... Seems there's an internal sort of "compiler macro" built
into the CMUCL compiler -- but *not* visible to user, that is,
(COMPILER-MACRO-FUNCTION 'RANDOM) ==> NIL -- that partially inlines
calls to RANDOM if the arg meets some narrow conditions that I can't
quite parse. It's in the file "cmucl-19c/src/compiler/x86/arith.lisp",
in the (DEFINE-VOP (RANDOM-MT19937) ...) if anyone wants to look at it.
Anyway, when those conditions are met, the compiler partially inlines
the MT19937 algorithm [still calling out to RANDOM-MT19937-UPDATE as
necessary]; otherwise it just generates a normal out-of-line call
to RANDOM. Compare the output of this:
(disassemble (lambda () (random 268435456)))
with this:
(disassemble (lambda () (random arrsiz)))
And, indeed, when ONE-ACCESS is changed to manually inline
the value of ARRSIZ, the consing goes away completely!!
cmu> (defun one-access ()
(setf (aref arr (the (unsigned-byte 28) (random 268435456)))
(the (unsigned-byte 8) (random 256))))
ONE-ACCESS
cmu> (compile *)
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
ONE-ACCESS
NIL
NIL
cmu> (time (dotimes (k 50000) (one-access)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.03f0 seconds of real time
; 0.028682f0 seconds of user run time
; 0.0f0 seconds of system run time
; 60,136,254 CPU cycles
; 0 page faults and
; 0 bytes consed.
;
NIL
cmu>
Anyway, the main take-away here is to remember to
*always* compile your functions before trying to analyse
speed and/or consing...
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607