Re: are relational databases not needed when you use lisp? paul graham did not

Daniel Weinreb  <dlw@alum.mit.edu> wrote:
+---------------
| Maciej Katafiasz wrote:
| > That is also one of the reasons I don't jump to signals as the first 
| > solution of a problem, they have almost no defined semantics. About the 
| > only stable thing you can say about them is that they exist, and that you 
| > can't handle KILL and STOP/CONT. Everything else, from their meaning, to 
| > semantics of signalling yourself, to semantics of what happens in multi-
| > threaded scenarios, is pretty much a big cloud of unspecifiedness.
| 
| Not SIGSEGV.  A confusing thing about Unix/Posix/Linux is that
| there are two completely different things, both modelled as
| "signals".  There are asynchronous ones (like killing a process)
| and there are synchronous ones like SIGSEGV.  The semantics of
| SIGSEGV is 100% completely clear.
+---------------

And indeed, a number of generational garbage collectors[1] use the
VM system itself to implement a hardware write barrier to enable the
noting/recording of the "remembered sets", that is, which objects
in older generations point to objects in newer generations, and thus
which older generation pointers must be added to the root set when
collecting newer generations. There are several variants on this
theme[2], and there are arguments about whether this or that workload
would benefit from using a software write barrier [perhaps plus
"card marking"] instead[3], but the "paging hardware write barrier"
approach is certainly reliable [where it can be used at all].


-Rob

[1] CMUCL's (on x86) is one, but there are many others.

[2] In all of them, at the completion of a GC all of the write
    enable bits are turned off (or write protection bits are
    turned on) on all of the pages in the GC-managed heap
    [*except* for the nursery, of course!]. In some versions,
    for any store into a write-protected heap page the signal
    handler [usually part of the GC] will simply turn write-enable
    back on for that page and return from the signal, which will
    automatically retry the store, which will now quietly succeed.
    At the next GC, any heap pages in generations older than the
    one(s) being collected which have their write enable bits on
    have thus been stored into since the last GC, and all of the
    objects on all such pages are scanned to see if they need to
    be added to the remembered set. Overall, this approach can be
    quite cheap if there are *lots* of stores but to relatively few
    older generation pages.

    Another approach is, when a write-protection SIGSEGV occurs, to
    precisely locate the object being stored into and record either
    the whole object (if it's small) or *only* the slot being written 
    into (if the object's really large) in the remembered set, then
    (a) temporarily turn on write-enable on the page, (b) emulate
    [that is, perform] the store in the signal handler, (c) "push
    forward" the interrupted PC past the store, (d) turn write-enable
    back off on that page, and (e) return from the signal handler
    to the new PC just past the store. Overall, this approach may
    be cheaper than the previous one if there are relatively few
    total stores but to widely-scattered older generation pages.
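
    The first variant in [2] can be sketched in C roughly as follows.
    This is only a toy under stated assumptions, not CMUCL's code: the
    names (heap, dirty, on_fault, protect_heap, store_byte) and the
    8-page "old generation" are invented for illustration. It also
    calls mprotect() from inside the signal handler, which POSIX does
    not list as async-signal-safe but which this technique relies on
    in practice.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 8                   /* size of the toy "old generation" (assumed) */

static char *heap;                 /* hypothetical GC-managed old-generation area */
static size_t page_size;
static volatile sig_atomic_t dirty[NPAGES];  /* 1 = page written since last "GC" */
static volatile sig_atomic_t faults;         /* number of write-protection traps */

/* Write-protection trap: note the page, re-enable writes, and return.
   Returning from the handler retries the faulting store. */
static void on_fault(int sig, siginfo_t *si, void *uctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)heap;
    (void)sig; (void)uctx;
    if (addr < base || addr >= base + NPAGES * page_size)
        _exit(1);                  /* a genuine fault outside our heap */
    size_t page = (addr - base) / page_size;
    dirty[page] = 1;
    faults++;
    if (mprotect(heap + page * page_size, page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(2);
}

static void heap_init(void)
{
    struct sigaction sa;
    int rc;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    heap = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(heap != MAP_FAILED);

    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;      /* we need si_addr, the faulting address */
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_fault;
    rc = sigaction(SIGSEGV, &sa, NULL);
    assert(rc == 0);
    rc = sigaction(SIGBUS, &sa, NULL);   /* some systems report SIGBUS instead */
    assert(rc == 0);
}

/* "End of a GC": forget old dirty bits and write-protect every page. */
static void protect_heap(void)
{
    int rc;
    for (size_t i = 0; i < NPAGES; i++)
        dirty[i] = 0;
    rc = mprotect(heap, NPAGES * page_size, PROT_READ);
    assert(rc == 0);
}

/* A mutator store; volatile keeps the compiler from moving or dropping it. */
static void store_byte(size_t offset, char value)
{
    ((volatile char *)heap)[offset] = value;
}

/* "Start of the next GC": how many old pages need scanning? */
static int count_dirty(void)
{
    int n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += (dirty[i] != 0);
    return n;
}
```

    After heap_init() and protect_heap(), only the first store into
    a given page traps; later stores to that page run at full speed
    until the next protect_heap(). A real collector would then scan
    the objects on the dirty pages rather than just counting them.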

[3] This is particularly true if (a) the cost of a write-protection
    trap and the associated SIGSEGV is *very* high on that platform;
    (b) type-propagation can be used to suppress performing the write
    barrier when non-pointer data is being stored; (c) pointer stores
    are fairly infrequent, and (d) pointer stores tend to be clustered
    around regions considerably smaller than a page size [in which
    case that region/cluster size is probably a good choice for the
    "card" size in a card-marking scheme].
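
    For comparison, a software write barrier with card marking might
    look something like the sketch below. Everything here is assumed
    for illustration: the 128-byte card size, the fixed-size old_gen
    array, and the names write_ptr, write_raw, and dirty_cards come
    from no particular implementation. write_raw stands in for the
    case in (b), where the compiler has proven that the stored value
    is not a pointer and so emits no barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_WORDS 4096
#define CARD_SHIFT 7               /* 2^7 = 128-byte cards (an assumed size) */
#define NCARDS ((HEAP_WORDS * sizeof(void *)) >> CARD_SHIFT)

static void *old_gen[HEAP_WORDS];          /* toy old generation */
static unsigned char card_table[NCARDS];   /* 1 = card may hold a new pointer */
static int nursery_object;                 /* stand-in for a young object */

/* Pointer store with barrier: do the store, then mark the card
   covering the slot. There is no trap and no system call. */
static void write_ptr(void **slot, void *value)
{
    assert(slot >= old_gen && slot < old_gen + HEAP_WORDS);
    *slot = value;
    size_t offset = (size_t)((char *)slot - (char *)old_gen);
    card_table[offset >> CARD_SHIFT] = 1;
}

/* Non-pointer store: known at compile time not to be a pointer,
   so the barrier is suppressed entirely. */
static void write_raw(uintptr_t *slot, uintptr_t value)
{
    *slot = value;
}

/* "End of a GC": all cards clean again. */
static void clear_cards(void)
{
    memset(card_table, 0, sizeof card_table);
}

/* "Start of the next GC": how many cards need scanning? */
static size_t dirty_cards(void)
{
    size_t n = 0;
    for (size_t i = 0; i < NCARDS; i++)
        n += card_table[i];
    return n;
}
```

    The cost is a couple of extra instructions on every pointer
    store, but at the next GC only the marked 128-byte cards need
    to be scanned instead of whole pages.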

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607