Daniel Weinreb <dlw@alum.mit.edu> wrote:
+---------------
| Maciej Katafiasz wrote:
| > That is also one of the reasons I don't jump to signals as the first
| > solution of a problem, they have almost no defined semantics. About the
| > only stable thing you can say about them is that they exist, and that you
| > can't handle KILL and STOP/CONT. Everything else, from their meaning, to
| > semantics of signalling yourself, to semantics of what happens in multi-
| > threaded scenarios, is pretty much a big cloud of unspecifiedness.
|
| Not SIGSEGV. A confusing thing about Unix/Posix/Linux is that
| there are two completely different things, both modelled as
| "signals". There are asynchronous ones (like killing a process)
| and there are synchronous ones like SIGSEGV. The semantics of
| SIGSEGV is 100% completely clear.
+---------------
And indeed, a number of generational garbage collectors[1] use the
VM system itself to implement a hardware write barrier to enable the
noting/recording of the "remembered sets", that is, which objects
in older generations point to objects in newer generations, and thus
which older generation pointers must be added to the root set when
collecting newer generations. There are several variants on this
theme[2], and there are arguments about whether this or that workload
would benefit from using a software write barrier [perhaps plus
"card marking"] instead[3], but the "paging hardware write barrier"
approach is certainly reliable [where it can be used at all].
-Rob
[1] CMUCL's (on x86) is one, but there are many others.
[2] In all of them, at the completion of a GC all of the write
enable bits are turned off (or write protection bits are
turned on) on all of the pages in the GC-managed heap
[*except* for the nursery, of course!]. In some versions,
for any store into a write-protected heap page the signal
handler [usually part of the GC] will simply turn write-enable
back on for that page and return from the signal, which will
automatically retry the store, which will now quietly succeed.
At the next GC, any heap pages in generations older than the
one(s) being collected which have their write enable bits on
have thus been stored into since the last GC, and all of the
objects on all such pages are scanned to see if they need to
be added to the remembered set. Overall, this approach can be
quite cheap if there are *lots* of stores but to relatively few
older generation pages.
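For concreteness, here is a minimal sketch in C of that first variant,
assuming a Linux-like system where a store into a read-only page is
delivered as SIGSEGV with the faulting address in si_addr. The names
(heap, dirty, NPAGES, barrier_init, barrier_reset, mutator_store) are
made up for illustration and are not taken from CMUCL or any other
collector. Note also that POSIX does not list mprotect() as
async-signal-safe, although calling it from the handler is what this
technique relies on in practice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { NPAGES = 8 };      /* hypothetical size of the older-generation area */

static char *heap;        /* page-aligned region obtained from mmap() */
static size_t page_size;
static volatile sig_atomic_t dirty[NPAGES];  /* 1 = stored into since last GC */

/* SIGSEGV handler: if the fault lies in the protected heap, note the page
   as dirty, turn write-enable back on for that one page, and return.
   Returning restarts the faulting store, which now succeeds. */
static void barrier_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)heap;
    if (addr < base || addr >= base + NPAGES * page_size)
        _exit(2);         /* a genuine crash, not our write barrier */
    size_t page = (addr - base) / page_size;
    dirty[page] = 1;
    if (mprotect(heap + page * page_size, page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(3);
}

/* Map the heap and install the handler. Returns 0 on success. */
static int barrier_init(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    heap = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap == MAP_FAILED)
        return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = barrier_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* What the end of a GC would do: forget the old dirty bits and
   write-protect every page of the older generation again. */
static int barrier_reset(void)
{
    for (int i = 0; i < NPAGES; i++)
        dirty[i] = 0;
    return mprotect(heap, NPAGES * page_size, PROT_READ);
}

/* Stand-in for an ordinary mutator store. The volatile cast is only
   there so the compiler keeps this demonstration store in program order. */
static void mutator_store(size_t offset, char value)
{
    *(volatile char *)(heap + offset) = value;
}
```

Only the first store into a given page after barrier_reset() takes the
trap; later stores to the same page run at full speed, which is why this
variant does well with many stores to few pages. A real collector would
scan the objects on each dirty page at the next GC instead of merely
reading the dirty[] flags.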
Another approach is, when a write-protection SIGSEGV occurs, to
precisely locate the object being stored into and record either
the whole object (if it's small) or *only* the slot being written
into (if the object's really large) in the remembered set, then
(a) temporarily turn on write-enable on the page, (b) emulate
[that is, perform] the store in the signal handler, (c) "push
forward" the interrupted PC past the store, (d) turn write-enable
back off on that page, and (e) return from the signal handler
to the new PC just past the store. Overall, this approach may
be cheaper than the previous one if there are relatively few
total stores but to widely-scattered older generation pages.
[3] This is particularly true if (a) the cost of a write-protection
trap and the associated SIGSEGV is *very* high on that platform;
(b) type-propagation can be used to suppress performing the write
barrier when non-pointer data is being stored; (c) pointer stores
are fairly infrequent, and (d) pointer stores tend to be clustered
around regions considerably smaller than a page size [in which
case that region/cluster size is probably a good choice for the
"card" size in a card-marking scheme].
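As a rough illustration of the software alternative, here is a minimal
card-marking sketch in C. Everything in it is an assumption made for the
example: the 512-byte card size (CARD_SHIFT), the toy old_gen array
standing in for an older generation, and the function names. In a real
system the compiler would emit the barrier inline on pointer stores and
omit it where it can prove the value stored is not a pointer, as in
case (b) above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CARD_SHIFT = 9,        /* 512-byte cards: an assumed size */
       HEAP_WORDS = 4096 };   /* toy older generation, in words */

typedef uintptr_t word;

static word old_gen[HEAP_WORDS];
static unsigned char card_table[(HEAP_WORDS * sizeof(word)) >> CARD_SHIFT];

/* Index of the card containing a given heap slot. */
static size_t card_of(const word *slot)
{
    return (size_t)((const char *)slot - (const char *)old_gen) >> CARD_SHIFT;
}

/* Pointer store with the software write barrier: do the store,
   then mark the card that contains the slot. */
static void store_pointer(word *slot, void *value)
{
    *slot = (word)value;
    card_table[card_of(slot)] = 1;
}

/* Store of known non-pointer data: no barrier is needed. */
static void store_raw(word *slot, word value)
{
    *slot = value;
}

/* At GC time only the marked cards need to be scanned for pointers
   into newer generations; here we merely count them. */
static size_t dirty_cards(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof card_table; i++)
        n += card_table[i];
    return n;
}

/* After the scan, the card table is cleared for the next cycle. */
static void clear_cards(void)
{
    memset(card_table, 0, sizeof card_table);
}
```

The barrier costs a shift and a byte store on every pointer store, with
no trap at all, and the GC-time scan is limited to cards rather than
whole pages.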
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607