Robert Swindells <rjs@fdy2.demon.co.uk> wrote:
+---------------
| There is also the FPGA in an Opteron socket that was described in
| a thread last week.
+---------------
Ex-*SPEND*-sive!! [Pardon my deliberate misspelling.]
+---------------
| You might be able to use the HyperTransport protocol to snoop
| the caches of the real processors and use this to implement
| write barriers without the overhead of having to trap to user space.
+---------------
1. I suspect the aforementioned FPGA uses plain ol' "non-coherent HT",
which is what is used for ordinary I/O (PIOs & DMA). The cc/NUMA
cache-coherency, on the other hand, uses the AMD "coherent HT"
protocol, and the latter is still quite confidential, AFAIK.
[Any given link runs in either "non-coherent" or "coherent" mode.
HT links to I/O chips such as the Am8111, Am8131/Am8132, or the
NVidia NForce run in "non-coherent" mode; links between Opteron
CPUs run in "coherent" mode.] Not even NDA partners get *all* the
details about "coherent" mode [though they do get some]. Before you
could build an FPGA that spoke the cc/NUMA protocol, you'd have to
succeed in some *serious* negotiating with AMD for NDA access.
However, some people obviously have succeeded, e.g., consider
the Horus chip from Newisys:
http://www.hypertransport.org/docs/tech/horus_external_white_paper_final.pdf
2. Even assuming success in #1, it's not at all clear that there is
adequate information available in another socket to implement a GC
write barrier. Write barriers have to do their work *NOW*, not at
some later time when the trail has already been thoroughly muddled.
Opterons have "write-back" caches, not "write-through", and the
dirty data can be retired in random order.
3. And while you'll see the SNOOP broadcasts from other CPUs asking
about *your* dirtying of a given cache line, you won't necessarily
see what data is actually stored by *them* (which happens later
via a potentially different path through the HT fabric).
Worse, if the store is being done into an Opteron's local SDRAM,
and the local memory controller knows from recent history that
no-one else has the data cached [e.g., right after a SNOOP broadcast
gets its responses back], then *no-one* else except that local
Opteron will ever *see* what data was stored -- which makes an
external write barrier impossible.
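
For contrast with points 2 and 3, here is a minimal sketch in C of a
conventional *software* card-marking write barrier. This is a common
GC technique, not anything described earlier in this thread, and every
name in it (gc_write, card_table, HEAP_WORDS, CARD_SHIFT) is
hypothetical, chosen only for illustration. The point it shows is that
the bookkeeping happens in the same instruction stream as the store
itself, i.e., *NOW*, so it does not depend on anything becoming visible
on the HT fabric later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 4096                 /* hypothetical toy heap size, in words */
#define CARD_SHIFT 7                    /* 2^7 words per card; arbitrary choice */
#define NUM_CARDS  (HEAP_WORDS >> CARD_SHIFT)

static uintptr_t     heap[HEAP_WORDS];       /* toy heap, indexed by word slot */
static unsigned char card_table[NUM_CARDS];  /* 1 = card written since last GC */

/* Store `value` into heap slot `slot` and mark the enclosing card dirty.
 * The mark is made by the mutator at the time of the store, regardless
 * of when (or whether) the cache line is ever written back. */
static void gc_write(size_t slot, uintptr_t value)
{
    assert(slot < HEAP_WORDS);
    heap[slot] = value;
    card_table[slot >> CARD_SHIFT] = 1;
}

/* Count dirty cards; a collector would scan only these for new pointers. */
static size_t count_dirty_cards(void)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_CARDS; i++)
        n += card_table[i];
    return n;
}
```

An external snooper in another socket would have to reconstruct the
equivalent of card_table[] purely from coherency traffic, which is
exactly the information that points 2 and 3 argue may never show up.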
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607