Thomas Weigert,
>> I have an application which does generate significant amounts of garbage,
>> and there is no easy way around it. I also make extensive use of eq-hash
>> tables. I am wondering whether these are actually a disadvantage in the
>> presence of the gc. I am assuming that eq-hash tables hash on the address
>> of the object, and after every gc we would have to rehash the tables. Is
>> this understanding correct, and if so, are there any rules of thumb for
>> deciding which type of table to use, given the suspected interaction with
>> the gc. Most of my tables need to be quick in access, and are not often
>> modified.
>>
>> Any hints are appreciated,
It is true that eq and eql hash tables require certain extra consideration,
given that objects can move in the lisp heap. We've mostly mitigated
this problem by providing the following algorithm for bot eq and eql hash
tables:
We give eq/eql hash tables three states regarding their requiring a rehash
due to object movement:
1. Clean: the hash table does not need to be rehashed at all; any accesses
or setfs to this hash table will not require a rehash.
2. Dirty after scavenge: Some objects in this hash table may have changed
their hash codes after a scavenge, so the first access/set after the
first scavenge will require a rehash prior to the operation.
3. Dirty after global-gc: This state occurs if there are no objects which
might change hash codes after a scavenge, but which might change hash
codes after a global-gc.
We also give objects themselves one of two attributes:
1. Specially-sxhashable: objects which have hash codes embedded in them
or which can be very quickly hashed without regard to the address of
the object are said to be specially-sxhashable. This includes all
numbers, all characters, all symbols (including nil), all
function-objects, and all standard-instance objects.
2. Non-specially-sxhashable objects: Those objects that do not fall into
the above category must be hashed by address.
Given the above two dimensions, we can describe the hashing algorithm for
eq and eql hash-tables:
Suppose a hash table is empty. It is thus clean. If you add a key object
to the hash table by saying (setf (gethash key ht) value), then the hash-
table may become dirty based on the key:
a. If the key is specially-sxhashable, the hash-table is not made dirty
after the setf. (However, if the hash table was dirty at all before
the setf, it does not make it clean).
b. If the key is not specially-sxhashable, but has been tenured (i.e. the
object is in old-space) the hash table is made dirty-after-global-gc
(however, if the hash-table had been dirty-after-scavenge, it is not
changed).
c. If the key is not specially-sxhashable, and is new (i.e. in new-space)
then the hash-table is made dirty-after-scavenge.
d. If a rehash is required, then the hash-table is started out as
clean, and every key item is reconsidered at the time of the rehash
as if it were going in for the first time. So if a key had been
new and thus caused a dirty-after-scavenge, but became tenured some
time before the rehash, then at the rehash time that key item will
only cause a dirty-after-global-gc (which is less drastic).
Thus, whether or not an eq/eql hash table must rehash is based on the
keys that are used. My best advice on maximizing efficiency in this
system is:
1. If possible, use only specially-sxhashable keys. For the most part,
this corresonds with the most common usages of keys, such as symbols,
numbers, and CLOS instances, but unfortunately it does not include
conses/lists, which are also common, but unfortunately not specially-
sxhashable.
2. If #1 is not possible, try to use keys that have not just been created.
If you do a (gc :tenure) after creating many keys, then they can each be
added to eq/eql hash-tables without dirtying them.
3. Try to avoid global-gcs, which force the next access on many eq/eql
hash-tables to require rehashes. Of course, if #1 is successful, then
global-gc is not a problem.
4. If #1, #2, and #3 are not possible, then perhaps one huge rehash can
be done by doing a (gc :tenure) after the table is filled up. Thereafter,
one rehash will occur, and only global-gcs will cause further rehashing.
I know that this is a complex answer, but we tried to hug the contours
of whatever was possible, in order to get the most out of eq/eql hash-table
performance.
If you have any further questions, please let me know at <franz.com, at bugs>
using the same subject line (with the spr number included) and I can
answer them.
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Av Ste 275 Berkeley, CA 94704 uunet!franz!duane (uucp)
Phone: (510) 548-3600; FAX: (510) 548-8253 <Franz.COM at duane> (internet)