Subject: Re: data hygiene [Re: Why is Scheme not a Lisp?]
From: Erik Naggum <erik@naggum.net>
Date: Sun, 24 Mar 2002 02:13:03 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3225924796503700@naggum.net>

* Kimmo T Takkunen
| When reading this it occurred to me that there may be people out there
| who actually need maximum speed on assoc-lists.

  Well, that is one possible way to see it, but I see it more as an
  exercise in showing that choosing alists or plists because one has better
  theoretical performance than the other is all bunk, because in practice
  it does not appear to matter.  Not one person out there has found test
  results that clearly contradict my findings, which tells me that (1)
  modern processors do not appreciate software that does the same kind of
  optimization they try to do, (2) to make the access times matter, you
  need to ensure that you fight the cache and have long memory latency, and
  (3) even when you do, memory organization matters more than the
  algorithm.  In other words, mine is an elaborate argument to ask people
  not to bother optimizing these things because (1) if the hardware can do
  it, just let it do it, and (2) before any of this matters significantly,
  you would profit from switching to something other than alists and plist.

  I have some problems figuring out why some people think that just because
  you set up a fairly elaborate counter-argument, you actually argue in
  favor of the _opposite_ of what you demonstrate to be false.  E.g., when
  I found that alists and plists were just as fast in normal usage and I
  saw equal execution times, my attempt at an explanation was that the
  number of memory references were the same, and that memory latency was
  not at issue, some think I failed to get the other point that provably
  had a _negative_ impact on performance.  I find this so amazingly useless
  that I wonder how people think.  So, in summary, no, I do not argue for
  maximum speed of alists (or plists), I reject the notion that the maximum
  path length argument matters one hoot to actual performance of actual
  code running on actual machines, because it measurably does not.  And if
  you cannot imagine the glee that some people would take in "proving me
  wrong", I can, so when nobody steps forward to do it, we can be very
  certain that the measurements I have made are generally applicable, so
  the argument remains quite simple: do not use "performance" as an excuse
  to choose alists over plists when other concerns should weigh more.

| I have made two little functions that may or may not help. They do finds
| on lists using move-to-front and transpose heuristics. Both of them
| modify element ordering in the list they are working with.

  Thanks, this is useful, but I think I would prefer one that kept track of
  the number of times a key has been visited and maintained a list sorted
  on that number.  This could be done in a training session or at runtime.
  For instance, many astonishingly stupid C-based programmers have built
  "command tables" with strings as keys and perform a strncasecmp call for
  every string in the list until they have a match, which is dumb enough in
  itself, but then also fail to sort the "command table" in the order of
  the most likely high use.  These are the same people who would no doubt
  call Lisp slow if they did the exact same thing in Lisp, but still think
  that C is fast.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.