Subject: Re: Politeness and language growth
From: Erik Naggum <erik@naggum.no>
Date: 1999/01/09
Newsgroups: comp.lang.lisp
Message-ID: <3124911593956744@naggum.no>

* Andi Kleen <ak-uu@muc.de>
| Sorry, I meant RST instead of ICMP of course.

  OK.

| There are possible scenarios, e.g. when the final ACK is delayed, the
| SYN-ACK is retransmitted, RST is sent for the SYN-ACK, SYN-ACK arrives in
| between and select succeeds, RST arrives, application calls accept.

  yup, but these abnormal cases were all handled correctly, and we still
  got bogus return values from select.

| Do multiple processes/threads write to this socket?

  there's only one Linux process/thread, but within the Allegro CL process,
  multiple Lisp processes talk to these sockets.  however, only one process
  does listen, read, or write on any given socket at any given time.
  (separate Lisp processes take care of input and output, though.)

| One common bug that may cause it is that Linux select differs from BSD
| select in a critical point: Linux select modifies the passed timeval to
| the time left after select finished, BSD leaves it alone.  A lot of
| applications forget to reinitialize the timeout before every select call
| in their main loop.  You can check for this situation simply with strace.
| Of course there should be no bit set then in the output fd_sets in this
| case.

  yes, this possibility has been investigated (with strace as you suggest)
  and found not to apply.  timeouts cause the sets to be cleared on return,
  as expected.  the error does _not_ occur while tracing -- the system has
  to be quiet in some weird way for this to happen.  that's why it took so
  long to figure it out, and even now I'm not sure select is the real
  culprit; I may only have cured a symptom.

  while I would appreciate any help in this matter, I also feel somewhat
  exhausted by it and I'm unhappy to go over the details yet again.  it
  also doesn't appear to be comp.lang.lisp material.  when I feel up to it,
  and I'm able to reproduce it consistently, whatever that means in a case
  like this, I'll try and work with both Franz Inc and Linux developers to
  see how these things interact so unreliably.

  I regret that I'm not at liberty to share the modified code with anyone
  other than Allegro CL licensees, and I'm letting Franz Inc engineers take
  care of any other customers who might report similar problems.  at least
  we know that this can happen, and we know a workaround that appears to
  circumvent the problem and won't break when the real cause is found.

  I've said this before, but one of the more bizarre things about this
  whole experience is that I have _more_ low-level control over things in
  Allegro CL than I would have in C.  in C I can trace the system calls,
  but in Allegro CL I can trace nigh everything, and peeking under the hood
  has never been easier.  this did come as somewhat of a revelation to me.

#:Erik