Subject: Re: Politeness and language growth
From: Erik Naggum <erik@naggum.no>
Date: 1999/01/09
Newsgroups: comp.lang.lisp
Message-ID: <3124911593956744@naggum.no>

* Andi Kleen <ak-uu@muc.de>
| Sorry, I meant RST instead of ICMP of course.

  OK.

| There are possible scenarios, e.g. when the final ACK is delayed, the
| SYN-ACK is retransmitted, RST is sent for the SYN-ACK, SYN-ACK arrives in
| between and select succeeds, RST arrives, application calls accept.

  yup, but these abnormal cases were all handled correctly, and we still
  got bogus return values from select.

| Do multiple processes/threads write to this socket?

  there's only one Linux process/thread, but within the Allegro CL process,
  multiple Lisp processes talk to these sockets. however, only one process
  does listen, read, or write on any given socket at any given time.
  (separate Lisp processes take care of input and output, though.)

| One common bug that may cause it is that Linux select differs from BSD
| select in a critical point: Linux select modifies the passed timeval to
| the time left after select finished, BSD leaves it alone. A lot of
| applications forget to reinitialize the timeout before every select call
| in their main loop. You can check for this situation simply with strace.
| Of course there should be no bit set then in the output fd_sets in this
| case.

  yes, this possibility has been investigated (with strace as you suggest)
  and found not to apply. timeouts cause the sets to be cleared on return,
  as expected. the error does _not_ occur while tracing -- the system has
  to be quiet in some weird way for this to happen. that's why it took so
  long to figure it out, and I'm still not sure select is the real culprit;
  I may only have cured a symptom.

  while I would appreciate any help in this matter, I also feel somewhat
  exhausted by it and I'm unhappy to go over the details yet again. it also
  doesn't appear to be comp.lang.lisp material.
  when I feel up to it, and I'm able to reproduce it consistently, whatever
  that means in a case like this, I'll try and work with both Franz Inc and
  Linux developers to see how these things interact so unreliably.

  I regret that I'm not at liberty to share the modified code with anyone
  other than Allegro CL licensees, and I'm letting Franz Inc engineers take
  care of any other customers who might report similar problems. at least
  we know that this can happen and we know a way that appears to circumvent
  the problem that doesn't break when the real cause is found.

  I've said this before, but one of the more bizarre things about this
  whole experience is that I have _more_ low-level control over things in
  Allegro CL than I would have in C. in C I can trace the system calls, but
  in Allegro CL I can trace nigh everything, and peeking under the hood has
  never been easier. this did come as somewhat of a revelation to me.

#:Erik
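
The select timeout pitfall raised in the quoted text (Linux select may
overwrite the passed timeval with the time left, so a loop that reuses it
can end up polling with a shrinking or zero timeout) can be illustrated
with a small C sketch. This is not the code discussed in this thread, which
was not shared; the function names wait_readable and count_timeouts are
hypothetical and exist only for illustration. The sketch assumes a POSIX
system and simply rebuilds both the fd_set and the timeval before every
call to select.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or the timeout expires.
   Returns 1 if readable, 0 on timeout, -1 on error.
   Both the fd_set and the timeval are rebuilt on every call, because
   select overwrites the sets and, on Linux, may also overwrite the
   timeval with the time not slept. */
int wait_readable(int fd, long sec, long usec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    /* FD_SET is undefined for descriptors outside [0, FD_SETSIZE). */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = sec;      /* fresh timeout for this call */
    tv.tv_usec = usec;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

/* Hypothetical main-loop stand-in: poll fd `rounds` times and count how
   many rounds timed out. Each round gets the full timeout because
   wait_readable reinitializes the timeval itself. Returns -1 on error. */
int count_timeouts(int fd, int rounds, long sec, long usec)
{
    int timeouts = 0;
    int i;

    for (i = 0; i < rounds; i++) {
        int r = wait_readable(fd, sec, usec);
        if (r < 0)
            return -1;
        if (r == 0)
            timeouts++;
    }
    return timeouts;
}
```

A loop that instead declares the timeval once, outside the loop, would
behave the same on systems that leave it alone but could spin without
sleeping on Linux once the remaining time reaches zero. As noted in the
post, this particular bug was checked for with strace and ruled out.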