Subject: Re: Series. (was Re: About loops)
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 1999/04/06
Newsgroups: comp.lang.lisp
Message-ID: <7eciqk$3ihf@fido.engr.sgi.com>
[This is getting pretty far off-topic, so maybe after this we should
go private...]

<luvisi@andru.sonoma.edu> wrote:
+---------------
| rpw3@rigden.engr.sgi.com (Rob Warnock) writes:
| > And on a few architectures & C compilers [I mention both because both
| > matter], the C "setjmp()/longjmp()" can be used with no additional
| > assembly-language code to build coroutines [Unix kernel hackers should
| > be mumbling "resume()" right about now].
| 
| could I trouble you to outline how one would go about doing this?
| in particular, how does one create a second stack which would
| automatically grow like the main one does?
+---------------

Those are two separate questions, the first easy and the second much harder.

1. Using C's setjmp/longjmp to build coroutines:

   void resume(proc_p p)
   {
	extern proc_p self;	/* the currently running thread */
        jmp_buf regs;

        if (setjmp(regs))
            return; /* we've just been resume'd again */
        self->save_area = (long *)&regs[0];     /* save pointer to our state */
        self = p;                               /* switch identities */
        longjmp(*(jmp_buf*)self->save_area, 1); /* wake our new self */
        /*NOTREACHED*/
   }

   [Note that resume() does *not* save the old "self" anywhere, so the caller
   of resume() must do so, usually on a run queue or wait queue or something.]

   Obviously, this only works the first time if somebody has created the
   new coroutine (a.k.a. thread or process) object's stack and set it up so
   that it looks like a setjmp() was previously done. A routine to do that:
     proc_p
     new_process(int stack_size, char name[], void (*f)(), int arg2, ...)
   is a bit more complex than "resume()", so I won't show the whole thing
   here, but basically it:
   a. Malloc's some space for a new stack;
   b. Copies its args into the new stack (for use when calling "f" the
      first time, passing "arg2" and some number of further ones).
   c. Does a setjmp() into the new stack, then diddles with the saved
      value of the stack pointer so that a longjmp() will use the new
      stack (with the args "in the right place").
   d. Returns the constructed stack object (== thread == coroutine).
   e. When resume()'d the very first time, calls "f(arg2, ...)".

   On top of new_process() and resume(), one can build [and I have done]
   whatever wait & run queues you like. For Simula-like (or these days,
   we might say "Verilog-like") simulations, one of the most useful things
   is a priority queue for which the key is "virtual time to start" (much
   like the old Unix kernel "timeout" facility), and a "busy(n)" which
   blocks the current thread until the virtual time in now+n ticks. With
   that, plus simple wait-for-event queues, you can build a complete
   discrete event simulator.

2. Creating multiple stacks that grow automatically: This ranges from
   "trivial" (when using call/cc in Scheme) to "impossible" (running on
   systems without multiple virtual memory segments). The above-mentioned
   coroutine library set a *fixed* stack size at coroutine-creation time
   [see the first arg to "new_process()"], and it was useful for a wide
   variety of applications. But it was possible (too easy, even) to overflow
   one of the stacks without warning, unless you added some safety code.

   The main ways I know of to get multiple independently-growing stacks are:
   a. Change the coroutine call so that the currently running thread doesn't
      run *on* the stack object, it runs on the "main" stack. This involves
      copying the main stack out into the heap & back on every resume()
      [and allocating a new larger heap object for a thread's stack whenever
      the old one isn't large enough to save a stack being "swapped out"].
      This can be cheap or ex*PEN*sive, depending on average stack depth.
      [But note that many Scheme implementations that use the C stack as
      the Scheme execution stack do precisely this to implement call/cc.]
   b. Change the way your compiler(s) work to use "segmented" (or "cactus")
      stacks. Not an option for most people.
   c. Write in a language that uses a heap-based stack (e.g. some Schemes
      or Lisps, and maybe at least one ML?), so that cactus stacks are "free".
   d. Play some tricky games (if your O/S lets you) with "mmap()", to allocate
      larger-than-you'll-ever-need stacks sparsely in your virtual address
      space. Some operating systems implement "allocate on first reference"
      pages which are useful for this.

    Like I said, it's harder & more environment-dependent than simple
    coroutines with fixed-size stacks. (Sorry 'bout that...)


-Rob

-----
Rob Warnock, 8L-855		rpw3@sgi.com
Applied Networking		http://reality.sgi.com/rpw3/
Silicon Graphics, Inc.		Phone: 650-933-1673
2011 N. Shoreline Blvd.		FAX: 650-964-0811
Mountain View, CA  94043	PP-ASEL-IA