Note: Dean wrote the initial version of this with input from various other folks. RST has since mailed Dean and convinced Dean that a lot of this is way overboard... and some day when Dean has a chance those extra arguments will appear here. But for now, here's a start on this which is way overboard.
Start by reading the JAWS papers. They provide a good background for what we need to think about.
Apache's process models are currently implemented by http_main.c. The addition of the WIN32 code has made it almost unreadable. It needs abstraction and cleanup. Specifically, http_main provides the following functionality:
Note that the beauty of Apache's current design is that outside of http_main very little code cares about how requests are dispatched. The module API has also been designed such that essentially no synchronization needs to be done between running requests; for example, there are (almost) no data structures that require mutexes.
But the latter is not something that we absolutely need to maintain,
because there are optimizations that set up data structures which require
some synchronization... and opening up these potentials is one of the
purposes of this paper.
The models vary in three dimensions: number of processes, number of threads within each process, number of fibers within each thread. In all models, a fiber is used to handle each request -- that is, most code in the server can treat the fiber as the only unit of execution worth worrying about.
I believe the following models are "interesting":
Note that any single process model is subject to robustness issues depending on whether a single thread is terminated or an entire process is terminated when segfaults and such occur. So the MMS and MMM models are also interesting, but shouldn't be much more work than SMS, SMM respectively given that we already need to support MSS and MSM.
In order to abstract fibers so that we can implement all of the above models easily we need to abstract all I/O. That is, we cannot allow code outside of the process model code itself to make any system calls. As mentioned, the majority of the code in the server will only need to know that it is running in a fiber, and not worry how that fiber gets time on a CPU. In the multi-fiber models it would be death if one fiber was able to block a thread for an indeterminate time due to I/O (fibers can be compute intensive and cause problems -- but this is a separate issue solved by a yield primitive). This is why we have to abstract I/O -- for example, in the MSM model all I/O will be done using non-blocking file descriptors, and the fiber will be suspended until a select() mask indicates that the event is ready.
Given the already existing abstractions (BUFF and the rput/rprintf/etc calls) this isn't too much different than what we have now. Some specific things that need work:
We need to specify this abstraction. It should be as simple as saying "use apio_open instead of open", "apio_read instead of read", and so on.
The apio_* functions have POSIX semantics. (Note that we can't use aio_* because that is the prefix for the POSIX asynchronous I/O functions.)
The MSS and MMS models have the easiest fiber implementation: a fiber is the same as a thread. In both of these models, the entire apio abstraction can consist of #define wrappers for the real POSIX functions.
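As a rough sketch of that point (only the apio_* naming comes from the text; the exact macro shapes are assumptions), the entire layer in the thread-per-fiber models could collapse to:

```c
/* Sketch only: in the MSS and MMS models a fiber is a thread, so the
 * apio layer can be nothing more than macros over the POSIX calls. */
#include <unistd.h>
#include <fcntl.h>

#define apio_open(path, flags, mode)   open((path), (flags), (mode))
#define apio_read(fd, buf, nbytes)     read((fd), (buf), (nbytes))
#define apio_write(fd, buf, nbytes)    write((fd), (buf), (nbytes))
#define apio_close(fd)                 close((fd))
```

In these models the wrappers cost nothing at runtime, which is exactly why the compile-time model choice described later matters.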
In the MSM model we need to build an extensive apio library, and a fiber implementation. This is a well-studied problem, though; the "green threads" package used by Sun's Java implementation is one example.
A fiber's context consists of: its stack, its current pc (program counter), and its signal context (what handlers are registered, blocked signals, etc.). We don't need to worry about its registers and such because all context switching is done via function calls, and the C compiler will take care of those details for us. An easy way to implement the pc/stack context is to use setjmp/longjmp. Where available, sigaltstack/sigstack provide an interesting method (thanks to RST for this one). setjmp/longjmp actually save more context than necessary, but we can provide hand-tuned assembly for specific platforms (e.g. i386).
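To make the context-switch mechanics concrete, here is a minimal one-fiber sketch. It uses the POSIX makecontext/swapcontext calls as a portable stand-in for the setjmp/longjmp and sigaltstack tricks just described; every name in it (fiber_main, run_fiber_demo) is illustrative, not proposed API:

```c
/* One fiber with its own stack, cooperatively yielding to a scheduler.
 * makecontext/swapcontext is used here as a stand-in for the
 * setjmp/longjmp + sigaltstack approach described in the text. */
#define _XOPEN_SOURCE 700
#include <ucontext.h>

static ucontext_t sched_ctx, fib_ctx;
static char fib_stack[64 * 1024];   /* the fiber's private stack */
static int yields, finished;

static void fiber_main(void)
{
    int i;
    for (i = 0; i < 3; i++) {
        yields++;
        swapcontext(&fib_ctx, &sched_ctx);  /* yield back to the scheduler */
    }
    finished = 1;  /* returning lands us on uc_link, i.e. the scheduler */
}

/* Crude scheduler: resume the single fiber until it runs to completion.
 * Returns how many times the fiber yielded. */
int run_fiber_demo(void)
{
    getcontext(&fib_ctx);
    fib_ctx.uc_stack.ss_sp = fib_stack;
    fib_ctx.uc_stack.ss_size = sizeof fib_stack;
    fib_ctx.uc_link = &sched_ctx;       /* where to go when the fiber returns */
    makecontext(&fib_ctx, fiber_main, 0);

    while (!finished)
        swapcontext(&sched_ctx, &fib_ctx);
    return yields;
}
```

A real scheduler would of course pick the next runnable fiber from the I/O readiness results rather than resuming a single hard-wired fiber.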
The apio library makes sure that all I/O is done non-blocking. When an I/O request is made it is delayed until the particular fd is ready for read or write. The main scheduling loop for fibers builds a select() call, issues it, and then parses the results to figure out which fiber to schedule next.
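One pass of such a scheduling loop might look like the following hedged sketch; the fiber record and both function names are invented for illustration:

```c
/* Sketch of one scheduler pass.  "struct fiber" here is only the part
 * the scheduler needs: which fd the fiber's suspended apio call is
 * blocked on, and whether it wants readability or writability. */
#include <sys/select.h>
#include <unistd.h>

struct fiber {
    int fd;          /* fd the suspended apio call is waiting on */
    int wants_read;  /* 1 = waiting to read, 0 = waiting to write */
    int runnable;    /* set once the fd is ready */
};

/* Build the select() masks from the blocked fibers, issue the call,
 * and mark every fiber whose fd came back ready.  Returns the number
 * of ready descriptors, or -1 on error. */
int schedule_pass(struct fiber *fibers, int n)
{
    fd_set rfds, wfds;
    int i, maxfd = -1, ready;

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    for (i = 0; i < n; i++) {
        if (fibers[i].runnable)
            continue;
        FD_SET(fibers[i].fd, fibers[i].wants_read ? &rfds : &wfds);
        if (fibers[i].fd > maxfd)
            maxfd = fibers[i].fd;
    }
    ready = select(maxfd + 1, &rfds, &wfds, NULL, NULL);
    if (ready < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (!fibers[i].runnable &&
            FD_ISSET(fibers[i].fd, fibers[i].wants_read ? &rfds : &wfds))
            fibers[i].runnable = 1;  /* eligible for the next context switch */
    return ready;
}
```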
The server goes through the following initialization steps:
Note: There are other API phases that we need to consider in 2.0, but I don't think they're related to the process model discussion.
Note: each phase has a pool associated with it. For example, the pool used for sub_process_initialization will be cleaned up when the sub-process dies. In this manner modules can register cleanups with the correct scope without us needing to explicitly define cleanup phases.
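A miniature illustration of the idea (none of these names are the real pool API): a module registers a cleanup with the pool whose lifetime matches its resource, and destroying the pool, e.g. when the sub-process dies, runs the cleanups:

```c
/* Toy pool that only does cleanup registration, to show the scoping
 * idea.  pool_register_cleanup/pool_destroy are invented names. */
#include <stdlib.h>

struct cleanup {
    void (*fn)(void *data);
    void *data;
    struct cleanup *next;
};

struct pool {
    struct cleanup *cleanups;
};

/* Register a cleanup to run when this phase's pool is destroyed. */
int pool_register_cleanup(struct pool *p, void *data, void (*fn)(void *))
{
    struct cleanup *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

/* Destroying the pool runs every registered cleanup, newest first. */
void pool_destroy(struct pool *p)
{
    while (p->cleanups) {
        struct cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Demo cleanup: counts how many times it has run. */
static int torn_down;
static void count_teardown(void *data) { (void)data; torn_down++; }
```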
Some code and optimizations require the use of shared memory and mutexes. We already have several methods of doing each in the MSS model; they need to be abstracted. It should be possible to allocate shared memory and mutexes in any pool. However, in the MSS and MSM models it's imperative that the allocation occur during main_process_init. So modules written using shared memory or mutexes that expect maximum portability will have to allocate during main_process_init. This is not an absolute requirement: modules may only work in certain models, or may only have some features in certain models.
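For example, a cross-process mutex might be set up during main_process_init roughly like this. This is a sketch assuming Linux-style anonymous shared mmap and process-shared pthread mutexes; init_shared_lock is a hypothetical name, not proposed API. The point is that the mapping must exist before any fork() so every child inherits it:

```c
/* Sketch: why shared state wants to be allocated in main_process_init.
 * The segment is mapped before fork(), so all children share it. */
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <sys/mman.h>

static pthread_mutex_t *shared_lock;

int init_shared_lock(void)
{
    pthread_mutexattr_t attr;

    /* Memory shared with all future children of this process. */
    shared_lock = mmap(NULL, sizeof *shared_lock, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared_lock == MAP_FAILED)
        return -1;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    return pthread_mutex_init(shared_lock, &attr);
}
```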
To make this possible we will provide compile-time macros to determine the process model. That is, even though on some systems we will have the option of building multiple process models, we will require the model to be a compile-time decision, and we will allow modules to be dependent on the model they were compiled in.
We define:
And code can conditionally test things:

    #define APPM_MODEL (APPM_MULTI_PROCESS | APPM_MULTI_FIBER)

    #if APPM_MODEL & APPM_MULTI_PROCESS
        /* do something specific to multi process models */
    #endif

    #if APPM_MODEL == (APPM_MULTI_PROCESS|APPM_MULTI_FIBER)
        /* do something specific to MSM */
    #endif
Hidden somewhere in the process model is the method by which a request is received and then dispatched into a fiber running inside some thread inside some process. There is currently no API to tap into this, and because Apache is only an HTTP server it hasn't been an issue. The closest thing to an API to tap into it is the STANDALONE_MAIN definition. But to use that the user must supply a complete standalone_main() replacement, and use a different child_main (i.e. must supply a complete process model replacement). This won't be feasible after the process model has been abstracted because there will be too many models to try to re-implement.
Other than the process model itself, standalone_main/child_main provide the following services:

- other_child (the register_other_child API)
It should be easy to abstract the network functions. In the Unix models it's sufficient to supply a few fd_sets for select, a void *, and a callback. TODO: need to figure out the cleanest way to do this so that the WIN32 and OS2 models can implement it well. Oh yeah, consider poll() while you're at it, because poll works better on sparse fd_sets.
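One possible shape for that abstraction, sketched with poll() and entirely invented names (net_handler, dispatch_once): a module registers an fd, the events it cares about, a callback, and an opaque pointer, and the process model dispatches the callback when the fd is ready.

```c
/* Sketch of a registration-style interface for the network functions,
 * using poll() as suggested in the text.  Not proposed API. */
#include <poll.h>
#include <unistd.h>

struct net_handler {
    int fd;
    short events;                    /* POLLIN and/or POLLOUT */
    void (*cb)(int fd, void *data);  /* called when fd is ready */
    void *data;                      /* opaque pointer handed back */
};

/* One dispatch pass over up to 16 handlers (the cap keeps the sketch
 * short).  Returns poll()'s result. */
int dispatch_once(struct net_handler *h, int n, int timeout_ms)
{
    struct pollfd pfds[16];
    int i, ready;

    for (i = 0; i < n; i++) {
        pfds[i].fd = h[i].fd;
        pfds[i].events = h[i].events;
    }
    ready = poll(pfds, n, timeout_ms);
    for (i = 0; ready > 0 && i < n; i++)
        if (pfds[i].revents & h[i].events)
            h[i].cb(h[i].fd, h[i].data);
    return ready;
}

/* Demo callback: counts how many times it was dispatched. */
static int hits;
static void note_ready(int fd, void *data) { (void)fd; (void)data; hits++; }
```

Because the registration records events per handler, a poll-based and a select-based model can sit behind the same interface, which is the point of the TODO above.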
The scoreboard is very intimately tied to the process model. In theory some models may be able to run without a scoreboard at all. For example an all static content site running in MSM or SMM models should never need to spawn or destroy any threads/processes. Such a server would spawn enough threads/processes initially to exploit the parallelism in the hardware, and no more. In general there is likely to always be a scoreboard, but its use and contents may vary widely.
The scoreboard currently provides the following:
I propose that the first function will have to be supplied by a module which is process-model dependent, and which uses unexported interfaces to the scoreboard. The second function is part of the process model itself, and is hidden behind the timeout API already.
The third function can be provided by an API like this:
    typedef enum {
        REQSTAT_MILLISECONDS    /* #msecs request took, unsigned long */
        /* uhhh what other fields are of use to a module?? */
    } reqstat_t;

    /* returns 0, and stores result in *result if successful
     * returns -1 and sets errno in the event of an error.
     * possible errno values:
     *    EINVAL -- not supported in this process model
     */
    extern int get_reqstat(request_rec *r, reqstat_t index, void *result);
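To illustrate the intended calling convention, here is a toy implementation against a stand-in request_rec; the real request_rec, and wherever the timing data actually lives, belong to the scoreboard and process model:

```c
/* Toy get_reqstat to show the calling convention only.  This
 * request_rec is a stand-in, not the real Apache structure. */
#include <errno.h>

typedef struct request_rec {
    unsigned long elapsed_ms;  /* pretend the process model recorded this */
    int have_stats;            /* 0 in models without a scoreboard */
} request_rec;

typedef enum {
    REQSTAT_MILLISECONDS       /* #msecs request took, unsigned long */
} reqstat_t;

/* Returns 0 and stores the result, or -1 with errno set to EINVAL
 * when the statistic isn't supported in this process model. */
int get_reqstat(request_rec *r, reqstat_t index, void *result)
{
    if (!r->have_stats || index != REQSTAT_MILLISECONDS) {
        errno = EINVAL;
        return -1;
    }
    *(unsigned long *)result = r->elapsed_ms;
    return 0;
}
```

The void * result plus per-index type convention lets new statistics be added without changing the signature, at the cost of the caller knowing each index's type.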
I think the above is general enough to implement the interesting process models, and to implement optimizations that are available only in some of the multi-threaded models. Note that nothing above is specific to HTTP, and I believe that we should strive to keep the abstraction so that the same libraries can be used to implement other types of servers (i.e. FTP, streaming video/audio, corba).
For many systems there will be multiple process model options. It's hard to say which one will be "best"... because some modules may work out better in different models. This will complicate binary distribution of the server. The compile time model choice should be made in the Configuration file. If possible we want to avoid code duplication, so the os/osname/ directories are probably not where the models should be implemented. We should probably have pm/unix-mss, pm/unix-msm, pm/win32-sms, etc.
Ben Hyde adds: "I always end up adding a priority scheme of some sort, and I find it is best if the priority is on the transaction and not on the thread. I don't know how many systems I've had to rework that in."
Note that it's possible to implement an MSM model in which fibers can migrate from process to process by using a lot of shared memory, and file handle passing. Maybe this is a good thing, maybe not. I don't think anything above makes this impossible... but the MSM notation isn't specific enough to distinguish the two different models.
The whole reason to consider process models is for performance, but performance has to be balanced with portability. I'm confident that with this design we should be able to implement the highest performing models with very little overhead, and not degrade performance on the lower models at all.
This design can be fleshed out and implemented somewhat "organically". Several people can be working on it at the same time because there are many models that need to be implemented. Hopefully during this process we'll be able to hash out the actual API details. I hope the above is enough to get people started.
Implementing Lightweight Threads, D. Stein and D. Shah, SunSoft Inc.