Monday, March 14, 2005

Tomcat and concurrent access to unsynchronised data

If I feel sufficiently motivated at some point to actually fix this bug, I'll chase it through the relevant mailing list. At present, I'm just recording the problem for posterity.

  • Tomcat wraps servlets in StandardWrapper

  • Tomcat has a single HttpConnector which accepts connections and hands them off to one HttpProcessor instance each; the processor will return itself to the available pool once the connection is terminated.

  • HttpProcessor instances allocate servlets to themselves for the duration of a request (which may be the entire duration of a connection, or a connection may have many requests) and then deallocate them when finished.

  • There is a lot of special-casing for SingleThreadModel (STM) servlets, which cannot be allocated to more than one HttpProcessor at a time.

  • This is controlled, in part, with a private field on StandardWrapper called countAllocated.

  • Ingeniously (or not...), updates to countAllocated are not synchronised and, it appears, are trampling each other (from using jdb to watch changes to countAllocated):
    Field (org.apache.catalina.core.StandardWrapper.countAllocated) is
    1, will be 1: thread="HttpProcessor[8180][2]", org.apache.catalina
    .core.StandardWrapper.allocate(), line=669, bci=138

    The nearest relevant source line in StandardWrapper is:

  • My hypothesis is that this can only happen if two threads are trying to alter the value concurrently; in this case both are trying to set it to 1 when the net result should be 2. (Whether the conflicting operation is an increment from 0 to 1 or a decrement from 2 to 1, adding the lost increment should still give 2, not 1, hence the odd message from jdb.)

  • This in itself would not ordinarily be a concern when not using STM (and in fact there is synchronised pool management for STM), BUT in the orderly shutdown process StandardWrapper.unload() waits until a servlet is no longer allocated (determined by checking whether countAllocated is zero). Intermittently that never happens (presumably because, in some cases, more decrements than increments are lost), and consequently Tomcat wedges half shut down.
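The lost update hypothesised above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Tomcat's code: the class CountAllocatedSketch and its methods are hypothetical names, the interleaving is replayed by hand so that it is reproducible, and the AtomicInteger variant is merely one way such a counter might be made safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the counter logic; not taken from StandardWrapper.
class CountAllocatedSketch {

    // Plain int, updated without synchronisation, like the field described above.
    static int countAllocated = 0;

    // Replays by hand the interleaving in which two "threads" both read the
    // counter before either writes back, so one increment is lost.
    static int replayLostIncrement() {
        countAllocated = 0;
        int seenByA = countAllocated;  // thread A reads 0
        int seenByB = countAllocated;  // thread B reads 0
        countAllocated = seenByA + 1;  // thread A writes 1
        countAllocated = seenByB + 1;  // thread B: value is 1, will be 1
        return countAllocated;         // 1, although two allocations happened
    }

    // Assumed remedy: an atomic counter. Each worker pairs every allocate
    // (increment) with a deallocate (decrement), so the total must end at zero.
    static int balancedAtomicCount(int threads, int pairsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < pairsPerThread; j++) {
                    counter.incrementAndGet();  // allocate
                    counter.decrementAndGet();  // deallocate
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();  // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("replayed race:   " + replayLostIncrement());
        System.out.println("atomic counter:  " + balancedAtomicCount(4, 100000));
    }
}
```

With a plain int and real threads the final value would vary from run to run, which is consistent with shutdown hanging only intermittently; with the atomic counter, a check for zero of the kind unload() performs would eventually succeed once every allocation has been matched by a deallocation.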

The unpleasant conclusion is that, short of maintaining a custom Tomcat build, it is necessary to have a wrapper script fall back to "kill -9" when orderly shutdown fails.