=============================
Request Pipeline And Timeouts
=============================

A request handled by mod_wsgi traverses several stages between
arriving at Apache and reaching the WSGI application. Each
stage has its own timeout knob, its own failure mode, and its
own recovery flow. This page walks the pipeline so each
timeout lands in context, then covers the recovery flows when
one fires.

The :doc:`embedded-and-daemon-mode` guide is the structural companion: it
covers process and thread sizing, recycling triggers, and the
process-group patterns that this page builds on. Read that
page first if the daemon-mode model is unfamiliar.

The timeout options split into three groups:

* **Transport.** ``connect-timeout``, ``queue-timeout``,
  ``socket-timeout``, ``response-socket-timeout``. These
  govern the boundary between the Apache child process and
  the daemon process, and between the Apache child and the
  HTTP client.
* **Application fail-safe.** ``request-timeout``,
  ``interrupt-timeout``, ``deadlock-timeout``. These detect
  when the WSGI application has stopped making progress and
  trigger recovery.
* **Lifecycle.** ``startup-timeout``, ``inactivity-timeout``,
  ``graceful-timeout``, ``eviction-timeout``,
  ``shutdown-timeout``. These govern the daemon process's
  own lifecycle: startup, idle recycling, drain on restart,
  and hard cutoff on shutdown.

The full reference for each option is on the
:doc:`../configuration-directives/WSGIDaemonProcess` page;
this guide covers the model and the interactions.

The request pipeline (daemon mode)
----------------------------------

What follows is the path a single request takes from arrival
at Apache through to the WSGI application and back. Each
stage calls out the timeout knobs that apply to it.

Embedded mode is structurally different (no socket hop, no
daemon-side queue, no per-process recycle) and is described
in its own section after the daemon-mode walkthrough.

Apache accept and request parsing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The request first hits Apache. Apache's own ``Timeout`` and
``KeepAliveTimeout`` directives govern this stage: how long
Apache will wait for headers, body, or the next keep-alive
request on a connection. These are Apache concerns rather
than mod_wsgi concerns and are out of scope for this page;
consult the Apache HTTP Server documentation for the per-MPM
behaviour.

Once Apache has parsed the request and decided to dispatch it
through mod_wsgi (matching a ``WSGIScriptAlias`` or a
``SetHandler wsgi-script``), the mod_wsgi handler is invoked
inside the Apache child worker process.

Auth scripts and ``WSGIDispatchScript``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If ``WSGIAuthUserScript``, ``WSGIAuthGroupScript``,
``WSGIAccessScript``, or ``WSGIDispatchScript`` is configured,
the corresponding script runs inside the Apache child process,
in an embedded Python interpreter, *before* the request is
delegated to the daemon. This is independent of whether the
WSGI application itself runs embedded or in a daemon process
group.

``WSGIDispatchScript`` is the most consequential of these for
the pipeline: its ``process_group(environ)`` callable
determines which daemon process group the request is
delegated to. Its ``application_group(environ)`` and
``callable_object(environ)`` callables similarly override the
sub interpreter and entry point on a per-request basis. See
:doc:`../configuration-directives/WSGIDispatchScript` for the
directive reference.
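
As a rough illustration, a dispatch script could look like the
sketch below. Only the three callable names come from the
directive; the routing rule, the group names (``admin``,
``main``, ``reports``, ``default``), and the host name are
hypothetical, and the return-value conventions should be checked
against the directive reference.

.. code-block:: python

    # Hypothetical dispatch script: names and routing rule are
    # illustrative assumptions, not taken from a real deployment.

    def process_group(environ):
        """Pick the daemon process group for this request."""
        host = environ.get("HTTP_HOST", "")
        # Assumes "admin" and "main" are both declared elsewhere
        # with WSGIDaemonProcess.
        return "admin" if host.startswith("admin.") else "main"


    def application_group(environ):
        """Pick the sub interpreter within the chosen process."""
        path = environ.get("PATH_INFO", "")
        return "reports" if path.startswith("/reports") else "default"


    def callable_object(environ):
        """Name the WSGI entry point inside the script file."""
        return "application"

Each callable receives the same ``environ`` the request carries
at dispatch time, so routing can key off any request attribute
that Apache has already parsed.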

These embedded-mode scripts have no dedicated mod_wsgi
timeout. Because they run inside the Apache child, they are
bounded only by Apache's own request processing
(``Timeout``).

Apache child connects to the daemon listener
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once routing is settled and the request is bound for a daemon
process group, the Apache child connects to that group's UNIX
domain listener socket.

A daemon process group has a single listener socket shared by
all the daemon processes in the group. The kernel
load-balances connection acceptance across whichever daemon
processes have idle worker capacity.

If the kernel listen queue is full (``listen-backlog``
exceeded) the connect will fail. mod_wsgi retries with
backoff, starting at fractional-second intervals and
stretching out to one-second intervals after a couple of
seconds of accumulated wait. The overall budget for retries
is ``connect-timeout`` (default 15 seconds). On exhaustion
the request is failed with HTTP 503.
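
The shape of that retry budget can be sketched as follows. The
interval values (100 ms steps, switching to 1 s steps after 2 s
of accumulated wait) are assumptions chosen to match the
description above, not values read from mod_wsgi itself.

.. code-block:: python

    def retry_delays(connect_timeout_s=15, short_ms=100,
                     long_ms=1000, switch_after_ms=2000):
        """Yield sleep intervals (ms) until the budget is spent.

        The interval sizes are illustrative assumptions.
        """
        budget_ms = int(connect_timeout_s * 1000)
        waited_ms = 0
        while waited_ms < budget_ms:
            # Short intervals early on, one-second intervals later.
            delay = short_ms if waited_ms < switch_after_ms else long_ms
            # Never sleep past the overall connect-timeout budget.
            delay = min(delay, budget_ms - waited_ms)
            yield delay
            waited_ms += delay


    delays = list(retry_delays())

Under these assumed intervals the total sleep time adds up to
exactly the ``connect-timeout`` budget, after which the request
would be failed.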

A connect that fails for permission or filesystem reasons
(the socket file does not exist, the Apache child user
cannot traverse to its directory) fails immediately,
likewise with HTTP 503 and a logged error. This is most often a
``WSGISocketPrefix`` configuration issue; see
:doc:`../configuration-directives/WSGISocketPrefix`.

``connect-timeout`` is the only knob that applies at this
stage.

Waiting for a daemon worker to pick up the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the Apache child has connected to the daemon process
group's listener socket, the connection sits in the kernel's
listen-backlog queue until a daemon worker thread is free to
``accept()`` it. A daemon process only accepts a new
connection when it has an idle worker thread ready to handle
it; if every worker thread across every process in the group
is busy, incoming connections accumulate in the kernel listen
queue.

``queue-timeout`` is not enforced while the request is
waiting. No mod_wsgi code is running on the request while
it sits in the kernel listen queue, so nothing fires a
timer to abandon it: the Apache child is blocked reading
the response from the daemon, and from the client's
perspective the request is simply slow.

The check happens later, at the moment a worker thread
finally accepts the connection and reads the request
envelope. The envelope carries the timestamp at which the
Apache child first wrote it; the worker compares that
against the current time. If the wait exceeds
``queue-timeout``, the worker discards the request without
dispatching it to the WSGI application and HTTP 504 is
returned to the client.
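
The pickup-time check amounts to a single comparison. The sketch
below is an illustration of that logic only; the function and
parameter names are hypothetical, and treating ``0`` as
"disabled" is an assumption of the sketch.

.. code-block:: python

    import time


    def should_discard(queued_at, queue_timeout, now=None):
        """Return True when a request waited longer than queue_timeout.

        queued_at: epoch seconds stamped when the request was queued.
        queue_timeout: seconds; 0 is treated as disabled (assumption).
        """
        if queue_timeout <= 0:
            return False
        if now is None:
            now = time.time()
        return (now - queued_at) > queue_timeout

Nothing evaluates this while the request waits; it runs once,
when a worker finally picks the request up.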

The effect is load-shedding on pickup. Under overload, when
workers free up to work through the backlog, anything that
has been waiting longer than ``queue-timeout`` is dropped
rather than served, so workers spend their time on fresh
work rather than on requests whose clients have likely
already given up. A request can sit in the queue for
considerably longer than ``queue-timeout`` before being
abandoned, since the discard fires at pickup rather than at
the timeout instant.

``listen-backlog`` (default 100) caps the kernel-level
queue of unaccepted connections waiting on the listener
socket. Under sustained overload it fills first, after
which further connection attempts start failing at the
kernel and the Apache child falls into
``connect-timeout``-bounded retries. ``queue-timeout`` and
``listen-backlog`` work together: the backlog provides the
buffer, and the timeout decides which work is still worth
serving when worker capacity returns.

Daemon worker reads the request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once a daemon worker thread is handling the connection, it
reads the request envelope (headers and body) over the
socket from the Apache child. Each individual read or write
on this socket is bounded by ``socket-timeout``, which falls
back to Apache's ``Timeout`` directive when not set
explicitly.

This timeout exists to bound the time a daemon worker
spends waiting on a slow Apache child (or a misbehaving
connection) during the request hand-off. It is a per-syscall
timeout, not a total-request timeout.
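
The per-syscall behaviour can be seen with ordinary Python
sockets, used here as an analogy rather than as mod_wsgi code.
In this sketch a sender trickles bytes with gaps shorter than
the per-read timeout; the transfer as a whole takes longer than
the timeout, yet no individual read ever times out. All names
and values are illustrative.

.. code-block:: python

    import socket
    import threading
    import time

    PER_READ_TIMEOUT = 0.5  # stand-in for socket-timeout, seconds
    GAP = 0.1               # pause before each chunk, under the timeout
    CHUNKS = 8              # 8 * 0.1s = 0.8s in total, over the timeout


    def trickle(sock):
        """Send one byte at a time with a pause before each send."""
        for _ in range(CHUNKS):
            time.sleep(GAP)
            sock.sendall(b"x")
        sock.close()


    reader, writer = socket.socketpair()
    reader.settimeout(PER_READ_TIMEOUT)  # applies to each recv() call
    sender = threading.Thread(target=trickle, args=(writer,))

    start = time.monotonic()
    sender.start()
    received = 0
    while True:
        data = reader.recv(1)  # raises socket.timeout only if one read stalls
        if not data:           # empty read: the peer closed the socket
            break
        received += len(data)
    elapsed = time.monotonic() - start
    sender.join()
    reader.close()

A slow but steady peer can therefore occupy a worker for much
longer than ``socket-timeout``; the knob only catches a peer
that goes silent for that long in one stretch.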

WSGI script reload during a request
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When ``WSGIScriptReloading`` is on (the default), each
daemon worker checks the WSGI script file's modification
time on every request dispatch. If the file has changed
since the daemon loaded it, the daemon does not serve the
request: it rejects it back to the Apache child and
initiates its own restart so it can reload the script from
disk.
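
A minimal sketch of such a modification-time check, using only
the Python standard library. It illustrates the idea rather than
mod_wsgi's own code; the helper name is hypothetical, and
comparing for inequality (rather than "newer than") is an
assumption.

.. code-block:: python

    import os
    import tempfile


    def script_changed(path, loaded_mtime):
        """Return True if the file's mtime differs from load time."""
        return os.path.getmtime(path) != loaded_mtime


    # Create a throwaway script file to stand in for the WSGI script.
    with tempfile.NamedTemporaryFile("w", suffix=".wsgi",
                                     delete=False) as handle:
        handle.write("def application(environ, start_response): ...\n")
        script_path = handle.name

    loaded_mtime = os.path.getmtime(script_path)
    unchanged = script_changed(script_path, loaded_mtime)

    # Simulate an edit by moving the modification time forward.
    os.utime(script_path, (loaded_mtime + 10, loaded_mtime + 10))
    changed = script_changed(script_path, loaded_mtime)

    os.unlink(script_path)

Because only the modification time is consulted, touching the
script file is enough to trigger the reload path even when its
content is unchanged.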

The Apache child treats the rejection as a signal to close
the socket and reconnect. The reconnect lands on whichever
daemon process in the pool is ready to accept; the kernel
load-balances among them. In a multi-process pool, each
old daemon needs only the first request after the script
change to trigger its own restart, so concurrent requests
share the work: while this request is reconnecting,
sibling requests in flight may already have triggered some
of the other daemons' restarts. A given request does not
necessarily walk the whole pool. The Apache child
reconnects until its request lands on a process that has
finished reloading.

Reconnect attempts are bounded by a retry cap proportional
to the pool size: roughly ``2 * processes + 1`` attempts.
In practice this cap is never hit by reload-driven
restarts because the restart cycle completes before the
cap is exhausted. Each reconnect goes through the full
``connect-timeout`` window, so the worst-case total
reconnect wait is roughly ``connect-timeout * retry-cap``;
the kernel load-balance and the speed of the actual
restart keep this much shorter in practice.
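
The worst-case arithmetic can be written out directly. This is a
sketch of the bound described above; the function name is
hypothetical and the cap formula is the approximate one quoted
in the text.

.. code-block:: python

    def reconnect_worst_case(processes, connect_timeout=15):
        """Return (retry_cap, worst-case wait in seconds)."""
        retry_cap = 2 * processes + 1
        return retry_cap, retry_cap * connect_timeout

For a two-process group with the default ``connect-timeout``
this gives a cap of 5 attempts and a theoretical ceiling of 75
seconds, a figure that real reloads should stay well below.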

No operator-visible timeout knob governs the reload-driven
reconnect. Setting ``WSGIScriptReloading Off`` disables the
modification check and removes the mechanism entirely; with
it disabled, the daemon runs the script as loaded at
process startup until some other trigger recycles the
process. See :doc:`reloading-source-code` for the broader
reload model.

WSGI application runs
~~~~~~~~~~~~~~~~~~~~~

Once the request is in the daemon worker thread, the worker
calls into the WSGI application. This is where most of the
useful work happens, and where the application fail-safe
timeouts apply.

``request-timeout`` is a per-thread upper bound on how long a
single request can spend running before mod_wsgi treats it
as wedged and triggers recovery. Defaults to 0 (disabled).
The fire point is not the configured value directly; it
scales with ``threads`` by natural log:

.. code-block:: text

    T_fire = request-timeout * (1 + ln(threads))

At ``threads=1`` this collapses to ``request-timeout``. At
``threads=10`` it is approximately 3.3x; at ``threads=25``
approximately 4.2x. The intent is to grant proportionally
more patience as parallel capacity grows: a wedge in 1-of-10
threads costs less than a wedge in 1-of-1, so the threshold
should grow with pool size, but only sub-linearly.
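
The formula is easy to evaluate when choosing a value. The
helper below is a sketch for that purpose; its name is
hypothetical, and returning ``None`` for a disabled timeout is a
convention of the sketch.

.. code-block:: python

    import math


    def fire_point(request_timeout, threads):
        """Seconds before request-timeout fires for one thread.

        Returns None when the timeout is disabled (0).
        """
        if request_timeout <= 0:
            return None
        return request_timeout * (1 + math.log(threads))


    # Multiplier applied to the configured value per thread count.
    multipliers = {n: round(fire_point(1, n), 1) for n in (1, 10, 25)}

With ``request-timeout=60`` and ``threads=5``, for instance, the
fire point works out to roughly 157 seconds rather than 60.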

Each thread is judged independently against this threshold,
so several wedged threads are each detected on the same
schedule as a single wedge would be.

``request-timeout`` is a *fail-safe*, not a per-request SLA
mechanism. See :ref:`request-timeout-not-sla` below for the
distinction and the right tool for user-visible deadlines.

What happens when ``request-timeout`` fires depends on
``interrupt-timeout``, covered in the recovery-flow section
below.

C extension wedges the GIL
~~~~~~~~~~~~~~~~~~~~~~~~~~

A separate failure mode from a wedged request is a wedged
*interpreter*. If a Python C extension fails to release the
GIL inside a long-running operation, no Python code in the
process can run. Other worker threads in the same process
are also blocked.

``deadlock-timeout`` (default 300 seconds) detects this case.
A monitor thread inside the daemon attempts to acquire the
GIL once per second; when the acquisition itself blocks for
longer than ``deadlock-timeout``, the daemon is treated as
wedged and recycled.
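
The detection idea can be illustrated with an ordinary lock
standing in for the GIL. This is an analogy only: the real
monitor works against the interpreter lock, and the names and
the short timeout below are illustrative.

.. code-block:: python

    import threading


    def is_wedged(lock, deadlock_timeout):
        """True if the lock cannot be acquired within the timeout."""
        acquired = lock.acquire(timeout=deadlock_timeout)
        if acquired:
            lock.release()
        return not acquired


    gil_stand_in = threading.Lock()

    # Nobody holds the lock: the probe succeeds straight away.
    healthy_result = is_wedged(gil_stand_in, deadlock_timeout=0.2)

    # Simulate an extension that takes the lock and never lets go.
    gil_stand_in.acquire()
    wedged_result = is_wedged(gil_stand_in, deadlock_timeout=0.2)
    gil_stand_in.release()

The probe needs no cooperation from the code holding the lock,
which is why this style of detection still works when no Python
code can run.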

The injection mechanism that ``request-timeout`` and
``interrupt-timeout`` use cannot recover this case: it
relies on Python being able to run. ``deadlock-timeout``
handles it the only way that works, which is process
restart. See the recovery-flow section below.

Response back to the client
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the WSGI application has produced a response, the
daemon streams it back over the socket to the Apache child,
which then proxies it through to the HTTP client.

The daemon-to-Apache leg uses ``socket-timeout`` (the same
timeout that bounded request reads). The Apache-to-client
leg, when the response buffer has filled and a forced flush
has to wait on the client, uses ``response-socket-timeout``,
which defaults to the value of ``socket-timeout`` when not
set explicitly.

``response-socket-timeout`` is the knob to reach for when
serving slow clients (mobile networks, satellite, etc.) and
the application has produced a large response that does not
fit in a single buffer flush. A short value here can clip
genuine slow-client traffic; a long value lets a slow
client hold a daemon worker thread for the duration of the
response.

WSGI script load
~~~~~~~~~~~~~~~~

A daemon process must load and execute its WSGI script
before it can serve any request. ``startup-timeout``
(default 0, disabled) bounds how long a daemon process is
allowed to spend on this initial load. When set and the
load takes longer than the limit, the daemon process is
restarted.

The case ``startup-timeout`` was introduced for is transient
import failures that leave Python module-level state partly
initialised: a subsequent retry of the same import in the
same process can hit a different failure than the first
attempt. Django is the prominent example; once Django's
bootstrap has been started in a process it cannot be
cleanly retried. ``startup-timeout`` forces a fresh process
so the retry starts from a clean slate.

Idle daemon
~~~~~~~~~~~

When a daemon process has no active requests and is not
receiving new ones, ``inactivity-timeout`` (default 0,
disabled) recycles it after the configured idle interval.
The intent is to reclaim memory from infrequently-used
daemon process groups.

The first request to arrive after an idle recycle pays the
import cost again. For applications with a high startup
cost (large model load, complex framework bootstrap) this
can be a visible per-request latency spike on cold paths.
``inactivity-timeout`` is most useful for genuinely
infrequently-used groups (administrative endpoints,
periodic batch jobs); it is rarely the right knob for
production request-handling pools.

Embedded mode: what is different
--------------------------------

When a WSGI application runs in embedded mode (no
``WSGIDaemonProcess`` declaration, or
``WSGIProcessGroup %{GLOBAL}`` selecting embedded
explicitly) the pipeline is much shorter, and most of the
timeout knobs above do not apply.

The application runs directly inside the Apache child
worker process. There is no UNIX socket between Apache and
the application: the WSGI handler is invoked in-process,
the application returns its response, and Apache's normal
output machinery streams that back to the client.

What that means for timeouts:

* **Transport timeouts vanish.** No ``connect-timeout``, no
  ``queue-timeout``, no ``socket-timeout``, no
  ``response-socket-timeout``. The daemon-side hops they
  guard do not exist.
* **Apache's own request timeouts still apply.**
  ``Timeout`` and ``KeepAliveTimeout`` are the only
  timeouts that govern the request itself in embedded
  mode.
* **Dispatch and auth scripts still run** in the same
  Python interpreter that ends up running the request
  handler. ``WSGIDispatchScript`` and the auth-script
  directives operate the same way as in daemon mode;
  there is no extra process boundary.
* **No per-process recycle from mod_wsgi.** No
  ``maximum-requests``, ``restart-interval``,
  ``cpu-time-limit``, ``inactivity-timeout``. Apache's MPM
  (``MaxConnectionsPerChild``, ``MaxRequestWorkers``,
  ``ServerLimit``, etc.) is what decides when an Apache
  child is recycled, and that is governed by Apache rather
  than mod_wsgi.
* **No application fail-safe timeouts.**
  ``request-timeout``, ``interrupt-timeout``, and
  ``deadlock-timeout`` are not available. mod_wsgi cannot
  kill an Apache child mid-request without taking the rest
  of the child's modules down with it (``mod_php``,
  ``mod_ssl``, static-file serving, and so on). A wedged
  request in embedded mode wedges the Apache worker until
  Apache itself decides the worker has misbehaved.
* **No drain or shutdown timeouts.** ``graceful-timeout``,
  ``eviction-timeout``, and ``shutdown-timeout`` likewise
  do not apply, for the same reason: process lifecycle
  belongs to Apache, not mod_wsgi.

Net effect: the embedded-mode timeout surface reduces to
"Apache's ``Timeout`` directive, plus the MPM tuning". A
wedged request, a runaway request, or a deadlocked C
extension cannot be recovered automatically; the operator
sees the symptom (Apache children running out, latency
climbing) and has to intervene.

On Windows this is not a deployment choice. Daemon mode is
not available there, so embedded is the only option. The
"no automatic recovery from a wedged request" property is
therefore an inherent property of mod_wsgi on Windows, not
a trade-off the operator selected. See
:doc:`processes-and-threading` for the Windows process
model and its implications.

For any deployment where daemon mode is available, prefer
it. See :doc:`embedded-and-daemon-mode` for the model and patterns.

Recovery flow when ``request-timeout`` fires
--------------------------------------------

When the per-thread fire point is crossed, what happens
next depends entirely on ``interrupt-timeout``.

With ``interrupt-timeout=0`` (default)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mod_wsgi skips thread-local injection and transitions the
process directly into ``graceful-timeout`` followed by
``shutdown-timeout``. The whole daemon process is recycled.
Sibling requests on other threads have at most
``graceful-timeout`` to finish cleanly before the process
is forcibly shut down.

This is the simplest case but the most disruptive: one
wedged request takes out the entire daemon process and any
other in-flight requests on its threads.

With ``interrupt-timeout`` set to a non-zero value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mod_wsgi attempts to interrupt only the wedged thread by
injecting a :py:class:`mod_wsgi.RequestTimeout` exception
into it. If the injection unwinds the wedged request within
the ``interrupt-timeout`` grace window, the worker thread
returns to the pool and the process keeps serving. The
other threads were never disturbed.

The injected exception derives directly from
``BaseException``, so well-written code using
``except Exception:`` will not catch it. It may be caught
for cleanup (closing connections, releasing locks) but
should be re-raised; swallowing it defeats the recovery
mechanism.
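
The catch-and-re-raise pattern looks like this in plain Python.
The ``RequestTimeout`` class below is a local stand-in defined
for illustration so the sketch runs outside mod_wsgi; it is not
the real class.

.. code-block:: python

    class RequestTimeout(BaseException):
        """Local stand-in for mod_wsgi.RequestTimeout."""


    events = []


    def handler():
        try:
            try:
                # Stands in for the exception being injected.
                raise RequestTimeout()
            except Exception:
                # Not reached: RequestTimeout is not an Exception.
                events.append("swallowed")
            finally:
                # finally blocks still run while unwinding.
                events.append("cleanup")
        except RequestTimeout:
            events.append("caught-for-cleanup")
            raise  # re-raise so the caller still sees the timeout


    try:
        handler()
    except RequestTimeout:
        events.append("reached-adapter")

The hazard to look for in application code is a bare ``except:``
or an ``except BaseException:`` that does not re-raise, since
either would swallow the injected exception.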

If the injected exception unwinds back to the WSGI adapter
within the grace window, the adapter returns
``504 Gateway Timeout`` and the request is logged as having
been recovered.

Three thread states determine whether injection works
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Thread injection is best-effort. The injection takes effect
only when the target thread next runs Python bytecode,
which means the thread's current state determines the
outcome:

* **Running Python code** (loops, computation, framework
  code). The exception fires on the next bytecode tick,
  almost immediately. This is the case the mechanism is
  designed for.

* **Blocked in a C call that has released the GIL** (most
  socket reads, database driver calls, ``time.sleep``,
  file I/O). The injected exception is queued on the
  thread but does not fire until the blocking call returns
  and Python bytecode runs again. If the external service
  eventually responds or times out at its own protocol
  level, the injected exception fires then. If the
  blocking call hangs indefinitely with no internal
  timeout, the injected exception will never fire. In
  that case the ``interrupt-timeout`` grace window
  expires and the daemon falls through to the
  ``graceful-timeout`` / ``shutdown-timeout`` chain,
  taking the wedged request down with the process.

* **Blocked in a C extension that holds the GIL.** No
  Python code can run anywhere in the process. The
  injection cannot reach the thread; no other Python
  thread can run either. This is what ``deadlock-timeout``
  exists for; the ``request-timeout`` /
  ``interrupt-timeout`` mechanism cannot help.
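
The first state can be demonstrated with CPython's own
asynchronous-exception facility,
``PyThreadState_SetAsyncExc``. Whether mod_wsgi uses this exact
call internally is an assumption of the sketch; it is shown only
because it has the same "fires when bytecode next runs"
behaviour. The exception class and variable names are
illustrative.

.. code-block:: python

    import ctypes
    import threading
    import time


    class Interrupt(BaseException):
        """Illustrative exception type; not part of mod_wsgi."""


    outcome = {}


    def busy_loop():
        """Pure-Python work: the 'running Python code' state."""
        try:
            while True:
                sum(range(1000))
        except Interrupt:
            outcome["unwound"] = True


    worker = threading.Thread(target=busy_loop, daemon=True)
    worker.start()
    time.sleep(0.1)  # let the worker get into its loop

    # Queue the exception on the worker thread. The call returns
    # the number of thread states it modified.
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(worker.ident), ctypes.py_object(Interrupt)
    )
    worker.join(timeout=5)

A thread blocked in a C call would not see the queued exception
until that call returned, which is the second state in the list
above.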

The takeaway for sizing: ``interrupt-timeout`` works
cleanly when the application's blocking calls have their
own finite timeouts (HTTP client read timeouts, database
statement timeouts, and so on) so the indefinite-block
case does not arise. See
:ref:`request-timeout-not-sla` below.

Multiple wedges in flight
~~~~~~~~~~~~~~~~~~~~~~~~~

When several threads wedge in quick succession, each gets
its own injection on its own grace timer. The first
injection grace to expire arms ``graceful-timeout``;
sibling injections continue to tick on their own threads
and may still unwind cleanly during the graceful window,
in which case those threads free up and the drain check
progresses.

The ``graceful-timeout`` "stale request" optimisation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once ``graceful-timeout`` is armed, the drain check ignores
any in-flight request whose elapsed time has already
exceeded ``request-timeout + interrupt-timeout``. Such a
request will not unwind voluntarily, so waiting for it
serves no purpose. This lets the graceful drain complete
promptly when a wedged thread is the only thing still
tying up the process: the sibling requests get the chance
to finish cleanly while the wedged one rides out via
``shutdown-timeout``'s forced kill.
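
The drain check with the stale-request rule can be sketched as a
pair of predicates. The function names and the list-of-elapsed-
times representation are hypothetical; the threshold is the one
stated above.

.. code-block:: python

    def is_stale(elapsed, request_timeout, interrupt_timeout):
        """True if a request has outlived both timeouts combined."""
        return elapsed > request_timeout + interrupt_timeout


    def drain_complete(inflight_elapsed, request_timeout,
                       interrupt_timeout):
        """True when every still-running request is stale.

        inflight_elapsed: elapsed seconds of each in-flight request.
        An empty list means the process is idle.
        """
        return all(
            is_stale(e, request_timeout, interrupt_timeout)
            for e in inflight_elapsed
        )

With ``request-timeout=60`` and ``interrupt-timeout=10``, a lone
request that has been running for 200 seconds no longer holds up
the drain, while a sibling at 5 seconds still does.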

Recovery flow when ``deadlock-timeout`` fires
---------------------------------------------

When the GIL is wedged inside a Python C extension, the
injection mechanism above cannot help: no Python bytecode
is running anywhere in the process, so a queued injection
has no trigger.

``deadlock-timeout`` recovers the process the only way it
can: a forced restart. The detection thread signals
shutdown, the shutdown sequence begins, and because the
in-flight requests cannot unwind, the process eventually
exits via ``shutdown-timeout``'s forced kill rather than
via a graceful drain. Any request that was being processed
at the time is lost.

This is more disruptive than the ``request-timeout`` case
because there is no thread-local recovery option, and
because the failure mode usually indicates a bug in a C
extension rather than a wedged application path. The
remediation is typically not to tune the timeout; it is to
find which C extension is misbehaving and stop using it
(or fix it).

See :doc:`embedded-and-daemon-mode` for the structural overview of how
this fits with the other recycling triggers.

Recovery flow on graceful restart (SIGUSR1)
-------------------------------------------

When an operator sends ``SIGUSR1`` directly to a daemon
process (for example with ``pkill -USR1 -f 'wsgi:groupname'``),
the daemon process drains rather than restarting
immediately.

Note that ``apachectl graceful`` does *not* take this path.
The Apache parent forwards ``SIGTERM`` to mod_wsgi daemon
processes even on a graceful restart of the server, so the
daemon goes straight through ``shutdown-timeout`` rather
than the ``eviction-timeout`` / ``graceful-timeout`` drain.
The graceful-restart signal handling described here applies
only to ``SIGUSR1`` arriving directly at the daemon
process.

If ``eviction-timeout`` is set, the daemon keeps accepting
new requests while it waits, for up to that many seconds, to
reach an idle state. It restarts as soon as it is idle, or
once ``eviction-timeout`` expires, whichever comes first.

If ``eviction-timeout`` is not set, the daemon falls back
to ``graceful-timeout`` for the same purpose. If neither is
set, the daemon begins shutting down immediately, which means
any in-flight requests have only the ``shutdown-timeout``
window to finish before they are killed.
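
The fallback order reduces to a small selection function. The
sketch below treats ``0`` as "not set", which is an assumption,
and the function name is hypothetical.

.. code-block:: python

    def sigusr1_drain_window(eviction_timeout=0, graceful_timeout=0):
        """Seconds the daemon waits for idle after SIGUSR1.

        Returns 0 when neither option is set, meaning shutdown
        starts immediately.
        """
        if eviction_timeout > 0:
            return eviction_timeout
        if graceful_timeout > 0:
            return graceful_timeout
        return 0

Setting only ``graceful-timeout`` therefore covers both the
recycling triggers and a direct ``SIGUSR1``; setting
``eviction-timeout`` as well lets the two windows differ.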

This is the path used by the blue/green cutover pattern in
:doc:`upgrading-an-application`; the longer drain window is
what makes the cutover graceful for in-flight requests.

Drain semantics during shutdown
-------------------------------

Across the recovery flows above, ``graceful-timeout`` and
``eviction-timeout`` are described as "drain" windows. The
word is convenient but slightly misleading: the daemon
process is not refusing new requests during those windows.
Only ``shutdown-timeout`` does that.

During ``graceful-timeout`` and ``eviction-timeout`` the
daemon continues to accept new requests. The process is
running normally; it is just hoping to reach an idle state
through ordinary request turnover so it can exit cleanly.
If all in-flight requests finish and no new ones arrive
before the timeout expires, the process exits immediately.
If the timeout expires with requests still in flight
(because new ones kept arriving, or because in-flight ones
are slow), the process falls into ``shutdown-timeout``.

During ``shutdown-timeout`` the daemon stops accepting new
requests. The Apache child loses the ability to dispatch
fresh work to this process; in-flight requests continue
running. If the in-flight requests finish before
``shutdown-timeout`` expires, the process exits
immediately. If ``shutdown-timeout`` expires with requests
still in flight, the process is forcibly killed and those
requests are lost.

So the practical distinction is what happens to incoming
traffic. ``graceful-timeout`` and ``eviction-timeout`` do
not stop new traffic and are best understood as "keep
serving while waiting for an opportunity to exit cleanly".
``shutdown-timeout`` is the actual drain plus hard cutoff:
no new work in, in-flight work has a fixed window to
finish, then forced exit.
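
These rules can be summarised as a small state model. It is a
reading aid built from the description above, not a rendering of
mod_wsgi's internals; the phase labels reuse the option names
and the function name is hypothetical.

.. code-block:: python

    # Whether the daemon still accepts new requests in each phase.
    ACCEPTS_NEW_REQUESTS = {
        "graceful-timeout": True,
        "eviction-timeout": True,
        "shutdown-timeout": False,
    }


    def next_phase(phase, inflight, timer_expired):
        """Return the phase that follows, given current conditions."""
        if phase not in ACCEPTS_NEW_REQUESTS:
            raise ValueError("unknown phase: %r" % (phase,))
        if inflight == 0:
            return "exit"  # idle: the process exits immediately
        if phase == "shutdown-timeout":
            # Hard cutoff: remaining requests are lost on expiry.
            return "killed" if timer_expired else phase
        # graceful/eviction: expiry escalates to shutdown-timeout.
        return "shutdown-timeout" if timer_expired else phase

Only the last phase ends in a forced kill; the earlier two can
only escalate into it.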

Sizing the timeouts
-------------------

Most of the timeout options are off by default for
backwards-compatibility reasons. The recommendation is to
set them explicitly so the daemon process group can recover
from backlogging and hung requests rather than silently
piling them up.

``mod_wsgi-express`` already does this. Its generated
configuration applies a starter set of values that has
been tuned over many deployments. These are a good
baseline for a hand-written ``WSGIDaemonProcess``
configuration::

    WSGIDaemonProcess example processes=2 threads=5 \
        display-name=%{GROUP} \
        lang=en_US.UTF-8 \
        locale=en_US.UTF-8 \
        queue-timeout=45 \
        socket-timeout=60 \
        connect-timeout=15 \
        request-timeout=60 \
        interrupt-timeout=0 \
        startup-timeout=15 \
        deadlock-timeout=60 \
        graceful-timeout=15 \
        eviction-timeout=0 \
        inactivity-timeout=0 \
        restart-interval=0 \
        shutdown-timeout=5 \
        maximum-requests=0

Adjust from there based on the application's actual
behaviour. A few specific notes:

Do not over-tighten ``request-timeout``
    The ln-scaling already provides headroom for higher
    thread counts. Setting this to a few times the p99 of
    normal request duration is typically right; setting it
    close to p99 will produce false positives on legitimate
    slow paths.

``interrupt-timeout`` has a recommended floor of about 10 seconds when enabled
    Values significantly below that may not give the injected
    exception time to unwind through finally blocks, context
    managers, and the WSGI adapter. Setting it too short can
    defeat the purpose of injection by turning recoverable
    wedges into process restarts.

``queue-timeout`` discards stale work at pickup, not while it waits
    When a worker finally accepts a request that has been
    sitting in the queue longer than this, it is discarded
    with a 504 rather than served. The 504 is not
    necessarily prompt: the request waits in the kernel
    queue until a worker frees up, and only then gets
    discarded. The right value depends on what kind of
    latency the application treats as already-failed: a
    backend serving real-time requests might set 5 to 10
    seconds; a batch-style service might tolerate 60 to
    120.

``startup-timeout`` is mostly for Django and similar frameworks
    Frameworks that cannot be cleanly re-bootstrapped in the
    same process need ``startup-timeout`` so a partial
    bootstrap forces a fresh process. If the application's
    startup is fast and deterministic, this is not a useful
    knob.

``graceful-timeout``, ``eviction-timeout``, and ``shutdown-timeout`` form a hierarchy
    ``graceful-timeout`` keeps the process accepting new
    requests while waiting for it to reach idle (used after
    recycling triggers), ``eviction-timeout`` does the same
    after a direct ``SIGUSR1`` (falling back to
    ``graceful-timeout`` when not set), and
    ``shutdown-timeout`` is the hard cutoff once shutdown is
    actually under way, with no new requests accepted. See
    the "Drain semantics during shutdown" section above for
    the full distinction. The default of 5 seconds for
    ``shutdown-timeout`` suits most workloads; too short and
    Python ``atexit`` handlers may not finish, too long and
    recovery from a wedged process is delayed.

.. _request-timeout-not-sla:

``request-timeout`` is not a per-request SLA
--------------------------------------------

A common mistake is to treat ``request-timeout`` as the
right knob for "this request must complete within N
seconds, or return an error". That is not what the
mechanism is for, and trying to use it that way will
produce surprising behaviour.

``request-timeout`` is a *process-level fail-safe*. Its
purpose is to detect when the daemon process has stopped
making progress and trigger recovery before the whole pool
becomes useless. The natural-log scaling against
``threads`` is a deliberate choice that follows from this:
a wedge in 1-of-10 threads is a smaller problem than a
wedge in 1-of-1, so the trigger should fire later in the
larger pool. A per-request SLA would not scale this way.

For user-visible per-request deadlines, use
application-level timeouts on the operations the request
performs:

* HTTP client read timeouts on outbound calls.
* Database statement timeouts (``SET statement_timeout``
  in PostgreSQL, ``MAX_EXECUTION_TIME`` in MySQL, etc.).
* Per-operation timeouts on cache, message-queue, and
  other service clients.

These bound the blocking calls inside the request handler
itself, so the request returns a finite error to the
client within the SLA. They also have a useful side effect
for ``interrupt-timeout``: once the blocking call has a
finite internal timeout, the indefinite-hang case (where
injection cannot fire) goes away, and the thread-local
recovery mechanism can do its job.
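
A minimal sketch of a blocking read with a finite timeout, using
a local socket pair so it runs without a network. The helper
name and the 0.2-second value are illustrative; real code would
apply the same idea through its HTTP or database client's own
timeout settings.

.. code-block:: python

    import socket


    def read_with_deadline(sock, timeout_s):
        """Read from sock, giving up after timeout_s seconds.

        Returns the bytes read, or None if the peer stayed silent.
        """
        sock.settimeout(timeout_s)
        try:
            return sock.recv(1024)
        except socket.timeout:
            return None


    client, peer = socket.socketpair()

    # The peer has sent nothing, so the read gives up on its own.
    silent_reply = read_with_deadline(client, 0.2)

    # Once the peer answers, the same call returns the data.
    peer.sendall(b"pong")
    late_reply = read_with_deadline(client, 0.2)

    client.close()
    peer.close()

Because the read returns control to Python on its own, a queued
``interrupt-timeout`` injection would get its chance to fire
instead of waiting on the peer indefinitely.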

Treat ``request-timeout`` as the safety net that catches
everything application-level timeouts missed, not as the
front-line deadline.

Common pitfalls
---------------

Tight ``request-timeout`` triggering false positives
    A ``request-timeout`` set close to the p99 of legitimate
    slow requests will fire on those legitimate requests
    under ordinary load, restarting the process for no
    reason. Set it to a multiple of p99, not to p99.

Relying on ``interrupt-timeout`` without application-level timeouts
    When the wedged thread is blocked on an external service
    that itself has no timeout, the injection cannot fire.
    The grace window expires and the process is recycled the
    same as if ``interrupt-timeout`` had been zero. Adding
    finite client-side timeouts at the blocking call site is
    what makes ``interrupt-timeout`` useful.

Forgetting ``queue-timeout``
    Without ``queue-timeout``, a daemon process group that
    gets behind keeps serving stale work long after the
    client has likely given up: workers grind through the
    backlog, latency stays high, and the queue drains
    slowly. With ``queue-timeout`` set, workers discard
    stale work on pickup so they spend their time on
    requests that still matter, and the rest of the system
    gets a 504 signal it can respond to (autoscaling,
    alerts) before the backlog gets unmanageable. Note that
    discard is at pickup, not at the timeout instant, so
    sustained overload can still build a long queue; the
    knob shapes which requests get served when capacity
    returns, not which requests get to wait.

Mixing user-facing SLA expectations with mod_wsgi's fail-safe
    The most common version of this is "we want user
    requests to time out at 30 seconds, so set
    ``request-timeout=30``". The ln-scaling means at
    ``threads=15`` this fires at about 110 seconds, not 30.
    Use application-level timeouts for the SLA and leave
    ``request-timeout`` set to a fail-safe value.

See Also
--------

* :doc:`embedded-and-daemon-mode` for the structural model behind this
  page: process and thread sizing, recycling triggers,
  and process-group patterns.
* :doc:`../configuration-directives/WSGIDaemonProcess` for
  the per-option directive reference.
* :doc:`processes-and-threading` for the Apache MPM and
  Python sub interpreter model that daemon mode builds
  on.
* :doc:`upgrading-an-application` for the SIGUSR1-driven
  cutover pattern that uses ``eviction-timeout`` /
  ``graceful-timeout``.
* :doc:`debugging-techniques` for log-output diagnostics
  when timeouts fire.