The Internal Metrics API
The mod_wsgi built-in module exposes four functions that let a
WSGI application observe its own runtime and ship the resulting
samples to an external metrics service of its choosing. This page
covers their behaviour, the data each one returns, and a worked
example of using them to feed a time-series database without
impacting request-serving performance.
The four functions are:
mod_wsgi.start_recording_metrics()
    Opt in to per-request metrics accounting.
mod_wsgi.request_metrics()
    Drain a per-interval snapshot of timing, capacity and resource counters for the current process.
mod_wsgi.process_metrics()
    Read process-level aggregates and current state.
mod_wsgi.server_metrics()
    Read the Apache scoreboard view of every process and worker on the server.
request_metrics() and process_metrics() return data only
after start_recording_metrics() has been called, and only when
no external reporter is configured to consume the same data (see
The External Telemetry Service for the external alternative);
server_metrics() has its own configuration gate. Each accessor
returns None rather than raising when it is not active, so
caller code can branch on a single check.
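As a minimal sketch of that single check, the helper below takes the accessor as a parameter so it can be run outside Apache. The name read_sample and the lambda stand-ins are hypothetical; inside a mod_wsgi process one would pass mod_wsgi.request_metrics (or one of the other two accessors) instead.

```python
def read_sample(accessor):
    """Call a metrics accessor and report whether it is active.

    `accessor` is any zero-argument callable returning a dict or None,
    for example mod_wsgi.request_metrics inside a mod_wsgi process.
    Returns an (active, sample) tuple.
    """
    sample = accessor()
    if sample is None:
        # The accessor is not active (see the conditions above).
        return False, {}
    return True, sample


# Stand-ins for the real accessors so the sketch runs anywhere.
inactive = read_sample(lambda: None)
active = read_sample(lambda: {"request_count": 3})
```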
Enabling per-request recording
start_recording_metrics()
    Enable per-request accounting and seed the per-reader baselines so the first subsequent call to request_metrics() returns a populated dict covering the interval since this call ran. Idempotent: extra calls have no effect.

    Per-request accounting has a small but non-zero cost (a locked accumulator update on each request completion). Applications that never consume the data should not call this function; an application that does want the data should call it once at import time before the reporter thread or other consumer starts.

    When an external reporter is configured for the process, that reporter is the canonical metrics consumer and the Python accessors below return None regardless of whether this function has been called. An application’s reporter code can detect this with a single None check on request_metrics() and stand down.
request_metrics() : per-interval drain
request_metrics()
    Return a dict of metrics for the interval since the previous call to this function, then drain the underlying accumulators so the next call covers a fresh interval.

The drain is the important detail: each call empties the per-interval state. That has two consequences for callers:

- Only one component in the process should call request_metrics(). Concurrent callers would each see a partial interval. The expected pattern is a single background reporter thread on a fixed cadence.
- Do not call request_metrics() from inside the WSGI application callable. Apart from the drain clash, the call briefly takes a process-wide lock that worker threads also need; do the work on a thread that does not serve requests.
The first call after start_recording_metrics() returns
samples covering the time since recording was enabled. The
function returns None if start_recording_metrics() has
not been called or if an external reporter is the configured
consumer.
Sample window
pid
    Process ID of the calling process. Useful as an additional tag when shipping samples from multiple processes to a shared store.
start_time
    Window start as a floating-point second offset from the Unix epoch. This is the stop_time of the previous call (or the time recording was enabled, on the first call).
stop_time
    Window end as a floating-point second offset from the Unix epoch.
sample_period
    stop_time - start_time, in seconds.
Request volume
request_count
    Number of requests that completed during the window.
request_throughput
    request_count / sample_period; requests per second.
Capacity
request_threads_maximum
    The configured worker-thread ceiling for the process: the threads= value on WSGIDaemonProcess in daemon mode, or the MPM-derived per-process thread limit in embedded mode.
request_threads_started
    Number of worker threads actually instantiated so far. Apache may spin worker threads up lazily; this is the running total.
request_threads_active
    Number of worker threads that either completed a request or were mid-request when the window ended.
capacity_utilization
    Fraction of worker capacity consumed during the window, computed as total busy time across all worker slots divided by sample_period * request_threads_maximum. A value near 1.0 means every worker spent the whole window in a request and the process has no spare capacity; a value near 0.0 means the workers were mostly idle.
request_threads_completed
    List of length request_threads_maximum. Entry i is the number of requests worker slot i + 1 completed during the window. Useful for detecting uneven distribution of work across worker threads. The deprecated alias request_threads_buckets carries the same value and will be removed in a future release.
request_threads_busy_time
    List of length request_threads_maximum, float seconds. Entry i is the total time worker slot i + 1 spent inside a request during the window, including any in-flight tail at drain time.
request_threads_cpu_time
    List of length request_threads_maximum, float seconds. Sum of per-request CPU deltas for requests that completed in this slot during the window. Each completing request contributes its full start-to-end CPU delta, regardless of how many earlier windows the request spanned: a long request appears as a CPU spike in the single window in which it completes, not spread across the windows it occupied. This is asymmetric with request_threads_busy_time, which folds in the in-flight wall-time tail at each window; the asymmetry is structural, because a worker thread’s CPU usage is only readable from inside that thread, so the snapshot reader cannot sample a peer thread’s in-flight CPU.
request_threads_current_elapsed
    List of length request_threads_maximum, float seconds. Entry i is the elapsed wall time of any request still in flight in slot i + 1 at the drain instant, or 0.0 if the slot was idle. Useful for spotting stuck requests on a live process.
request_threads_max_duration
    List of length request_threads_maximum, float seconds. The longest request duration each slot completed during the window.
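The per-slot lists lend themselves to simple post-processing. The sketch below recomputes the utilisation fraction from request_threads_busy_time using the formula given for capacity_utilization, and flags slots whose in-flight request has been running for a long time. The helper names and the 30-second threshold are illustrative assumptions, and the sample dict is hand-written, not real output.

```python
def utilization_from_busy_time(sample):
    """Total busy time divided by sample_period * request_threads_maximum."""
    budget = sample["sample_period"] * sample["request_threads_maximum"]
    if budget <= 0:
        return 0.0
    return sum(sample["request_threads_busy_time"]) / budget


def stuck_slots(sample, threshold=30.0):
    """Return (slot_number, elapsed_seconds) pairs for in-flight requests
    that have been running for at least `threshold` seconds.

    Slot numbers are 1-based, matching the "slot i + 1" convention above.
    """
    elapsed = sample["request_threads_current_elapsed"]
    return [(i + 1, e) for i, e in enumerate(elapsed) if e >= threshold]


# Hand-written sample containing only the keys the helpers read.
sample = {
    "sample_period": 1.0,
    "request_threads_maximum": 4,
    "request_threads_busy_time": [1.0, 0.5, 0.0, 0.5],
    "request_threads_current_elapsed": [45.0, 0.0, 0.0, 2.5],
}
utilization = utilization_from_busy_time(sample)
stuck = stuck_slots(sample)
```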
Phase timing means
Each phase mean is the total time recorded across all completed
requests in the window divided by request_count. Phases
overlap in places (application_time is part of
request_time, for example) so the means do not add up to the
request total.
server_time
    Average time, in seconds, between Apache accepting the request and the WSGI handler returning to Apache.
queue_time
    Daemon mode only. Average time the request spent travelling from the Apache worker process to the daemon process, in seconds. None in embedded mode.
daemon_time
    Daemon mode only. Average time inside the daemon process from accepting the dispatched request to the application callable returning, in seconds. None in embedded mode.
application_time
    Average time spent inside the WSGI application callable, in seconds.
request_time
    Average end-to-end time, in seconds, covering acceptance by Apache through to the response being fully written back to the client.
input_read_time
    Average time spent reading the request body, in seconds. Zero for requests with no body.
output_write_time
    Average time spent writing response bytes to the client, in seconds.
Phase timing extremes
For each phase listed above there is a matching pair of integer
microsecond fields giving the smallest and largest observation
recorded during the window. Both keys are None if the phase
saw no requests in the window (and the daemon-only phases are
None in embedded mode regardless).
server_time_min_us / server_time_max_us
queue_time_min_us / queue_time_max_us
daemon_time_min_us / daemon_time_max_us
application_time_min_us / application_time_max_us
request_time_min_us / request_time_max_us
input_read_time_min_us / input_read_time_max_us
output_write_time_min_us / output_write_time_max_us
Phase timing histograms
Each of the following keys carries a list of 65 integer counts representing the distribution of per-request durations across fixed boundaries. The first 64 entries cover 16 octaves from 1 ms up to 65536 ms, split linearly into 4 sub-buckets per octave; the final entry counts samples at or above 65536 ms (~65 s). Values below 1 ms land in entry 0.
server_time_buckets
queue_time_buckets
daemon_time_buckets
application_time_buckets
request_time_buckets
input_read_time_buckets
output_write_time_buckets
gil_wait_time_buckets
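To turn a bucket list into an approximate percentile, a consumer needs the bucket boundaries. The sketch below assumes one reading of the description above: sub-bucket s of octave o starts at 2**o * (1 + s / 4) milliseconds. That layout is an inference from the text, not a confirmed specification, and the function names are hypothetical.

```python
def bucket_lower_bounds_ms():
    """Lower bound, in milliseconds, of each of the 65 buckets.

    ASSUMED layout: 16 octaves starting at 1 ms, each split linearly
    into 4 sub-buckets, followed by one overflow bucket at 65536 ms.
    Entry 0 also receives values below 1 ms, so its true floor is 0.
    """
    bounds = []
    for octave in range(16):        # 1 ms, 2 ms, 4 ms, ... 32768 ms
        base = float(2 ** octave)
        for sub in range(4):        # four linear steps within the octave
            bounds.append(base * (1 + sub / 4))
    bounds.append(65536.0)          # overflow bucket
    return bounds


def approximate_percentile_ms(counts, fraction):
    """Lower bound of the bucket that contains the requested quantile.

    Returns None when the histogram holds no samples.
    """
    total = sum(counts)
    if total == 0:
        return None
    target = fraction * total
    running = 0
    bounds = bucket_lower_bounds_ms()
    for lower, count in zip(bounds, counts):
        running += count
        if running >= target:
            return lower
    return bounds[-1]
```

The result is only as precise as the bucket width, which under this layout is a quarter of the octave the value falls in.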
GIL contention
gil_wait_time
    Average time per request spent waiting to re-acquire the GIL at the boundaries where mod_wsgi releases it on the application’s behalf: acquiring the interpreter at the start of the request, and re-acquiring the GIL after reading request body bytes, after flushing response headers, and after writing response body bytes. Useful as an indication of contention between mod_wsgi’s worker threads serving concurrent requests in the same process. GIL contention inside the WSGI application itself (for example between Python-level threads the application spawns) is not measured.
gil_wait_time_min_us / gil_wait_time_max_us
    Smallest and largest single GIL-wait recorded during the window, in microseconds, or None if no waits were recorded.
gil_wait_count
    Total number of GIL re-acquire events recorded during the window across all requests. Dividing gil_wait_time by this count gives mean wait per acquire.
I/O totals
input_bytes
    Total request-body bytes read across all completed requests in the window.
input_reads
    Total number of read operations against request bodies in the window.
output_bytes
    Total response bytes written to clients in the window.
output_writes
    Total number of write operations against response sockets in the window.
Response classes
Per-class HTTP response counts for completed requests in the
window. The five counters always sum to request_count;
requests that never called start_response are folded into
status_5xx.
status_1xx, status_2xx, status_3xx, status_4xx,
status_5xx
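Because the five counters sum to request_count, a server-error ratio can be derived directly from one sample. The following is a small sketch with hypothetical helper names and a hand-written sample dict:

```python
STATUS_KEYS = (
    "status_1xx", "status_2xx", "status_3xx", "status_4xx", "status_5xx",
)


def server_error_ratio(sample):
    """Fraction of completed requests in the window that ended as 5xx.

    Returns None for an empty window so callers can skip the data point
    instead of dividing by zero.
    """
    total = sample["request_count"]
    if total == 0:
        return None
    return sample["status_5xx"] / total


def status_counts_consistent(sample):
    """Check the documented invariant: the five classes sum to request_count."""
    return sum(sample[key] for key in STATUS_KEYS) == sample["request_count"]


sample = {
    "request_count": 100,
    "status_1xx": 0, "status_2xx": 90, "status_3xx": 4,
    "status_4xx": 4, "status_5xx": 2,
}
ratio = server_error_ratio(sample)
```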
CPU rates
Each rate is the corresponding CPU delta divided by
sample_period, so a value of 1.0 represents one CPU-second of
work per wall-clock second (one core fully loaded). Values can
exceed 1.0 on multi-core hosts when several worker threads run
CPU-bound work in parallel.
cpu_user_utilization
    User-mode CPU rate for the process.
cpu_system_utilization
    Kernel-mode CPU rate for the process.
cpu_utilization
    cpu_user_utilization + cpu_system_utilization.
The keys cpu_user_time, cpu_system_time and cpu_time
are deprecated aliases for the three keys above, carrying the
same per-window rate values. They are retained for backwards
compatibility but should not be used in new code: their names
collide with identically-named keys in process_metrics()
that carry cumulative CPU-time totals in seconds, very different
quantities with the same labels. New code should use the
_utilization keys.
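Since each rate is a CPU delta divided by sample_period, multiplying back by sample_period recovers the CPU seconds consumed in the window. This can be useful when a destination expects absolute amounts. The helper name below is hypothetical and the sample is hand-written:

```python
def cpu_seconds_in_window(sample):
    """Convert the per-window CPU rates back into CPU seconds.

    Uses only the _utilization keys, not the deprecated aliases.
    """
    period = sample["sample_period"]
    return {
        "user": sample["cpu_user_utilization"] * period,
        "system": sample["cpu_system_utilization"] * period,
        "total": sample["cpu_utilization"] * period,
    }


sample = {
    "sample_period": 2.0,
    "cpu_user_utilization": 0.5,
    "cpu_system_utilization": 0.25,
    "cpu_utilization": 0.75,
}
seconds = cpu_seconds_in_window(sample)
```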
Memory
memory_rss
    Current resident set size of the process, in bytes.
memory_max_rss
    Peak resident set size of the process so far, in bytes.
These two values are point-in-time at the moment the snapshot
runs, not interval-derived. The same two keys appear under
process_metrics() below, sourced from the same calls and
carrying identical values; they are duplicated here so a
periodic reporter built around request_metrics() has memory
context attached to every sample without needing a second
function call.
process_metrics() : process aggregates and current state
process_metrics()
    Return a dict describing the process from start-up to the present moment. Unlike request_metrics(), this accessor does not drain anything; values are cumulative or point-in-time, not per-interval.
Returns None under the same conditions as
request_metrics(): when start_recording_metrics() has not
been called, or when an external reporter is the configured
consumer.
pid
    Process ID.
restart_time
    Process start time as seconds since the Unix epoch.
current_time
    Wall-clock time at the moment of the call, in seconds since the epoch. Convenient for computing process uptime client-side without a second clock read.
running_time
    current_time - restart_time, as a whole number of seconds.
request_count
    Total number of requests this process has served since start-up.
request_busy_time
    Total cumulative time, in seconds, that worker threads spent inside requests. The fraction request_busy_time / (running_time * request_threads_maximum) gives a process-lifetime equivalent of capacity_utilization.
request_threads
    Same as request_threads_started from request_metrics(): number of worker threads instantiated so far.
active_requests
    Number of requests currently in flight at the moment of the call.
cpu_user_time
    Cumulative user-mode CPU time the process has consumed since start-up, in seconds.
cpu_system_time
    Cumulative kernel-mode CPU time, in seconds.
cpu_time
    cpu_user_time + cpu_system_time.
The three CPU keys here are lifetime totals in seconds. The
identically-named (deprecated) keys in request_metrics()
carry per-window utilisation rates, not absolute totals. Code
that reads CPU values from both accessors needs to handle the
two unit systems separately; new code reading rates should
prefer the _utilization keys on request_metrics().
memory_rss
    Current resident set size in bytes.
memory_max_rss
    Peak resident set size in bytes.
Same values as the matching keys in request_metrics();
sourced from the same calls and duplicated across the two
accessors for convenience.
threads
    List of per-worker-thread dicts. Each entry has two keys:

    thread_id
        Worker-thread identifier (1-based).
    request_count
        Number of requests this worker thread has served since start-up.
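The process-lifetime fraction described under request_busy_time can be sketched as follows. Note that request_threads_maximum is not listed among the process_metrics() keys above, so this sketch assumes the caller supplies it, for example from an earlier request_metrics() sample or from configuration. The helper names are hypothetical and the sample is hand-written.

```python
def lifetime_capacity_utilization(process_sample, threads_maximum):
    """request_busy_time / (running_time * request_threads_maximum).

    Returns None while running_time is still zero (the process has only
    just started) to avoid dividing by zero.
    """
    budget = process_sample["running_time"] * threads_maximum
    if budget <= 0:
        return None
    return process_sample["request_busy_time"] / budget


def busiest_thread(process_sample):
    """Return the thread dict with the highest lifetime request_count."""
    return max(process_sample["threads"], key=lambda t: t["request_count"])


process_sample = {
    "running_time": 100,
    "request_busy_time": 250.0,
    "threads": [
        {"thread_id": 1, "request_count": 40},
        {"thread_id": 2, "request_count": 75},
    ],
}
lifetime = lifetime_capacity_utilization(process_sample, threads_maximum=5)
busiest = busiest_thread(process_sample)
```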
server_metrics() : Apache scoreboard view
server_metrics()
    Return a dict reflecting the Apache scoreboard: every process, every worker thread, what each is currently doing, and totals accumulated since the server started.
Unlike the previous two accessors, server_metrics() does not
require start_recording_metrics(). It is gated separately by
configuration: see
WSGIServerMetrics for the
embedded-mode gate, and the server-metrics= option on
WSGIDaemonProcess for the
daemon-mode gate. Returns None when the scoreboard is not
available or the gate is off.
The dict has top-level fields covering the server, followed by a
processes list of process dicts, each of which has a
workers list of worker dicts.
Server level
server_limit
    Configured upper bound on the number of processes the active MPM may run.
thread_limit
    Configured upper bound on the number of worker threads per process.
running_generation
    Generation counter for the active server. Increments on each graceful restart.
restart_time
    Time of the most recent (re)start, in seconds since the Unix epoch.
current_time
    Wall-clock time at the moment of the call, in seconds since the epoch.
running_time
    current_time - restart_time, in integer seconds.
processes
    List of process dicts (see below).
Per-process
process_num
    Index of this entry in the scoreboard process table.
pid
    Process ID of the worker process, or 0 if the slot is unused.
generation
    Generation in which this process was spawned.
quiescing
    True if the process is gracefully shutting down (no longer accepting new requests), False otherwise.
workers
    List of worker dicts.
Per-worker
thread_num
    Index of this worker thread within the process.
generation
    Generation in which this worker was created.
status
    A single-character string describing the current state of the worker (_ waiting for connection, R reading request, W writing reply, K keepalive, G gracefully finishing, and so on). The same letters Apache uses in its mod_status output.
access_count
    Number of requests this worker has served since the process started.
bytes_served
    Total response bytes the worker has written to clients.
start_time
    Time the worker last began processing a request, in seconds since the epoch.
stop_time
    Time the worker last finished processing a request, in seconds since the epoch.
last_used
    Time of the last activity on the worker, in seconds since the epoch.
client
    IP address of the client whose request the worker last handled.
request
    First line of the most recent request handled by the worker, truncated by Apache to a fixed buffer.
vhost
    Server name of the virtual host the most recent request was served against.
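The nested processes and workers lists can be walked with two loops. The sketch below tallies workers by their status letter, skipping unused process slots. The function name is hypothetical and the input is a hand-built dict containing only the keys the function reads, not real scoreboard output.

```python
from collections import Counter


def worker_status_counts(server_sample):
    """Count workers per status letter across all live processes."""
    counts = Counter()
    for process in server_sample["processes"]:
        if process["pid"] == 0:
            continue  # unused scoreboard slot
        for worker in process["workers"]:
            counts[worker["status"]] += 1
    return counts


server_sample = {
    "processes": [
        {"pid": 100, "workers": [{"status": "_"}, {"status": "W"}]},
        {"pid": 0, "workers": [{"status": "."}]},
        {"pid": 101, "workers": [{"status": "_"}]},
    ],
}
counts = worker_status_counts(server_sample)
```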
Reporting metrics to an external service
The shape of an in-application reporter is:
1. Call start_recording_metrics() so the accessors have data to return.
2. Start a single background thread that wakes on a fixed cadence, calls request_metrics() (and, if useful, process_metrics() or server_metrics()), formats the sample for the destination, and writes it.
3. Subscribe to the process_stopping event so the reporter thread can flush a final sample and exit cleanly when the process is shutting down.
The example below feeds request_throughput and
capacity_utilization to an InfluxDB instance every second.
The full set of attributes is documented above; restricting the
example to two of them keeps the moving parts visible.
The application file does nothing except wire up the reporter:
import metrics

metrics.enable_reporting()


def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'

    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(output))),
    ]
    start_response(status, response_headers)

    return [output]
The companion metrics module does the work:
import os
import socket
import time
import traceback
import urllib.request

from queue import Queue, Empty
from threading import Thread

import mod_wsgi

HOSTNAME = socket.gethostname()
PID = os.getpid()
PROCESS = f"{HOSTNAME}:{PID}"

INTERVAL = 1.0
INFLUXDB_URL = "http://influxdb.local:8086/write?db=wsgi"

queue = Queue()


def format_line(metrics, timestamp_ns):
    """Build an InfluxDB line-protocol record.

    Line protocol is plain ASCII; assembling it is a couple of
    f-strings. JSON marshalling and the dict-of-dicts the JSON
    clients want is far more expensive at sub-second cadence,
    so we format the wire bytes directly.
    """
    return (
        f"request-metrics,hostname={HOSTNAME},process={PROCESS} "
        f"request_throughput={metrics['request_throughput']},"
        f"capacity_utilization={metrics['capacity_utilization']} "
        f"{timestamp_ns}"
    )


def write_to_influxdb(payload):
    request = urllib.request.Request(
        INFLUXDB_URL, data=payload.encode("ascii"), method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=2.0):
            pass
    except Exception:
        traceback.print_exc()


def report_once():
    metrics = mod_wsgi.request_metrics()
    if metrics is None:
        return
    timestamp_ns = int(metrics["stop_time"] * 1_000_000_000)
    write_to_influxdb(format_line(metrics, timestamp_ns))


def collector():
    next_tick = time.time() + INTERVAL
    while True:
        timeout = max(0.0, next_tick - time.time())
        try:
            queue.get(timeout=timeout)
        except Empty:
            report_once()
            next_tick += INTERVAL
            continue
        # Sentinel from the shutdown handler: flush and exit.
        report_once()
        return


# daemon=False so process_stopping can join us cleanly, and so
# the module is also usable inside a per-interpreter-GIL sub
# interpreter (where daemon threads are not permitted).
thread = Thread(target=collector, daemon=False)


def shutdown_handler(name, **kwargs):
    queue.put(None)


_started = False


def enable_reporting():
    # Guard against double activation: in embedded mode a
    # modified wsgi.py is reloaded in the same process, which
    # re-runs the wsgi.py top-level import-and-call. The
    # second call would otherwise hit Thread.start() on the
    # already-running thread and raise RuntimeError.
    global _started
    if _started:
        return
    _started = True

    mod_wsgi.start_recording_metrics()
    mod_wsgi.subscribe_shutdown(shutdown_handler)
    thread.start()
The shape is almost identical to a plain “background reporter
thread” pattern in any other application. The mod_wsgi-specific
parts are three function calls: start_recording_metrics() in
enable_reporting() so the accessor returns data,
request_metrics() in report_once(), called from the collector
loop, to read it, and subscribe_shutdown() in enable_reporting()
so the thread is signalled at process shutdown rather than being
killed mid-write.
Hosting the reporter in a dedicated sub-interpreter
The worked example above runs the reporter from inside the same
sub-interpreter that hosts the WSGI application: wsgi.py
imports the metrics module and calls enable_reporting()
at import time. That is the simplest deployment but couples the
two concerns: the reporter module is visible to application
code, and the one-consumer-per-process rule rests on the
application never accidentally calling request_metrics()
itself.
A cleaner option in daemon mode is to put the reporter in its own sub-interpreter, separate from the WSGI application’s sub-interpreter but in the same daemon process. Per-process metrics state is process-wide and shared across every sub-interpreter in the process, so a reporter running in one sub-interpreter sees the data produced by requests served in another. Isolation makes the one-consumer rule structural rather than a discipline the application has to maintain.
The mechanism is
WSGIImportScript, configured
to import a small launcher into the same process-group= as
the WSGI application but a distinct application-group=. Add
a reporter.py next to metrics.py whose only job is to
import the metrics module and trigger it:
import metrics
metrics.enable_reporting()
Then point WSGIImportScript at reporter.py:
WSGIDaemonProcess myapp threads=15 \
    python-path=/var/www/myapp

WSGIScriptAlias / /var/www/myapp/wsgi.py \
    process-group=myapp \
    application-group=%{GLOBAL}

WSGIImportScript /var/www/myapp/reporter.py \
    process-group=myapp \
    application-group=metrics
The python-path= option puts /var/www/myapp on
sys.path for the daemon process so that reporter.py’s
import metrics can find the module next to it. Without it
WSGIImportScript would run reporter.py as a top-level
file but import metrics would not resolve. WSGIImportScript
runs reporter.py at daemon startup, the import metrics
line pulls the module in, and enable_reporting() does its
three calls before the first request arrives. wsgi.py no longer imports the metrics
module and no longer references enable_reporting(); the
WSGI application file becomes whatever it would have been
without metrics reporting at all. metrics.py is unchanged
from the worked example above.
Keeping the launcher in its own file rather than activating
from the bottom of metrics.py itself preserves the
no-import-side-effect property of the metrics module, which
matters if anything else (a test harness, a one-off script,
WSGIImportScript loaded in a different application group
for some reason) ever imports it.
Two consequences worth flagging:
- Each sub-interpreter has its own mod_wsgi module object and its own set of event subscribers. The subscribe_shutdown callback registered in the metrics sub-interpreter only fires for events published into that sub-interpreter. process_stopping is published to every sub-interpreter in the daemon process, so the reporter is notified at the right moment to drain a final sample and stop its thread.
- The reporter script and the WSGI application should not import each other. Sub-interpreters do not share Python module state, and crossing the boundary either duplicates state or, with C extensions that are not sub-interpreter safe, fails outright. The split is precisely the point of this deployment.
Keeping reporting off the hot path
The point of the design above is that worker threads serving requests pay almost nothing for the reporter. Things that matter:
- Drain on a dedicated thread, never from inside the WSGI application callable. request_metrics() is a per-interval drain: every call empties the accumulators. A worker thread that called it during a request would consume the data the reporter thread expected to ship, and concurrent callers would each end up with partial windows. The one-consumer pattern is what keeps each shipped sample a coherent snapshot.
- Pick an aggregation interval long enough that the per-tick cost (one request_metrics() call, one wire write) is negligible against the per-request work the process is doing. One second is a reasonable default; sub-second cadences are possible but rarely useful.
- Pre-encode in a compact wire format and write it as bytes. Line protocol, OpenMetrics text exposition, StatsD packets and similar formats are cheap to assemble from primitive values. JSON marshalling, especially via a third-party metrics-store client, is much more expensive per sample and unnecessary when the wire format is straightforward.
- Buffer the write locally and use a short timeout. If the destination is unreachable, only the reporter thread blocks; worker threads continue to serve requests, and the next tick gets a chance to recover.
- Use mod_wsgi.subscribe_shutdown to signal the reporter thread, not atexit. The process_stopping event fires before Python’s interpreter finalisation begins, while there is still time to put the sentinel on the queue. atexit callbacks run as part of finalisation, after the runtime has already joined every non-daemon thread; a non-daemon reporter thread waiting on a queue would never be signalled and the process would hang.
- Create the reporter thread with daemon=False and rely on the shutdown handler to stop it. Non-daemon threads also let the same code run unchanged inside a sub-interpreter that owns its own GIL, where daemon threads are not permitted.
- One reporter per process, one set of accumulators per process. If the process hosts multiple sub-interpreters, only one of them should call start_recording_metrics() and run a reporter, because the accumulators are shared and a second caller would drain a partial window from the first.
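As an illustration of how little work a compact wire format needs, the sketch below encodes the same two values as StatsD gauge lines and sends them in a single UDP datagram. The "wsgi" prefix, the host and port defaults, and the function names are assumptions made for this example; it is an alternative to the InfluxDB writer in the worked example, not part of mod_wsgi.

```python
import socket


def format_statsd_gauges(sample, prefix="wsgi"):
    """Encode two values from a request_metrics() sample as StatsD gauges.

    Each line has the form "<name>:<value>|g"; lines are joined with
    newlines so the whole sample fits in one datagram.
    """
    keys = ("request_throughput", "capacity_utilization")
    lines = [f"{prefix}.{key}:{sample[key]}|g" for key in keys]
    return "\n".join(lines).encode("ascii")


def send_statsd(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; does not wait for the receiver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


payload = format_statsd_gauges(
    {"request_throughput": 12.5, "capacity_utilization": 0.25}
)
```

UDP suits the reporter thread because a missing or slow receiver does not stall the send, at the cost of silently dropped samples.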
See also
- Subscribing to Events: full reference for the subscribe_events / subscribe_shutdown API the example above uses to signal the reporter thread at process shutdown.
- Registering Cleanup Code: broader patterns for end-of-request and end-of-process cleanup.
- The mod_wsgi Python Module: short reference summary of the mod_wsgi built-in module, including the metrics accessors.
- WSGIServerMetrics: enables the Apache scoreboard so server_metrics() returns data in embedded mode.
- WSGIDaemonProcess: the server-metrics= option enables the scoreboard for a daemon process group.
- The External Telemetry Service: the external-push counterpart to this in-process pull API, with its own browser UI and terminal monitor.
- WSGITelemetryService: Apache directive that enables the external telemetry reporter for the whole instance; presence of this directive is what causes the accessors documented above to return None.