==========================
The Internal Metrics API
==========================

The ``mod_wsgi`` built-in module exposes four functions that let a
WSGI application observe its own runtime and ship the resulting
samples to an external metrics service of its choosing. This page
covers their behaviour, the data each one returns, and a worked
example of using them to feed a time-series database without
impacting request-serving performance.

The four functions are:

``mod_wsgi.start_recording_metrics()``
   Opt in to per-request metrics accounting.

``mod_wsgi.request_metrics()``
   Drain a per-interval snapshot of timing, capacity and resource
   counters for the current process.

``mod_wsgi.process_metrics()``
   Read process-level aggregates and current state.

``mod_wsgi.server_metrics()``
   Read the Apache scoreboard view of every process and worker on
   the server.

``request_metrics()`` and ``process_metrics()`` return data only
after ``start_recording_metrics()`` has been called, and only when
no external reporter is configured to consume the same data (see
:doc:`external-telemetry-service` for the external alternative);
``server_metrics()`` has its own configuration gate. Each accessor
returns ``None`` rather than raising when it is not active, so
caller code can branch on a single check.
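
As a minimal sketch of that single check, the helper below calls
each accessor it is given and keeps only those that returned a
dict. The name ``collect_active`` and the stand-in accessors are
illustrative and not part of mod_wsgi; inside a real process the
callables would be the ``mod_wsgi`` functions themselves.

.. code-block:: python

    def collect_active(accessors):
        """Call each accessor once and keep the ones that returned data."""
        samples = {}
        for name, accessor in accessors.items():
            result = accessor()
            if result is not None:
                samples[name] = result
        return samples

    # Stand-ins for the real accessors: one inactive, one active.
    fake_accessors = {
        "request": lambda: None,
        "server": lambda: {"processes": []},
    }
    active = collect_active(fake_accessors)

Because ``request_metrics()`` drains on every call, a helper like
this belongs on the single reporter thread only.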

Enabling per-request recording
------------------------------

``start_recording_metrics()``
   Enable per-request accounting and seed the per-reader baselines
   so the first subsequent call to ``request_metrics()`` returns a
   populated dict covering the interval since this call ran.
   Idempotent: extra calls have no effect.

   Per-request accounting has a small but non-zero cost (a locked
   accumulator update on each request completion). Applications
   that never consume the data should not call this function; an
   application that does want the data should call it once at
   import time before the reporter thread or other consumer
   starts.

   When an external reporter is configured for the process, that
   reporter is the canonical metrics consumer and the Python
   accessors below return ``None`` regardless of whether this
   function has been called. An application's reporter code can
   detect this with a single ``None`` check on ``request_metrics()``
   and stand down.

request_metrics() : per-interval drain
--------------------------------------

``request_metrics()``
   Return a dict of metrics for the interval since the previous
   call to this function, then drain the underlying accumulators
   so the next call covers a fresh interval.

The drain is the important detail: each call empties the
per-interval state. That has two consequences for callers:

* Only one component in the process should call
  ``request_metrics()``. Concurrent callers would each see a
  partial interval. The expected pattern is a single background
  reporter thread on a fixed cadence.
* Do not call ``request_metrics()`` from inside the WSGI
  application callable. Apart from the drain-clash, the call
  briefly takes a process-wide lock that worker threads also
  need; do the work on a thread that does not serve requests.

The first call after ``start_recording_metrics()`` returns
samples covering the time since recording was enabled. The
function returns ``None`` if ``start_recording_metrics()`` has
not been called or if an external reporter is the configured
consumer.

Sample window
~~~~~~~~~~~~~

``pid``
   Process ID of the calling process. Useful as an additional tag
   when shipping samples from multiple processes to a shared store.

``start_time``
   Window start, in floating-point seconds since the Unix epoch.
   This is the ``stop_time`` of the previous call (or the time
   recording was enabled, on the first call).

``stop_time``
   Window end, in floating-point seconds since the Unix epoch.

``sample_period``
   ``stop_time - start_time`` in seconds.

Request volume
~~~~~~~~~~~~~~

``request_count``
   Number of requests that completed during the window.

``request_throughput``
   ``request_count / sample_period``; requests per second.

Capacity
~~~~~~~~

``request_threads_maximum``
   The configured worker-thread ceiling for the process: the
   ``threads=`` value on ``WSGIDaemonProcess`` in daemon mode, or
   the MPM-derived per-process thread limit in embedded mode.

``request_threads_started``
   Number of worker threads actually instantiated so far. Apache
   may spin worker threads up lazily; this is the running total.

``request_threads_active``
   Number of worker threads that either completed a request or
   were mid-request when the window ended.

``capacity_utilization``
   Fraction of worker capacity consumed during the window,
   computed as total busy time across all worker slots divided by
   ``sample_period * request_threads_maximum``. A value near 1.0
   means every worker spent the whole window in a request and the
   process has no spare capacity; a value near 0.0 means the
   workers were mostly idle.

``request_threads_completed``
   List of length ``request_threads_maximum``. Entry *i* is the
   number of requests worker slot *i + 1* completed during the
   window. Useful for detecting uneven distribution of work
   across worker threads. The deprecated alias
   ``request_threads_buckets`` carries the same value and will be
   removed in a future release.

``request_threads_busy_time``
   List of length ``request_threads_maximum``, float seconds.
   Entry *i* is the total time worker slot *i + 1* spent inside
   a request during the window, including any in-flight tail at
   drain time.

``request_threads_cpu_time``
   List of length ``request_threads_maximum``, float seconds.
   Sum of per-request CPU deltas for requests that completed
   in this slot during the window. Each completing request
   contributes its full start-to-end CPU delta, regardless of
   how many earlier windows the request spanned: a long
   request appears as a CPU spike in the single window in
   which it completes, not spread across the windows it
   occupied. This is asymmetric with ``request_threads_busy_time``,
   which folds in the in-flight wall-time tail at each
   window; the asymmetry is structural, because a worker
   thread's CPU usage is only readable from inside that
   thread, so the snapshot reader cannot sample a peer
   thread's in-flight CPU.

``request_threads_current_elapsed``
   List of length ``request_threads_maximum``, float seconds.
   Entry *i* is the elapsed wall time of any request still in
   flight in slot *i + 1* at the drain instant, or 0.0 if the
   slot was idle. Useful for spotting stuck requests on a live
   process.

``request_threads_max_duration``
   List of length ``request_threads_maximum``, float seconds.
   The longest request duration each slot completed during the
   window.
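
The capacity figures can be cross-checked and scanned with a few
lines of Python. The sketch below recomputes
``capacity_utilization`` from the per-slot busy times using the
formula given above, and lists the 1-based slots whose in-flight
request has been running longer than a threshold. The function
names, the threshold and the sample dict are illustrative; in a
real process the dict would come from ``request_metrics()``.

.. code-block:: python

    def capacity_from_busy_time(sample):
        """Total busy time divided by the total available worker time."""
        available = sample["sample_period"] * sample["request_threads_maximum"]
        if available <= 0:
            return 0.0
        return sum(sample["request_threads_busy_time"]) / available

    def stuck_slots(sample, threshold_seconds):
        """Return 1-based slot numbers whose in-flight request exceeds the threshold."""
        elapsed_by_slot = sample["request_threads_current_elapsed"]
        return [
            index + 1
            for index, elapsed in enumerate(elapsed_by_slot)
            if elapsed > threshold_seconds
        ]

    # Illustrative sample: four slots, a one second window.
    example = {
        "sample_period": 1.0,
        "request_threads_maximum": 4,
        "request_threads_busy_time": [1.0, 0.5, 0.5, 0.0],
        "request_threads_current_elapsed": [12.5, 0.0, 0.0, 0.0],
    }
    utilization = capacity_from_busy_time(example)
    stuck = stuck_slots(example, threshold_seconds=10.0)

The recomputed value may differ from the reported one in the last
few decimal places because of floating-point rounding.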

Phase timing means
~~~~~~~~~~~~~~~~~~

Each phase mean is the total time recorded across all completed
requests in the window divided by ``request_count``. Phases
overlap in places (``application_time`` is part of
``request_time``, for example) so the means do not add up to the
request total.

``server_time``
   Average time, in seconds, between Apache accepting the request
   and the WSGI handler returning to Apache.

``queue_time``
   Daemon mode only. Average time the request spent travelling
   from the Apache worker process to the daemon process, in
   seconds. ``None`` in embedded mode.

``daemon_time``
   Daemon mode only. Average time inside the daemon process from
   accepting the dispatched request to the application callable
   returning, in seconds. ``None`` in embedded mode.

``application_time``
   Average time spent inside the WSGI application callable, in
   seconds.

``request_time``
   Average end-to-end time, in seconds, covering acceptance by
   Apache through to the response being fully written back to the
   client.

``input_read_time``
   Average time spent reading the request body, in seconds. Zero
   for requests with no body.

``output_write_time``
   Average time spent writing response bytes to the client, in
   seconds.

Phase timing extremes
~~~~~~~~~~~~~~~~~~~~~

For each phase listed above there is a matching pair of integer
microsecond fields giving the smallest and largest observation
recorded during the window. Both keys are ``None`` if the phase
saw no requests in the window (and the daemon-only phases are
``None`` in embedded mode regardless).

``server_time_min_us`` / ``server_time_max_us``

``queue_time_min_us`` / ``queue_time_max_us``

``daemon_time_min_us`` / ``daemon_time_max_us``

``application_time_min_us`` / ``application_time_max_us``

``request_time_min_us`` / ``request_time_max_us``

``input_read_time_min_us`` / ``input_read_time_max_us``

``output_write_time_min_us`` / ``output_write_time_max_us``
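
The means are in seconds while the extremes are in integer
microseconds, so a reporter that ships both usually normalises
them first. A minimal sketch, with ``phase_summary`` as a
hypothetical helper name and an invented sample dict:

.. code-block:: python

    def phase_summary(sample, phase):
        """Return mean, min and max for one phase, all in seconds (or None)."""
        def to_seconds(value_us):
            return None if value_us is None else value_us / 1_000_000

        return {
            "mean": sample.get(phase),
            "min": to_seconds(sample.get(f"{phase}_min_us")),
            "max": to_seconds(sample.get(f"{phase}_max_us")),
        }

    # Illustrative embedded-mode sample: the daemon-only phase is None.
    example = {
        "application_time": 0.012,
        "application_time_min_us": 3000,
        "application_time_max_us": 45000,
        "queue_time": None,
        "queue_time_min_us": None,
        "queue_time_max_us": None,
    }
    app = phase_summary(example, "application_time")
    queue = phase_summary(example, "queue_time")

The ``None`` pass-through keeps the daemon-only phases and empty
windows distinguishable from a genuine zero.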

Phase timing histograms
~~~~~~~~~~~~~~~~~~~~~~~

Each of the following keys carries a list of 65 integer counts
representing the distribution of per-request durations across
fixed boundaries. The first 64 entries cover 16 octaves from 1 ms
up to 65536 ms, split linearly into 4 sub-buckets per octave; the
final entry counts samples at or above 65536 ms (~65 s). Values
below 1 ms land in entry 0.

``server_time_buckets``

``queue_time_buckets``

``daemon_time_buckets``

``application_time_buckets``

``request_time_buckets``

``input_read_time_buckets``

``output_write_time_buckets``

``gil_wait_time_buckets``
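
The bucket layout can be reconstructed client-side when turning
counts into percentiles. The sketch below assumes that "split
linearly" means bucket *i* (for *i* below 64) starts at
``2 ** (i // 4) * (1 + (i % 4) / 4)`` milliseconds; that reading
is an assumption and should be checked against the running
version. The function names are illustrative.

.. code-block:: python

    def bucket_lower_bounds_ms():
        """Lower bound of each of the 65 buckets, in milliseconds."""
        bounds = []
        for octave in range(16):
            base = 2 ** octave
            for sub in range(4):
                bounds.append(base * (1 + sub / 4))
        bounds.append(65536.0)  # overflow bucket
        return bounds

    def percentile_upper_bound_ms(buckets, fraction):
        """Upper edge of the bucket containing the requested percentile."""
        total = sum(buckets)
        if total == 0:
            return None
        bounds = bucket_lower_bounds_ms()
        target = fraction * total
        running = 0
        for index, count in enumerate(buckets):
            running += count
            if running >= target:
                # The next bucket's lower bound is this bucket's upper edge;
                # the overflow bucket has no upper edge.
                if index + 1 < len(bounds):
                    return bounds[index + 1]
                return float("inf")
        return float("inf")

    # Illustrative histogram: 90 fast requests, 10 in the 4-5 ms bucket.
    example = [0] * 65
    example[0] = 90
    example[8] = 10
    bounds = bucket_lower_bounds_ms()
    p50 = percentile_upper_bound_ms(example, 0.50)
    p99 = percentile_upper_bound_ms(example, 0.99)

Reporting the upper edge gives a conservative estimate: the true
percentile is no larger than the returned value.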

GIL contention
~~~~~~~~~~~~~~

``gil_wait_time``
   Average time per request spent waiting to re-acquire the GIL
   at the boundaries where mod_wsgi releases it on the
   application's behalf: acquiring the interpreter at the start
   of the request, and re-acquiring the GIL after reading
   request body bytes, after flushing response headers, and
   after writing response body bytes. Useful as an indication
   of contention between mod_wsgi's worker threads serving
   concurrent requests in the same process. GIL contention
   inside the WSGI application itself (for example between
   Python-level threads the application spawns) is not measured.

``gil_wait_time_min_us`` / ``gil_wait_time_max_us``
   Smallest and largest single GIL-wait recorded during the
   window, in microseconds, or ``None`` if no waits were
   recorded.

``gil_wait_count``
   Total number of GIL re-acquire events recorded during the
   window across all requests. Because ``gil_wait_time`` is a
   per-request mean, the mean wait per acquire is
   ``gil_wait_time * request_count / gil_wait_count``.
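
A minimal sketch of the per-acquire calculation, assuming
``gil_wait_time`` is the per-request mean described above, so
that the window's total wait is
``gil_wait_time * request_count``. The function name and sample
values are illustrative.

.. code-block:: python

    def gil_wait_per_acquire(sample):
        """Mean GIL wait per re-acquire event, in seconds, or None."""
        count = sample["gil_wait_count"]
        if not count:
            return None
        # gil_wait_time is a per-request mean, so scale back to a total first.
        total_wait = sample["gil_wait_time"] * sample["request_count"]
        return total_wait / count

    example = {
        "gil_wait_time": 0.002,   # seconds per request
        "request_count": 100,
        "gil_wait_count": 400,
    }
    per_acquire = gil_wait_per_acquire(example)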

I/O totals
~~~~~~~~~~

``input_bytes``
   Total request-body bytes read across all completed requests in
   the window.

``input_reads``
   Total number of read operations against request bodies in the
   window.

``output_bytes``
   Total response bytes written to clients in the window.

``output_writes``
   Total number of write operations against response sockets in
   the window.

Response classes
~~~~~~~~~~~~~~~~

Per-class HTTP response counts for completed requests in the
window. The five counters always sum to ``request_count``;
requests that never called ``start_response`` are folded into
``status_5xx``.

``status_1xx``, ``status_2xx``, ``status_3xx``, ``status_4xx``,
``status_5xx``
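
Since the five counters sum to ``request_count``, they convert
directly into per-class fractions such as an error rate. A short
sketch with an invented sample; ``status_fractions`` is a
hypothetical helper name:

.. code-block:: python

    STATUS_KEYS = (
        "status_1xx", "status_2xx", "status_3xx", "status_4xx", "status_5xx",
    )

    def status_fractions(sample):
        """Fraction of completed requests in each response class."""
        total = sample["request_count"]
        if total == 0:
            return {key: 0.0 for key in STATUS_KEYS}
        return {key: sample[key] / total for key in STATUS_KEYS}

    example = {
        "request_count": 200,
        "status_1xx": 0,
        "status_2xx": 180,
        "status_3xx": 10,
        "status_4xx": 6,
        "status_5xx": 4,
    }
    fractions = status_fractions(example)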

CPU rates
~~~~~~~~~

Each rate is the corresponding CPU delta divided by
``sample_period``, so a value of 1.0 represents one CPU-second of
work per wall-clock second (one core fully loaded). Values can
exceed 1.0 on multi-core hosts when several worker threads run
CPU-bound work in parallel.

``cpu_user_utilization``
   User-mode CPU rate for the process.

``cpu_system_utilization``
   Kernel-mode CPU rate for the process.

``cpu_utilization``
   ``cpu_user_utilization + cpu_system_utilization``.

The keys ``cpu_user_time``, ``cpu_system_time`` and ``cpu_time``
are deprecated aliases for the three keys above, carrying the
same per-window rate values. They are retained for backwards
compatibility but should not be used in new code: their names
collide with identically-named keys in ``process_metrics()``
that carry cumulative CPU-time totals in seconds, very different
quantities with the same labels. New code should use the
``_utilization`` keys.

Memory
~~~~~~

``memory_rss``
   Current resident set size of the process, in bytes.

``memory_max_rss``
   Peak resident set size of the process so far, in bytes.

These two values are point-in-time at the moment the snapshot
runs, not interval-derived. The same two keys appear under
``process_metrics()`` below, sourced from the same calls and
carrying identical values; they are duplicated here so a
periodic reporter built around ``request_metrics()`` has memory
context attached to every sample without needing a second
function call.

process_metrics() : process aggregates and current state
--------------------------------------------------------

``process_metrics()``
   Return a dict describing the process from start-up to the
   present moment. Unlike ``request_metrics()``, this accessor
   does not drain anything; values are cumulative or
   point-in-time, not per-interval.

Returns ``None`` under the same conditions as
``request_metrics()``: when ``start_recording_metrics()`` has not
been called, or when an external reporter is the configured
consumer.

``pid``
   Process ID.

``restart_time``
   Process start time as seconds since the Unix epoch.

``current_time``
   Wall-clock time at the moment of the call, in seconds since the
   epoch. Convenient for computing process-uptime client-side
   without a second clock read.

``running_time``
   ``current_time - restart_time``, as an integer second count.

``request_count``
   Total number of requests this process has served since
   start-up.

``request_busy_time``
   Total cumulative time, in seconds, that worker threads spent
   inside requests. The fraction
   ``request_busy_time / (running_time * request_threads_maximum)``
   gives a process-lifetime equivalent of
   ``capacity_utilization``.

``request_threads``
   Same as ``request_threads_started`` from ``request_metrics()``:
   number of worker threads instantiated so far.

``active_requests``
   Number of requests currently in flight at the moment of the
   call.

``cpu_user_time``
   Cumulative user-mode CPU time the process has consumed since
   start-up, in seconds.

``cpu_system_time``
   Cumulative kernel-mode CPU time, in seconds.

``cpu_time``
   ``cpu_user_time + cpu_system_time``.

The three CPU keys here are lifetime totals in seconds. The
identically-named (deprecated) keys in ``request_metrics()``
carry per-window utilisation rates, not absolute totals. Code
that reads CPU values from both accessors needs to handle the
two unit systems separately; new code reading rates should
prefer the ``_utilization`` keys on ``request_metrics()``.
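
If only ``process_metrics()`` is being polled, a rate can still
be derived by differencing two snapshots. The sketch below is one
way to do that; ``cpu_rate_between`` is a hypothetical helper and
the snapshot values are invented.

.. code-block:: python

    def cpu_rate_between(earlier, later):
        """CPU-seconds per wall-clock second between two snapshots, or None."""
        if earlier["pid"] != later["pid"]:
            # Different process: the cumulative totals are not comparable.
            return None
        elapsed = later["current_time"] - earlier["current_time"]
        if elapsed <= 0:
            return None
        return (later["cpu_time"] - earlier["cpu_time"]) / elapsed

    first = {"pid": 4101, "current_time": 1000.0, "cpu_time": 12.0}
    second = {"pid": 4101, "current_time": 1010.0, "cpu_time": 17.0}
    rate = cpu_rate_between(first, second)

The result is in the same unit as the ``_utilization`` keys: 1.0
corresponds to one fully loaded core.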

``memory_rss``
   Current resident set size in bytes.

``memory_max_rss``
   Peak resident set size in bytes.

Same values as the matching keys in ``request_metrics()``;
sourced from the same calls and duplicated across the two
accessors for convenience.

``threads``
   List of per-worker-thread dicts. Each entry has two keys:

   ``thread_id``
      Worker-thread identifier (1-based).

   ``request_count``
      Number of requests this worker thread has served since
      start-up.
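
Two quantities fall straight out of this dict: the lifetime
capacity fraction described under ``request_busy_time``, and each
worker thread's share of the requests served. In the sketch
below the thread ceiling is passed in as a parameter, since it is
documented as a ``request_metrics()`` key; the function names and
sample values are illustrative.

.. code-block:: python

    def lifetime_utilization(process, threads_maximum):
        """Busy time as a fraction of total worker time since start-up."""
        available = process["running_time"] * threads_maximum
        if available <= 0:
            return None
        return process["request_busy_time"] / available

    def thread_shares(process):
        """Map thread_id to that thread's fraction of all served requests."""
        threads = process["threads"]
        total = sum(thread["request_count"] for thread in threads)
        if total == 0:
            return {}
        return {
            thread["thread_id"]: thread["request_count"] / total
            for thread in threads
        }

    example = {
        "running_time": 100,
        "request_busy_time": 150.0,
        "threads": [
            {"thread_id": 1, "request_count": 30},
            {"thread_id": 2, "request_count": 10},
        ],
    }
    utilization = lifetime_utilization(example, threads_maximum=3)
    shares = thread_shares(example)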

server_metrics() : Apache scoreboard view
-----------------------------------------

``server_metrics()``
   Return a dict reflecting the Apache scoreboard: every process,
   every worker thread, what each is currently doing, and totals
   accumulated since the server started.

Unlike the previous two accessors, ``server_metrics()`` does not
require ``start_recording_metrics()``. It is gated separately by
configuration: see
:doc:`../configuration-directives/WSGIServerMetrics` for the
embedded-mode gate, and the ``server-metrics=`` option on
:doc:`../configuration-directives/WSGIDaemonProcess` for the
daemon-mode gate. Returns ``None`` when the scoreboard is not
available or the gate is off.

The dict has top-level fields covering the server, followed by a
``processes`` list of process dicts, each of which has a
``workers`` list of worker dicts.

Server level
~~~~~~~~~~~~

``server_limit``
   Configured upper bound on number of processes the active MPM
   may run.

``thread_limit``
   Configured upper bound on number of worker threads per process.

``running_generation``
   Generation counter for the active server. Increments on each
   graceful restart.

``restart_time``
   Time of the most recent (re)start, in seconds since the Unix
   epoch.

``current_time``
   Wall-clock time at the moment of the call, in seconds since the
   epoch.

``running_time``
   ``current_time - restart_time``, in integer seconds.

``processes``
   List of process dicts (see below).

Per-process
~~~~~~~~~~~

``process_num``
   Index of this entry in the scoreboard process table.

``pid``
   Process ID of the worker process, or 0 if the slot is unused.

``generation``
   Generation in which this process was spawned.

``quiescing``
   ``True`` if the process is gracefully shutting down (no longer
   accepting new requests), ``False`` otherwise.

``workers``
   List of worker dicts.

Per-worker
~~~~~~~~~~

``thread_num``
   Index of this worker thread within the process.

``generation``
   Generation in which this worker was created.

``status``
   A single-character string describing the current state of the
   worker (``_`` waiting for connection, ``R`` reading request,
   ``W`` writing reply, ``K`` keepalive, ``G`` gracefully
   finishing, and so on). The same letters Apache uses in its
   ``mod_status`` output.

``access_count``
   Number of requests this worker has served since the process
   started.

``bytes_served``
   Total response bytes the worker has written to clients.

``start_time``
   Time the worker last began processing a request, in seconds
   since the epoch.

``stop_time``
   Time the worker last finished processing a request, in seconds
   since the epoch.

``last_used``
   Time of the last activity on the worker, in seconds since the
   epoch.

``client``
   IP address of the client whose request the worker last handled.

``request``
   First line of the most recent request handled by the worker,
   truncated by Apache to a fixed buffer.

``vhost``
   Server name of the virtual host the most recent request was
   served against.
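
A common use of the nested structure is a per-status tally, much
like the summary line of ``mod_status``. The sketch below walks
``processes`` and ``workers``, skipping unused process slots; the
function name and the sample dict (trimmed to the keys used) are
illustrative.

.. code-block:: python

    from collections import Counter

    def worker_status_counts(server):
        """Count workers by status letter across all live processes."""
        counts = Counter()
        for process in server["processes"]:
            if process["pid"] == 0:
                continue  # unused scoreboard slot
            for worker in process["workers"]:
                counts[worker["status"]] += 1
        return counts

    example = {
        "processes": [
            {"pid": 4101, "workers": [{"status": "W"}, {"status": "_"}]},
            {"pid": 4102, "workers": [{"status": "_"}, {"status": "K"}]},
            {"pid": 0, "workers": [{"status": "."}]},
        ],
    }
    counts = worker_status_counts(example)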

Reporting metrics to an external service
----------------------------------------

The shape of an in-application reporter is:

1. Call ``start_recording_metrics()`` so the accessors have data
   to return.
2. Start a single background thread that wakes on a fixed
   cadence, calls ``request_metrics()`` (and, if useful,
   ``process_metrics()`` or ``server_metrics()``), formats the
   sample for the destination, and writes it.
3. Subscribe to the ``process_stopping`` event so the reporter
   thread can flush a final sample and exit cleanly when the
   process is shutting down.

The example below feeds ``request_throughput`` and
``capacity_utilization`` to an InfluxDB instance every second.
The full set of attributes is documented above; restricting the
example to two of them keeps the moving parts visible.

The application file does nothing except wire up the reporter:

.. code-block:: python

    import metrics

    metrics.enable_reporting()

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!'

        response_headers = [
            ('Content-type', 'text/plain'),
            ('Content-Length', str(len(output))),
        ]
        start_response(status, response_headers)

        return [output]

The companion ``metrics`` module does the work:

.. code-block:: python

    import os
    import socket
    import time
    import traceback
    import urllib.request

    from queue import Queue, Empty
    from threading import Thread

    import mod_wsgi

    HOSTNAME = socket.gethostname()
    PID = os.getpid()
    PROCESS = f"{HOSTNAME}:{PID}"

    INTERVAL = 1.0
    INFLUXDB_URL = "http://influxdb.local:8086/write?db=wsgi"

    queue = Queue()

    def format_line(metrics, timestamp_ns):
        """Build an InfluxDB line-protocol record.

        Line protocol is plain ASCII; assembling it is a couple of
        f-strings. JSON marshalling and the dict-of-dicts the JSON
        clients want are far more expensive at sub-second cadence,
        so we format the wire bytes directly.
        """
        return (
            f"request-metrics,hostname={HOSTNAME},process={PROCESS} "
            f"request_throughput={metrics['request_throughput']},"
            f"capacity_utilization={metrics['capacity_utilization']} "
            f"{timestamp_ns}"
        )

    def write_to_influxdb(payload):
        request = urllib.request.Request(
            INFLUXDB_URL, data=payload.encode("ascii"), method="POST"
        )
        try:
            with urllib.request.urlopen(request, timeout=2.0):
                pass
        except Exception:
            traceback.print_exc()

    def report_once():
        metrics = mod_wsgi.request_metrics()
        if metrics is None:
            return
        timestamp_ns = int(metrics["stop_time"] * 1_000_000_000)
        write_to_influxdb(format_line(metrics, timestamp_ns))

    def collector():
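        # Schedule ticks on the monotonic clock so wall-clock
        # adjustments (NTP steps, manual changes) cannot skew them.
        next_tick = time.monotonic() + INTERVAL
        while True:
            timeout = max(0.0, next_tick - time.monotonic())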
            try:
                queue.get(timeout=timeout)
            except Empty:
                report_once()
                next_tick += INTERVAL
                continue
            # Sentinel from the shutdown handler: flush and exit.
            report_once()
            return

    # daemon=False so process_stopping can join us cleanly, and so
    # the module is also usable inside a per-interpreter-GIL
    # sub-interpreter (where daemon threads are not permitted).
    thread = Thread(target=collector, daemon=False)

    def shutdown_handler(name, **kwargs):
        queue.put(None)

    _started = False

    def enable_reporting():
        # Guard against double activation: in embedded mode a
        # modified wsgi.py is reloaded in the same process, which
        # re-runs the wsgi.py top-level import-and-call. The
        # second call would otherwise hit Thread.start() on the
        # already-running thread and raise RuntimeError.
        global _started
        if _started:
            return
        _started = True

        mod_wsgi.start_recording_metrics()
        mod_wsgi.subscribe_shutdown(shutdown_handler)
        thread.start()

The shape is almost identical to a plain "background reporter
thread" pattern in any other application. The mod_wsgi-specific
parts are three function calls: ``start_recording_metrics()`` in
``enable_reporting()`` so the accessor returns data,
``request_metrics()`` inside ``report_once()`` to read it, and
``subscribe_shutdown()`` in ``enable_reporting()`` so the thread
is signalled at process shutdown rather than being killed
mid-write.
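
The ``format_line()`` above interpolates the hostname into a tag
value unescaped, which is fine for ordinary hostnames. InfluxDB
line protocol requires commas, equals signs and spaces in tag keys
and values to be escaped with a backslash, so a more defensive
formatter might look like the sketch below. The helper names are
hypothetical, and the sketch handles float fields only.

.. code-block:: python

    def escape_tag(value):
        """Backslash-escape the characters line protocol reserves in tags."""
        text = str(value)
        for char in (",", "=", " "):
            text = text.replace(char, "\\" + char)
        return text

    def format_escaped_line(measurement, tags, fields, timestamp_ns):
        """Assemble one line-protocol record with escaped tags."""
        tag_text = ",".join(
            f"{escape_tag(key)}={escape_tag(value)}"
            for key, value in sorted(tags.items())
        )
        field_text = ",".join(f"{key}={value}" for key, value in fields.items())
        return f"{measurement},{tag_text} {field_text} {timestamp_ns}"

    line = format_escaped_line(
        "request-metrics",
        {"hostname": "web 1"},
        {"request_throughput": 12.5},
        1700000000000000000,
    )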

Hosting the reporter in a dedicated sub-interpreter
---------------------------------------------------

The worked example above runs the reporter from inside the same
sub-interpreter that hosts the WSGI application: ``wsgi.py``
imports the ``metrics`` module and calls ``enable_reporting()``
at import time. That is the simplest deployment but couples the
two concerns: the reporter module is visible to application
code, and the one-consumer-per-process rule rests on the
application never accidentally calling ``request_metrics()``
itself.

A cleaner option in daemon mode is to put the reporter in its
own sub-interpreter, separate from the WSGI application's
sub-interpreter but in the same daemon process. Per-process
metrics state is process-wide and shared across every
sub-interpreter in the process, so a reporter running in one
sub-interpreter sees the data produced by requests served in
another. Isolation makes the one-consumer rule structural
rather than a discipline the application has to maintain.

The mechanism is
:doc:`../configuration-directives/WSGIImportScript`, configured
to import a small launcher into the same ``process-group=`` as
the WSGI application but a distinct ``application-group=``. Add
a ``reporter.py`` next to ``metrics.py`` whose only job is to
import the metrics module and trigger it::

    import metrics

    metrics.enable_reporting()

Then point ``WSGIImportScript`` at ``reporter.py``::

    WSGIDaemonProcess myapp threads=15 \
        python-path=/var/www/myapp
    WSGIScriptAlias / /var/www/myapp/wsgi.py \
        process-group=myapp \
        application-group=%{GLOBAL}
    WSGIImportScript /var/www/myapp/reporter.py \
        process-group=myapp \
        application-group=metrics

The ``python-path=`` option puts ``/var/www/myapp`` on
``sys.path`` for the daemon process so that ``reporter.py``'s
``import metrics`` can find the module next to it. Without it,
``WSGIImportScript`` would still run ``reporter.py`` as a
top-level file, but ``import metrics`` would not resolve.

``WSGIImportScript`` runs ``reporter.py`` at daemon startup, the
``import metrics`` line pulls the module in, and
``enable_reporting()`` does its three calls before the first
request arrives. ``wsgi.py`` no longer imports the ``metrics``
module and no longer references ``enable_reporting()``; the
WSGI application file becomes whatever it would have been
without metrics reporting at all. ``metrics.py`` is unchanged
from the worked example above.

Keeping the launcher in its own file rather than activating
from the bottom of ``metrics.py`` itself preserves the
no-import-side-effect property of the metrics module, which
matters if anything else (a test harness, a one-off script,
``WSGIImportScript`` loaded in a different application group
for some reason) ever imports it.

Two consequences are worth flagging:

* Each sub-interpreter has its own ``mod_wsgi`` module object
  and its own set of event subscribers. The
  ``subscribe_shutdown`` callback registered in the metrics
  sub-interpreter only fires for events published into that
  sub-interpreter. ``process_stopping`` is published to every
  sub-interpreter in the daemon process, so the reporter is
  notified at the right moment to drain a final sample and
  stop its thread.
* The reporter script and the WSGI application should not
  import each other. Sub-interpreters do not share Python
  module state, and crossing the boundary either duplicates
  state or, with C extensions that are not sub-interpreter
  safe, fails outright. The split is precisely the point of
  this deployment.

Keeping reporting off the hot path
----------------------------------

The point of the design above is that worker threads serving
requests pay almost nothing for the reporter. Things that matter:

* Drain on a dedicated thread, never from inside the WSGI
  application callable. ``request_metrics()`` is a per-interval
  drain: every call empties the accumulators. A worker thread
  that called it during a request would consume the data the
  reporter thread expected to ship, and concurrent callers
  would each end up with partial windows. The one-consumer
  pattern is what keeps each shipped sample a coherent snapshot.
* Pick an aggregation interval long enough that the per-tick
  cost (one ``request_metrics()`` call, one wire write) is
  negligible against the per-request work the process is doing.
  One second is a reasonable default; sub-second cadences are
  possible but rarely useful.
* Pre-encode in a compact wire format and write it as bytes.
  Line protocol, OpenMetrics text exposition, StatsD packets and
  similar formats are cheap to assemble from primitive values.
  JSON marshalling, especially via a third-party metrics-store
  client, is much more expensive per sample and unnecessary when
  the wire format is straightforward.
* Buffer the write locally and use a short timeout. If the
  destination is unreachable, only the reporter thread blocks;
  worker threads continue to serve requests, and the next tick
  gets a chance to recover.
* Use ``mod_wsgi.subscribe_shutdown`` to signal the reporter
  thread, not ``atexit``. The ``process_stopping`` event fires
  before Python's interpreter finalisation begins, while there
  is still time to put the sentinel on the queue. ``atexit``
  callbacks run as part of finalisation, *after* the runtime
  has already joined every non-daemon thread; a non-daemon
  reporter thread waiting on a queue would never be signalled
  and the process would hang.
* Create the reporter thread with ``daemon=False`` and rely on
  the shutdown handler to stop it. Non-daemon threads also let
  the same code run unchanged inside a sub-interpreter that
  owns its own GIL, where daemon threads are not permitted.
* One reporter per process, one set of accumulators per process.
  If the process hosts multiple sub-interpreters, only one of
  them should call ``start_recording_metrics()`` and run a
  reporter, because the accumulators are shared and a second
  caller would drain a partial window from the first.
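
The "buffer the write locally" point can be sketched with a
bounded ``collections.deque``: lines accumulate while the
destination is down, the oldest are dropped once the cap is
reached, and a successful write clears the backlog. The class
name, the cap, and the convention that the sender returns
``True`` on success are all assumptions made for illustration;
the ``write_to_influxdb()`` function in the worked example does
not return a status as written.

.. code-block:: python

    from collections import deque

    class BoundedBuffer:
        """Hold unsent lines, discarding the oldest beyond max_lines."""

        def __init__(self, max_lines=300):
            self._lines = deque(maxlen=max_lines)

        def __len__(self):
            return len(self._lines)

        def add(self, line):
            self._lines.append(line)

        def flush(self, send):
            """Send everything in one payload; keep it all if send fails."""
            if not self._lines:
                return True
            payload = "\n".join(self._lines)
            if send(payload):
                self._lines.clear()
                return True
            return False

    # Illustrative run: a cap of two lines, one failed write, one success.
    sent = []
    buffer = BoundedBuffer(max_lines=2)
    for line in ("a", "b", "c"):
        buffer.add(line)
    failed = buffer.flush(lambda payload: False)
    backlog_after_failure = len(buffer)
    succeeded = buffer.flush(lambda payload: sent.append(payload) is None)

The cap bounds memory use during a long outage, at the cost of
losing the oldest samples.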

See also
--------

* :doc:`subscribing-to-events`: full reference for the
  ``subscribe_events`` / ``subscribe_shutdown`` API the example
  above uses to signal the reporter thread at process shutdown.
* :doc:`registering-cleanup-code`: broader patterns for
  end-of-request and end-of-process cleanup.
* :doc:`mod-wsgi-python-module`: short reference summary of the
  ``mod_wsgi`` built-in module, including the metrics accessors.
* :doc:`../configuration-directives/WSGIServerMetrics`: enables
  the Apache scoreboard so ``server_metrics()`` returns data in
  embedded mode.
* :doc:`../configuration-directives/WSGIDaemonProcess`: the
  ``server-metrics=`` option enables the scoreboard for a daemon
  process group.
* :doc:`external-telemetry-service`: the external-push counterpart
  to this in-process pull API, with its own browser UI and
  terminal monitor.
* :doc:`../configuration-directives/WSGITelemetryService`:
  Apache directive that enables the external telemetry reporter
  for the whole instance; presence of this directive is what
  causes the accessors documented above to return ``None``.