==============================
The External Telemetry Service
==============================

mod_wsgi can stream live telemetry to an external ingester running
alongside the Apache instance. Each mod_wsgi process emits a binary
datagram once per sampling interval, and a separate ingester process
aggregates the stream and serves a browser UI showing per-process
throughput, latency percentiles, capacity, CPU and memory, HTTP
response-class breakdowns, and a slow-request feed. The reporter runs in
both daemon mode (one thread per daemon process) and embedded mode (one
thread per Apache MPM child).

This is the *external push* counterpart to the *internal pull*
:doc:`internal-metrics-api`. The two surfaces are independent and present
similar data:

* The internal API exposes accessors callable from inside the WSGI
  application, so the application owns its data and the choice of
  destination.
* The external service is configured in the Apache config; the data flows
  out over a UNIX socket without involving the application at all, and
  the bundled ingester gives an immediate live UI.

Only one of the two will return data at a time in a given process: when
the external reporter is configured, the in-process accessors return
``None`` so the application can detect that the external pipeline owns
the stream.

The ingester is distributed separately on PyPI as the
``mod_wsgi-telemetry`` package. It is intentionally not part of the
``mod_wsgi`` package or ``mod_wsgi-express``, so an installation using
the operating-system ``mod_wsgi`` package (or any other
manually-configured Apache) can use the telemetry pipeline without
adopting ``mod_wsgi-express`` as well.

.. note::

   The telemetry reporter is not available on Windows. It delivers its
   datagrams over a UNIX ``SOCK_DGRAM`` socket, which Windows does not
   provide, so the feature is not built there and the
   ``WSGITelemetryService``, ``WSGISlowRequests`` and
   ``WSGITelemetryOptions`` directives are not registered.


How it works
------------

Each process hosting a WSGI application runs a single dedicated reporter
thread. In daemon mode that is the daemon process; in embedded mode it is
the Apache MPM child. On a fixed interval the thread snapshots the
process's per-interval counters and slow-request records, encodes them as
a binary type-length-value datagram, and sends the datagram over a UNIX
``SOCK_DGRAM`` socket to the ingester. The reporter does not block the
request-serving threads; it reads from accumulators that the request path
updates under a brief, rarely contended lock.

Datagrams are sent unreliably (``SOCK_DGRAM`` has no retransmit), but
co-locating the ingester on the same host makes practical loss
negligible. There is no fallback to TCP or remote UDP: the transport is
local-host only, so MTU sizing, packet fragmentation across the network,
and inter-host packet loss are not part of the operating model. To ship
telemetry across hosts, run a local ingester on each host and forward its
data from there using a tool of your choice.

The ingester binds the same UNIX socket and receives on it, decodes
incoming datagrams, maintains a rolling per-process window in memory, and
exposes the result over an HTTP + WebSocket interface that the browser UI
and the terminal monitor both consume.

What you observe
----------------

Per process, per sampling interval:

* Request throughput (requests per second) and counts split by HTTP
  response class (``1xx`` / ``2xx`` / ``3xx`` / ``4xx`` / ``5xx``).

* Latency distribution for each phase of the request pipeline
  (server-side wait, queue, daemon dispatch, application, full request)
  as an HDR-style histogram, with ``p50`` / ``p95`` / ``p99`` and exact
  min/max.

* Capacity: how many of the worker slots are currently busy and how long
  any in-flight request has been running.

* Resource use: CPU time (user + system) and resident set size.

* Slow-request records: per-request snapshots for requests that exceeded
  the configured threshold, including elapsed time, request method, URL
  path (query string stripped), HTTP status, and (optionally) the
  ``User-Agent`` string.

The data is aggregated across every process configured to report, and
grouped in the UI by process group so a server hosting multiple WSGI
applications can be viewed as a whole or one group at a time.

Enabling the reporter in a manually-configured Apache
-----------------------------------------------------

Three directives drive the reporter. All three are server-wide
directives: they must be declared at the top level of the Apache
configuration, outside any ``<VirtualHost>`` block. Apache rejects the
config at startup if they appear in a per-vhost or per-directory context.
One configuration covers the whole Apache instance: every embedded-mode
Apache MPM child and every daemon-mode worker process defined on the
server starts a reporter from the same ``WSGITelemetryService`` line.

``WSGITelemetryService TARGET [interval=SECONDS]``
    Enable the reporter and point it at the ingester. ``TARGET`` is the
    UNIX socket path in the form ``unix:/path/to/socket``. The optional
    ``interval=`` parameter sets the sampling interval in seconds
    (default ``1.0``, minimum ``0.5``).

    The reporter starts in every mod_wsgi process (each daemon-mode
    worker and each embedded-mode Apache MPM child) when the directive is
    set; without it the reporter thread is not created.

    See :doc:`../configuration-directives/WSGITelemetryService` for the
    full directive reference.

``WSGITelemetryOptions [+|-]Flag [+|-]Flag ... | None | All``
    Capture toggles for fields that are off by default for privacy or
    volume reasons. The currently-defined flag is ``CaptureUserAgent``,
    which adds the request's ``User-Agent`` string to slow-request
    records. The ``+Flag`` / ``-Flag`` incremental form composes across
    multiple lines; absolute ``None`` and ``All`` set the state directly.

    See :doc:`../configuration-directives/WSGITelemetryOptions` for the
    full directive reference.

``WSGISlowRequests SECONDS``
    Enable slow-request reporting and set the threshold above which a
    still-running request is included in the stream. Only meaningful
    alongside ``WSGITelemetryService``; without an ingester to receive
    them the records have no destination.

    See :doc:`../configuration-directives/WSGISlowRequests` for the full
    directive reference.

A typical configuration for a single application::

    LoadModule wsgi_module modules/mod_wsgi.so

    # Server-wide: one declaration enables the reporter for every
    # mod_wsgi process Apache starts, regardless of how many
    # VirtualHosts or daemon pools the configuration defines.
    WSGITelemetryService unix:/tmp/mod_wsgi-telemetry.sock interval=1.0
    WSGITelemetryOptions +CaptureUserAgent
    WSGISlowRequests 2.0

    WSGIDaemonProcess example processes=2 threads=15

    # The VirtualHost address and Directory path are illustrative.
    <VirtualHost *:80>
        ServerName www.example.com

        WSGIScriptAlias / /var/www/example/wsgi.py
        WSGIProcessGroup example
        WSGIApplicationGroup %{GLOBAL}

        <Directory /var/www/example>
            Require all granted
        </Directory>
    </VirtualHost>

When more than one application is hosted on the same Apache, the
``WSGITelemetryService`` line is still declared once. Each daemon process
and each embedded-mode Apache child reports independently; the ingester
aggregates them and groups by ``WSGIDaemonProcess`` name in the UI.

The socket path is the contract between Apache and the ingester: the same
path must appear on both sides. The mod_wsgi processes must be able to
``connect()`` to the socket, which is a separate concern from the path
itself, covered under `Socket permissions`_ below.

Enabling the reporter with mod_wsgi-express
-------------------------------------------

``mod_wsgi-express`` translates the directives above into the generated
``httpd.conf``. The equivalent command-line options are:

``--telemetry-service TARGET``
    Enable the reporter and set the ingester socket. Same ``unix:/path``
    form as the directive value.

``--telemetry-interval SECONDS``
    Sampling interval (default ``1.0``). Sub-second intervals are
    permitted; the value must be greater than zero.

``--telemetry-options ARGS``
    Capture toggles. The value is passed verbatim to a
    ``WSGITelemetryOptions`` directive in the generated config, so the
    ``+Flag`` / ``-Flag`` / ``None`` / ``All`` forms are available.
    Repeatable; each occurrence emits a separate directive.

``--slow-requests SECONDS``
    Slow-request threshold. Requires ``--telemetry-service``;
    ``mod_wsgi-express`` rejects the option at startup if no telemetry
    target was given.

The equivalent of the manual configuration above::

    mod_wsgi-express start-server wsgi.py \
        --processes 2 --threads 15 \
        --telemetry-service unix:/tmp/mod_wsgi-telemetry.sock \
        --telemetry-interval 1.0 \
        --telemetry-options "+CaptureUserAgent" \
        --slow-requests 2.0

All-in-one shortcut: --enable-telemetry
---------------------------------------

For the common single-host case where the ingester runs alongside the
WSGI application managed by the same ``mod_wsgi-express`` instance,
``--enable-telemetry`` bundles the moving parts together:

* Picks a UNIX socket path inside the server root
  (``<server-root>/telemetry.sock``).

* Generates a service-script (``<server-root>/telemetry-service.py``)
  that invokes ``mod_wsgi.telemetry.server.main(...)`` with that socket
  path.

* Adds a ``WSGIDaemonProcess service:telemetry threads=0`` +
  ``WSGIImportScript`` pair so Apache supervises the ingester alongside
  the WSGI workers.

* Adds the matching ``WSGITelemetryService`` directive so the WSGI
  processes report into that socket.

* Exposes the web UI on ``127.0.0.1`` at the port given by
  ``--telemetry-ui-port`` (default ``8888``).

Equivalent to the manual example above::

    mod_wsgi-express start-server wsgi.py \
        --processes 2 --threads 15 \
        --enable-telemetry \
        --telemetry-interval 1.0 \
        --telemetry-options "+CaptureUserAgent" \
        --slow-requests 2.0

The web UI is reachable at ``http://127.0.0.1:8888/`` once the server has
started; two express instances on the same host need distinct
``--telemetry-ui-port`` values to avoid a bind collision.

``--enable-telemetry`` is mutually exclusive with ``--telemetry-service``
(the auto-generated socket would conflict with an explicit target) and
with a user-supplied ``--service-script telemetry ...`` (name clash).
Both are rejected at configuration time. The shortcut also requires the
``mod_wsgi-telemetry`` package to be importable; if it is not,
``mod_wsgi-express`` exits with a message naming the dependency.

The service-script daemon runs with ``threads=0``, which mod_wsgi treats
as "service only": no per-request metrics are accumulated in that process
and no telemetry reporter thread is started, so the ingester does not
report telemetry about itself to itself.

`Running the ingester`_ below covers the manual flow for deployments
where the ingester is *not* supervised by ``mod_wsgi-express`` (separate
host, separate user, separate release cadence). When the all-in-one
shortcut fits, the manual ingester install is not needed.

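
Because the shortcut exits at startup when the ingester package cannot be
imported, it can be convenient to check importability beforehand from the
same Python environment that runs ``mod_wsgi-express``. The following is
a minimal sketch and not something shipped with mod_wsgi: the helper name
``is_importable`` is hypothetical, and the module name
``mod_wsgi.telemetry.server`` is taken from the generated service script
described above.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the import system can locate module_name.

    Hypothetical helper for illustration; not part of mod_wsgi.
    """
    try:
        # For a dotted name, find_spec imports the parent packages and
        # then looks up the final component without importing it.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # ModuleNotFoundError (an ImportError) is raised when a parent
        # package is missing; ValueError when a module has no spec.
        return False


if __name__ == "__main__":
    name = "mod_wsgi.telemetry.server"
    state = "is" if is_importable(name) else "is NOT"
    print(f"{name} {state} importable")
```

Run it with the interpreter that ``mod_wsgi-express`` itself uses, since
a package installed into a different virtual environment would not be
visible to the generated service script.
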

Running the ingester
--------------------

Install the ingester from PyPI into a virtual environment::

    python3 -m venv /opt/mod_wsgi-telemetry
    /opt/mod_wsgi-telemetry/bin/pip install mod_wsgi-telemetry

Start it on the same host as the mod_wsgi Apache instance, with
``--listen`` pointing at the same UNIX socket that the reporter sends
to::

    /opt/mod_wsgi-telemetry/bin/mod_wsgi-telemetry serve \
        --listen unix:/tmp/mod_wsgi-telemetry.sock

A bare ``mod_wsgi-telemetry`` invocation defaults to ``serve``, so the
subcommand name can be omitted. With the install's ``bin`` directory on
``PATH``::

    mod_wsgi-telemetry --listen unix:/tmp/mod_wsgi-telemetry.sock

The ingester binds the socket itself, so do not run it on a host where
another process is already bound to the path. If the socket file is left
behind from a previous run it will be removed and recreated.

By default the ingester also serves an HTTP + WebSocket interface on
``127.0.0.1:8888`` for the browser UI and the terminal monitor. Override
the bind address with ``--http-host`` and the port with ``--http-port``::

    mod_wsgi-telemetry serve --http-port 9080

Running the ingester as a long-lived service is the expected deployment
shape. A simple systemd unit for the install path above would look
like::

    [Unit]
    Description=mod_wsgi telemetry ingester
    After=network.target

    [Service]
    Type=simple
    User=www-data
    ExecStart=/opt/mod_wsgi-telemetry/bin/mod_wsgi-telemetry serve \
        --listen unix:/tmp/mod_wsgi-telemetry.sock
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

See `Socket permissions`_ below for how the unit should declare the
running user and the shared group that gates access to the listen socket.

Socket permissions
------------------

The reporter sends datagrams to the UNIX socket using ``sendto(2)``,
which requires write permission on the socket file. The ingester creates
the file at bind time, owns it as the user the ingester process is
running as, and sets it to mode ``0660``: owner and group can write,
nobody else.

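
The bind-then-send relationship described above can be sketched in
Python. This illustrates general UNIX datagram socket mechanics and is
not the ingester's or the reporter's actual code: the function names and
the payload are hypothetical, and the tight umask around ``bind()``
follows the behaviour described later in this section.

```python
import os
import socket
import stat
import tempfile


def bind_receiver(path, mode=0o660):
    """Bind a UNIX datagram socket at path and set the file's mode."""
    # Remove a socket file left behind by a previous run.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Create the file under a tight umask, then widen it explicitly.
    old_umask = os.umask(0o077)
    try:
        sock.bind(path)
    finally:
        os.umask(old_umask)
    os.chmod(path, mode)
    return sock


def send_datagram(path, payload):
    """Send one datagram; needs write permission on the socket file."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sender:
        sender.sendto(payload, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "telemetry.sock")
        receiver = bind_receiver(path)
        try:
            send_datagram(path, b"example payload")
            file_mode = stat.S_IMODE(os.stat(path).st_mode)
            print(oct(file_mode), receiver.recv(4096))
        finally:
            receiver.close()
```

The sender never binds a path of its own; it only needs to be allowed to
write to the receiver's socket file, which is why the file's owner, group
and mode decide who can report.
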
In the simple case where the ingester runs as the same user as every
mod_wsgi process that reports, the default is enough. This covers the
embedded-mode case where everything is the Apache user (typically
``www-data`` or ``apache``) and there is no separate
``WSGIDaemonProcess user=`` override.

The recommended deployment pattern, though, runs each WSGI application as
its own user via ``WSGIDaemonProcess user=...``, which means the senders
are not all the same user, and probably none of them are the ingester
user. The standard Unix solution is a shared group: create one group
whose members are every identity that needs to connect, and chown the
socket to that group at bind time. The ingester's ``--socket-group``
option does the chown::

    groupadd mod-wsgi-telemetry
    usermod -aG mod-wsgi-telemetry www-data
    usermod -aG mod-wsgi-telemetry app-user-1
    usermod -aG mod-wsgi-telemetry app-user-2

    mod_wsgi-telemetry serve \
        --listen unix:/tmp/mod_wsgi-telemetry.sock \
        --socket-group mod-wsgi-telemetry

After bind, the socket is owned by the ingester user with group
``mod-wsgi-telemetry`` and mode ``0660``, so any process running as one
of the listed users (and only those) can ``sendto()`` the socket. The
Apache and ``WSGIDaemonProcess`` users need to be existing members of the
group when Apache starts; group membership is checked at fork time, so
adding a user to the group after Apache started has no effect until
Apache is restarted.

``--socket-group`` accepts either a group name or a numeric GID.

The default mode (``--socket-mode 0660``) lets group members both read
and write the socket file; this is the conventional setting. Mode
``0620`` is the tighter alternative: senders only need write, so revoking
group read shrinks the privilege footprint at no operational cost.

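
To reason about which identities can send, it helps to recall how the
kernel picks a permission class for a file: owner first, then group, then
other. The sketch below applies that rule to the socket's mode bits. It
is a simplified model under stated assumptions (it ignores ACLs,
capabilities, and search permission on the parent directories), and the
function name and numeric IDs are hypothetical.

```python
import stat


def can_write_socket(mode, owner_uid, group_gid, uid, gids):
    """Model the classic UNIX write check for a socket file.

    mode, owner_uid and group_gid describe the socket file; uid and gids
    describe the sending process (gids includes supplementary groups).
    """
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if group_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


if __name__ == "__main__":
    # Hypothetical IDs: ingester uid 900, shared group gid 950.
    for mode in (0o660, 0o620):
        member = can_write_socket(mode, 900, 950, uid=1001, gids={1001, 950})
        outsider = can_write_socket(mode, 900, 950, uid=1002, gids={1002})
        print(oct(mode), "member:", member, "outsider:", outsider)
```

For a live process, ``os.geteuid()``, ``os.getegid()`` together with
``os.getgroups()``, and ``os.stat()`` on the socket path can supply the
inputs. Because the group list is read from the running process, the
check reflects the membership that process started with, not the current
contents of the group database.
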
The ``bind()`` call itself uses a tight umask (``0077``) so the socket
file is briefly ``0600`` before the explicit chmod widens it, closing the
small window during which an inherited umask might leave a
freshly-created socket world-writable.

An equivalent systemd unit::

    [Unit]
    Description=mod_wsgi telemetry ingester
    After=network.target

    [Service]
    Type=simple
    User=mod-wsgi-telemetry
    Group=mod-wsgi-telemetry
    ExecStart=/opt/mod_wsgi-telemetry/bin/mod_wsgi-telemetry serve \
        --listen unix:/tmp/mod_wsgi-telemetry.sock \
        --socket-group mod-wsgi-telemetry
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Running the ingester as a dedicated user (``mod-wsgi-telemetry`` above,
created with ``useradd --system --no-create-home``) rather than reusing
``www-data`` keeps the ingester out of the Apache user's privilege scope
and makes the access-list explicit: every identity that can send
telemetry is listed in ``getent group mod-wsgi-telemetry``.

The browser UI
--------------

Open ``http://127.0.0.1:8888`` in a browser on the same host as the
ingester. The page is a single-page application served from the ingester;
it opens a WebSocket back to the same port and shows a persistent top bar
(totals, process-group filter, marker toggles, connection state) above
five tabs:

``Overview``
    Live sparkline charts for throughput, capacity utilisation, CPU,
    response time and memory RSS, with a per-phase mean-time breakdown
    and an HDR-style latency distribution histogram for the selected
    phase.

``Capacity``
    Per-process worker-slot heatmap. Each row is a process; each cell is
    one worker slot shaded by busy fraction over the interval, with
    marker overlays for slots holding a request past a selectable
    threshold. Hover for the live URL of any slot currently busy, click
    to open the slow-request drill-down.

``Processes``
    Process timeline (Gantt-style bars spanning each process's
    STARTED-to-STOPPED lifetime, with a tick mark at STOPPING for drain
    start) and an event log of lifecycle events that links back into the
    timeline.

``GC``
    Per-process Python garbage-collection behaviour, plotted per
    interpreter when more than one sub-interpreter is hosted in the
    daemon or embedded child. Four panels: generation pressure (gen0 /
    gen1 progress towards their collection thresholds, plus the gen2
    sawtooth between full sweeps), collections per second per generation,
    an event timeline showing every cyclic-GC pause as a dot coloured by
    generation, and an HDR histogram of pause durations over the selected
    window. Tab-local Process and Interpreter selectors narrow the view
    to a single process or sub-interpreter, a Window selector (``1m`` /
    ``2m`` / ``5m``) scopes the time axis, and a Pause control freezes
    the live display while samples continue to arrive in the background.
    A status row above the charts shows the configuration values from the
    most recent snapshot: whether the cyclic collector is enabled, the
    per-generation thresholds, the current and cumulative collection
    counts, and the frozen-object count.

``Slow requests``
    Live slow-request table with sorting, state filter (active /
    completed), URL substring search and per-record drill-down.

The UI binds to ``127.0.0.1`` by default rather than ``0.0.0.0``.
Telemetry data includes details that operators would not normally expose
unauthenticated (the live URL stream and User-Agent captures in
particular); leaving the bind on loopback ensures the UI is reachable
only from the host itself.

Terminal monitor
----------------

The same data is available as a curses-based terminal monitor for hosts
where opening a browser is impractical (SSH-only servers, sandboxed
deployment shapes, scripted health checks).
The monitor is a separate subcommand of the same binary::

    mod_wsgi-telemetry top

It connects to a running ingester's WebSocket by default at
``ws://127.0.0.1:8888/ws``. Override with ``--url`` to connect to an
ingester on a different host or port (combine with the SSH tunnel pattern
below to monitor a remote host without exposing the UI externally).

The monitor renders the same underlying data as the browser UI but with a
layout tuned to the terminal. Five views are switchable by single
keystroke (``o``, ``p``, ``w``, ``l``, ``s`` or the digits ``1`` to
``5``):

``overview``
    Sparklines for throughput, capacity, CPU and resident memory, plus a
    summary of per-phase mean times.

``processes``
    A per-process table sortable by throughput, CPU, memory, ``p95``,
    slow-request count, or PID.

``workers``
    A per-process slot grid showing which worker threads are idle, busy
    with short requests, busy with longer requests, or holding a request
    past the slow-request threshold.

``latency``
    An ASCII HDR histogram for the selected phase, with ``p50`` /
    ``p95`` / ``p99`` markers.

``slow``
    A live slow-request list with sorting, state filtering and a URL
    substring search.

Common keys: ``space`` to pause/resume, ``q`` to quit, ``?`` for an
in-monitor help overlay listing the full set.

For scripted use, ``--once`` renders a single plain-text snapshot of the
header and process table to stdout and exits; the exit code is ``0`` if a
snapshot was received and ``2`` if the connection attempt timed out. This
makes the monitor usable as a healthcheck or a shell-pipeline data
source.

Accessing the UI from a remote host
-----------------------------------

The ingester binds to ``127.0.0.1`` on purpose. The two safe ways to
reach it from elsewhere are an SSH tunnel and an authenticated reverse
proxy.

SSH tunnel
~~~~~~~~~~

The simplest option for an operator with shell access to the host is an
SSH local port forward.
From the operator's workstation::

    ssh -L 8888:127.0.0.1:8888 user@host.example.com

The browser then connects to ``http://localhost:8888`` on the local
workstation. The forward stays up for the lifetime of the SSH session and
tears down cleanly when the session ends. The ingester does not need any
reconfiguration and there is no network exposure on the remote host.

For the terminal monitor running on the operator's workstation, point
``--url`` at the forwarded port::

    mod_wsgi-telemetry top --url ws://localhost:8888/ws

Apache reverse proxy with basic authentication
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the UI needs to be reachable to a small group of operators without
each one running an SSH tunnel, an authenticated reverse proxy in the
same Apache instance that hosts the WSGI application is a practical
option. The configuration is a standard ``mod_proxy`` mount paired with
an ``AuthType Basic`` block::

    <Location /telemetry/>
        ProxyPass http://127.0.0.1:8888/ upgrade=websocket
        ProxyPassReverse http://127.0.0.1:8888/

        AuthType Basic
        AuthName "mod_wsgi telemetry"
        AuthUserFile /etc/apache2/telemetry.htpasswd
        Require valid-user
    </Location>

The ``upgrade=websocket`` parameter on ``ProxyPass`` is what makes the
live data stream work: the browser UI opens a WebSocket back to the
ingester for live updates, and ``mod_proxy_http`` handles the protocol
upgrade in place. This requires Apache 2.4.47 or newer; on older versions
the same effect needs an explicit ``ProxyPass`` line using the ``ws://``
scheme and the ``mod_proxy_wstunnel`` module.

Create the password file with ``htpasswd``::

    htpasswd -c /etc/apache2/telemetry.htpasswd alice

For the manually-configured Apache the proxy block goes alongside the
WSGI mount.
For ``mod_wsgi-express``, the equivalent shape uses
``--proxy-mount-point`` to add the proxy mount to the generated
configuration::

    mod_wsgi-express start-server wsgi.py \
        --telemetry-service unix:/tmp/mod_wsgi-telemetry.sock \
        --proxy-mount-point /telemetry/ http://127.0.0.1:8888/ \
        --include-file /etc/apache2/telemetry-auth.conf

The ``--include-file`` points at a small fragment with the
``AuthType Basic`` block (``mod_wsgi-express`` has no dedicated option
for HTTP basic auth, so the fragment supplies the directives directly).
The fragment::

    <Location /telemetry/>
        AuthType Basic
        AuthName "mod_wsgi telemetry"
        AuthUserFile /etc/apache2/telemetry.htpasswd
        Require valid-user
    </Location>

See :doc:`running-behind-a-reverse-proxy` for the broader conventions
around mod_proxy and forwarded-header trust; the mechanics there apply
identically when Apache is the proxy in front of the telemetry ingester
rather than the WSGI back-end.

.. warning::

   Each open browser tab on the UI holds a long-lived WebSocket
   connection that occupies one Apache worker thread for its entire
   lifetime. With the worker MPM's default ``ThreadsPerChild`` and even
   with the event MPM (which still commits one thread per upgraded
   connection), a handful of open tabs is fine but a wide audience is
   not: leaving the dashboard pinned across an organisation can starve
   real request-serving capacity.

   The reverse-proxy pattern is intended for a small operator group, not
   a public dashboard. For larger audiences, run a dedicated Apache
   instance for the telemetry UI on its own port, with its own MPM
   sizing, so its long-lived connections cannot affect the application's
   request-serving capacity.

Where to go next
----------------

* :doc:`internal-metrics-api` for the in-process accessor API that is the
  alternative to the external service. Choose the external service when
  you want an out-of-the-box live UI; choose the internal API when the
  application should own the metrics destination.

* :doc:`running-behind-a-reverse-proxy` for the trust mechanics and proxy
  configuration patterns that apply to any HTTP-level proxying in front
  of mod_wsgi, including the telemetry UI.

* :doc:`mod-wsgi-express-quickstart` for the ``mod_wsgi-express`` options
  that surround ``--telemetry-service`` in a real invocation.

* Directive references:
  :doc:`../configuration-directives/WSGITelemetryService`,
  :doc:`../configuration-directives/WSGITelemetryOptions`, and
  :doc:`../configuration-directives/WSGISlowRequests` for the
  per-directive syntax and validation rules.