Delegating File Serving To Apache
When a WSGI application needs to return the contents of a file, the most
efficient option is almost always to let the web server serve the file
rather than have the application read and yield it itself. With mod_wsgi
this means letting Apache do the work. The File Wrapper Extension
covers one way to do that from within the application. This page covers
the other way, available in daemon mode, where the application hands the
response off to Apache entirely by returning a Location response
header.
The typical use case is serving a file that the application must first
make an access control decision about. The application performs the
authentication and authorisation, then, instead of streaming the file
back through Python, it tells Apache which local URL should be served in
its place. Apache serves that URL using its own static file handler,
with all the optimisations that brings, including sendfile(), range
requests, conditional requests and correct cache validators.
This is the same technique that nginx exposes through the
X-Accel-Redirect header, and it is a built-in, supported alternative
to the third-party mod_xsendfile module and its X-Sendfile header.
How it works
In daemon mode mod_wsgi acts as a gateway in front of the daemon
process, in the same way that mod_cgid acts as a gateway in front of
a CGI script. As part of that role it honours the CGI specification’s
local redirect response (RFC 3875). If the WSGI application returns a
response whose status is 200 and which includes a Location
response header whose value is a local URL path, that is, a value
beginning with a / and with no scheme or host name, then mod_wsgi
does not send the response to the client. Instead it discards any
response content from the application and performs an Apache internal
redirect to the nominated URL. The client receives the response produced
for that URL, and never sees the Location header or any redirect.
Because the internal redirect is processed by Apache as a fresh request, the target URL passes through Apache’s normal request processing, including its access control and its static file handler. This is what makes it possible to let Apache serve a file the application has just authorised.
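The rule described above can be sketched as a small Python predicate. This is only an illustration of the conditions as stated on this page, not mod_wsgi's actual implementation; the function name local_redirect_target is made up for the example.

```python
def local_redirect_target(status, headers):
    """Return the local URL path Apache would serve, or None.

    Hypothetical helper modelling the rule described in the text: the
    status must be 200 and the Location value must begin with a '/'.
    Edge cases beyond that rule are not modelled.
    """
    # The status line looks like '200 OK'; only the numeric code matters.
    code = status.split(None, 1)[0]
    if code != '200':
        return None
    for name, value in headers:
        # Header names are case insensitive.
        if name.lower() == 'location' and value.startswith('/'):
            return value
    return None
```

With this sketch, a 200 response carrying Location: /private/report.pdf yields that path, while a 302 response or an absolute URL yields None, meaning the response would be sent to the client unchanged.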
The WSGI application
The application returns an empty response body, a status of 200, and
a Location header giving the local URL that Apache should serve in
its place:
def application(environ, start_response):
    # Perform whatever authentication and authorisation checks are
    # required before allowing the file to be served.
    if not user_is_allowed(environ):
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'Forbidden']

    status = '200 OK'
    response_headers = [('Location', '/private/report.pdf')]
    start_response(status, response_headers)
    return []
Note that the value of the Location header must be a local URL path,
not a file system path. It names a URL that Apache can resolve, and that
URL is what determines which file is ultimately served.
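When the file is chosen per request, the Location value has to be built as a URL path rather than taken from a file system path. The following is a minimal sketch of one way to do that. The helper name location_for and the PRIVATE_URL_PREFIX constant are hypothetical. It also assumes the Location value is parsed like any other URL path, so file names are percent-encoded.

```python
from urllib.parse import quote

# Hypothetical URL prefix; it must match the URL that Apache maps to the
# directory holding the private files.
PRIVATE_URL_PREFIX = '/private/'


def location_for(file_name):
    """Build a local URL path for a single file name in the private area."""
    # Accept only a bare file name, so a caller supplied value cannot
    # step outside the private area with '..' or extra path segments.
    if (not file_name or '/' in file_name or '\\' in file_name
            or file_name in ('.', '..')):
        raise ValueError('not a plain file name: %r' % (file_name,))
    # Percent-encode characters that are not valid in a URL path.
    return PRIVATE_URL_PREFIX + quote(file_name)
```

The result can be used directly as the header value, for example response_headers = [('Location', location_for('report.pdf'))].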
The Apache configuration
The private files are mapped to a URL using Alias (or any other
mechanism that makes them available as a URL), and the directory is made
servable in the usual way:
Alias /private/ /path/to/app/private/

<Directory /path/to/app/private/>
    Require all granted
</Directory>
As written, this would also allow a client to request /private/ URLs
directly, bypassing the application’s access control. To prevent that,
block direct client requests for the private area while still allowing
requests that arrive by way of the internal redirect:
RewriteEngine On
# Block direct client requests for the private area. THE_REQUEST is
# the original request line as received from the client and is not
# changed by mod_wsgi's internal redirect. It therefore only matches
# a request the client made directly, not one that arrived via a
# Location response header from the WSGI application.
RewriteCond %{THE_REQUEST} ^\S+\s/private/
RewriteRule ^/private/ - [F]
With this in place, a client that asks for /private/report.pdf
directly is refused with a 403 response, while a request that the
application redirects to that same URL is served normally.
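The effect of the rewrite rule can be checked away from Apache by applying the same two patterns in Python. This is only a model of the matching logic. The function name is_blocked and the /download/report URL are hypothetical, and Apache itself evaluates the patterns with PCRE rather than Python's re module.

```python
import re

# Same patterns as the configuration above.
RULE_PATTERN = re.compile(r'^/private/')          # RewriteRule pattern
COND_PATTERN = re.compile(r'^\S+\s/private/')     # RewriteCond on THE_REQUEST


def is_blocked(uri, the_request):
    """Return True when the rule would refuse the request with a 403.

    uri is the URL path currently being processed; the_request is the
    original request line sent by the client, which stays the same
    across an internal redirect.
    """
    return bool(RULE_PATTERN.search(uri) and COND_PATTERN.search(the_request))
```

A direct request has uri '/private/report.pdf' and the_request 'GET /private/report.pdf HTTP/1.1', so it is blocked. After an internal redirect the uri is the same, but the_request is still the client's original line, such as 'GET /download/report HTTP/1.1', so the condition does not match and the file is served.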
Why not IS_SUBREQ
It is tempting to reach for the IS_SUBREQ rewrite variable to
distinguish an internally generated request from a direct one, but it is
the wrong tool here. mod_wsgi performs an internal redirect, which Apache
processes as a redirected main request, not as a subrequest.
IS_SUBREQ only reports true for subrequests, so it is false for the
redirected request and cannot be used to gate access to it. Matching
against THE_REQUEST as shown above is the reliable approach because
that variable always reflects what the client actually sent.
Behaviour and limitations
There are a number of things to be aware of when using this mechanism.
- It is only available in daemon mode. In embedded mode a Location header is treated as an ordinary response header.
- The response status must be 200 for the internal redirect to be triggered. A status such as 302 is treated as a normal client redirect and is sent to the client as is.
- The Location value must be a local URL path. If it is an absolute URL, with a scheme and host name, mod_wsgi passes the response through unchanged rather than redirecting. This differs from the CGI specification, where an absolute Location with a 200 status is turned into a client redirect; mod_wsgi does not do that, on the basis that a WSGI application that wants a client redirect sets the status itself.
- The internal redirect is always performed as a GET request, regardless of the method of the original request, and any request content already read is not made available to it. The target URL is therefore expected to be a resource that can be served by a GET.
- Because the target URL is processed as a normal request, Apache’s access control applies to it. This is why the private area must be protected as shown above; making it servable by Apache also makes it reachable by clients unless direct access is explicitly blocked.
Relationship to the file wrapper extension
This mechanism and the File Wrapper Extension solve the same underlying problem, returning file contents efficiently, but in different ways and with different trade-offs.
The file wrapper extension keeps the application in control of which file is returned and works in both embedded and daemon mode, but the file contents still pass from the application to Apache, and any WSGI middleware in the stack can defeat the optimisation by consuming or wrapping the response.
Delegating to Apache with a Location header hands the work to Apache
completely, so the file contents never pass through Python at all and no
middleware can interfere with how they are served. The cost is that it is
daemon mode only and that the file must be reachable as a URL within the
Apache configuration, which in turn must be protected against direct
access as described above.
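For contrast, the file wrapper approach looks roughly like the following. This is a sketch based on the optional wsgi.file_wrapper key defined by the WSGI specification (PEP 3333). The function name, its path argument, the block size and the content type are all assumptions made for the example.

```python
import os

BLOCK_SIZE = 8192  # arbitrary chunk size chosen for this sketch


def _read_blocks(fileobj, block_size):
    """Fallback iterator that yields the file in blocks, then closes it."""
    try:
        while True:
            block = fileobj.read(block_size)
            if not block:
                break
            yield block
    finally:
        fileobj.close()


def serve_with_file_wrapper(environ, start_response, path):
    """Return the file at path, using wsgi.file_wrapper when available."""
    size = os.path.getsize(path)
    fileobj = open(path, 'rb')
    start_response('200 OK', [
        ('Content-Type', 'application/octet-stream'),
        ('Content-Length', str(size)),
    ])
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        # The server may be able to optimise how the file is sent.
        return wrapper(fileobj, BLOCK_SIZE)
    # No file wrapper offered: stream the file through Python.
    return _read_blocks(fileobj, BLOCK_SIZE)
```

Here the application opens the file itself and hands the file object to the server, whereas with the Location header the application never touches the file at all.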