Delegating File Serving To Apache

When a WSGI application needs to return the contents of a file, the most efficient option is almost always to let the web server serve the file rather than have the application read and yield it itself. With mod_wsgi this means letting Apache do the work. The File Wrapper Extension covers one way to do that from within the application. This page covers the other way, available in daemon mode, where the application hands the response off to Apache entirely by returning a Location response header.

The typical use case is serving a file that the application must first make an access control decision about. The application performs the authentication and authorisation, then, instead of streaming the file back through Python, it tells Apache which local URL should be served in its place. Apache serves that URL using its own static file handler, with all the optimisations that brings, including sendfile(), range requests, conditional requests and correct cache validators.

This is the same mechanism that nginx exposes through the X-Accel-Redirect header, and it is a built-in, supported alternative to the third-party mod_xsendfile module.

How it works

In daemon mode mod_wsgi acts as a gateway in front of the daemon process, in the same way that mod_cgid acts as a gateway in front of a CGI script. As part of that role it honours the CGI specification’s local redirect response (RFC 3875). If the WSGI application returns a response whose status is 200 and which includes a Location response header whose value is a local URL path, that is, a value beginning with a / and with no scheme or host name, then mod_wsgi does not send the response to the client. Instead it discards any response content from the application and performs an Apache internal redirect to the nominated URL. The client receives the response produced for that URL, and never sees the Location header or any redirect.

Because the internal redirect is processed by Apache as a fresh request, the target URL passes through Apache’s normal request processing, including its access control and its static file handler. This is what makes it possible to let Apache serve a file the application has just authorised.

The WSGI application

The application returns an empty response body, a status of 200, and a Location header giving the local URL that Apache should serve in its place:

def application(environ, start_response):
    # Perform whatever authentication and authorisation checks are
    # required before allowing the file to be served.

    if not user_is_allowed(environ):
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'Forbidden']

    status = '200 OK'
    response_headers = [('Location', '/private/report.pdf')]
    start_response(status, response_headers)

    return []

Note that the value of the Location header must be a local URL path, not a file system path. It names a URL that Apache can resolve, and that URL is what determines which file is ultimately served.

The Apache configuration

The private files are mapped to a URL using Alias (or any other mechanism that makes them available as a URL), and the directory is made servable in the usual way:

Alias /private/ /path/to/app/private/

<Directory /path/to/app/private/>
    Require all granted
</Directory>

As written this would also allow a client to request /private/ URLs directly, bypassing the application’s access control. To prevent that, direct client requests for the private area must be blocked while still allowing requests that arrive by way of the internal redirect:

RewriteEngine On

# Block direct client requests for the private area. THE_REQUEST is
# the original request line as received from the client and is not
# changed by mod_wsgi's internal redirect. It therefore only matches
# a request the client made directly, not one that arrived via a
# Location response header from the WSGI application.

RewriteCond %{THE_REQUEST} ^\S+\s/private/
RewriteRule ^/private/ - [F]

With this in place, a client that asks for /private/report.pdf directly is refused with a 403 response, while a request that the application redirects to that same URL is served normally.

Why not IS_SUBREQ

It is tempting to reach for the IS_SUBREQ rewrite variable to distinguish an internally generated request from a direct one, but it is the wrong tool here. mod_wsgi performs an internal redirect, which Apache processes as a redirected main request, not as a subrequest. IS_SUBREQ only reports true for subrequests, so it is false for the redirected request and cannot be used to gate access to it. Matching against THE_REQUEST as shown above is the reliable approach because that variable always reflects what the client actually sent.

Behaviour and limitations

There are a number of things to be aware of when using this mechanism.

  • It is only available in daemon mode. In embedded mode a Location header is treated as an ordinary response header.

  • The response status must be 200 for the internal redirect to be triggered. A status such as 302 is treated as a normal client redirect and is sent to the client as is.

  • The Location value must be a local URL path. If it is an absolute URL, with a scheme and host name, mod_wsgi passes the response through unchanged rather than redirecting. This differs from the CGI specification, where an absolute Location with a 200 status is turned into a client redirect; mod_wsgi does not do that, on the basis that a WSGI application that wants a client redirect sets the status itself.

  • The internal redirect is always performed as a GET request, regardless of the method of the original request, and any request content already read is not made available to it. The target URL is therefore expected to be a resource that can be served by a GET.

  • Because the target URL is processed as a normal request, Apache’s access control applies to it. This is why the private area must be protected as shown above; making it servable by Apache also makes it reachable by clients unless direct access is explicitly blocked.

Relationship to the file wrapper extension

This mechanism and the File Wrapper Extension solve the same underlying problem, returning file contents efficiently, but in different ways and with different trade-offs.

The file wrapper extension keeps the application in control of which file is returned and works in both embedded and daemon mode, but the file contents still pass from the application to Apache, and any WSGI middleware in the stack can defeat the optimisation by consuming or wrapping the response.

Delegating to Apache with a Location header hands the work to Apache completely, so the file contents never pass through Python at all and no middleware can interfere with how they are served. The cost is that it is daemon mode only and that the file must be reachable as a URL within the Apache configuration, which in turn must be protected against direct access as described above.