Access Control Mechanisms
This document contains information about mechanisms available in mod_wsgi for controlling who can access a WSGI application. This includes coverage of support for HTTP Basic and Digest authentication mechanisms, as well as server side mechanisms for authorisation and host access control.
HTTP User Authentication
The HTTP protocol supports user authentication mechanisms for clients through the ‘Authorization’ header. The two main examples for this are the Basic and Digest authentication mechanisms.
Unlike other HTTP headers, the authorisation header is not passed through to a WSGI application by default. Passing it through could expose passwords to a WSGI application that should not be able to see them when Apache is performing authentication.
If Apache is performing authentication, a WSGI application can still find
out what type of authentication scheme was used by checking the variable
AUTH_TYPE of the WSGI application environment. The login name of the
authorised user can be determined by checking the variable
REMOTE_USER.
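As a minimal sketch of the above, a WSGI application could read the two variables as follows. The fallback strings 'none' and 'anonymous' are illustrative assumptions for the case where Apache did not authenticate the request.

```python
def application(environ, start_response):
    # Both keys are absent when Apache did not authenticate the request,
    # so use .get() with placeholder values instead of raising KeyError.
    auth_type = environ.get('AUTH_TYPE', 'none')
    user = environ.get('REMOTE_USER', 'anonymous')

    body = ('Authenticated as %s using %s\n' % (user, auth_type)).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```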
If it is desired that the WSGI application be responsible for handling user authentication, then it is necessary to explicitly configure mod_wsgi to pass the required headers through to the application. This can be done by specifying the WSGIPassAuthorization directive in the appropriate context and setting it to ‘On’.
When passing of authorisation information is enabled, the authorisation
headers are passed through to a WSGI application in the
HTTP_AUTHORIZATION variable of the WSGI application environment when
the equivalent HTTP request header is present. You will still need to
provide your own code to process the header and perform the required
handshaking with the client to indicate whether the client is permitted access.
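The following is a rough sketch of what such application code might look like for the Basic scheme, assuming WSGIPassAuthorization is set to On. The USERS dictionary, the REALM value and the parse_basic_credentials() helper are hypothetical names introduced for illustration; a real application would check credentials against its own user database.

```python
import base64
import hmac

# Hypothetical in-memory credential store and realm, for illustration only.
USERS = {'spy': 'secret'}
REALM = 'Top Secret'

def parse_basic_credentials(header):
    """Return (user, password) from a Basic Authorization header, or None."""
    scheme, _, encoded = header.partition(' ')
    if scheme.lower() != 'basic':
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode('utf-8')
    except ValueError:
        # Covers malformed base64 as well as undecodable bytes.
        return None
    user, sep, password = decoded.partition(':')
    if not sep:
        return None
    return user, password

def application(environ, start_response):
    credentials = parse_basic_credentials(environ.get('HTTP_AUTHORIZATION', ''))
    if credentials is not None:
        user, password = credentials
        expected = USERS.get(user)
        # compare_digest avoids leaking information through timing.
        if expected is not None and hmac.compare_digest(
                expected.encode('utf-8'), password.encode('utf-8')):
            start_response('200 OK',
                           [('Content-Type', 'text/plain; charset=utf-8')])
            return [('Hello %s\n' % user).encode('utf-8')]

    # Missing or invalid credentials: challenge the client.
    start_response('401 Unauthorized', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('WWW-Authenticate', 'Basic realm="%s"' % REALM),
    ])
    return [b'Authentication required\n']
```

The 401 response with a WWW-Authenticate header is the part of the handshake that Apache would otherwise perform on the application's behalf.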
Reflecting Application Level Authentication Back to Apache
When a WSGI application performs its own authentication, the result is
visible only inside the application. Apache itself does not know who the
request was authenticated as, so r->user and r->ap_auth_type remain
unset. The practical consequences are:
- The %u placeholder in LogFormat records the literal - rather than the
  authenticated user name, and mod_log_forensic and similar modules see
  no user either.
- Authorisation directives that need to evaluate against an identity,
  such as Require user and Require group, have nothing to evaluate.
A small amount of glue can bridge this. The WSGI application emits the
authenticated user name and authentication scheme as response headers,
and a short mod_lua hook running in the logging phase copies the
values into r->user and r->ap_auth_type and strips the headers so
they are never sent to the HTTP client.
mod_lua is part of the standard Apache distribution, although on some
platforms it is packaged as a separately installable module. Once it is
loaded, register a logging hook against a small Lua script:
LoadModule lua_module modules/mod_lua.so
LuaHookLog /etc/apache2/lua/wsgi-auth-reflect.lua reflect_auth
The script wsgi-auth-reflect.lua contains:
function reflect_auth(r)
    local user = r.headers_out['X-Remote-User']
    if user then
        r.user = user
        r.headers_out['X-Remote-User'] = nil
    end

    local auth_type = r.headers_out['X-Auth-Type']
    if auth_type then
        r.ap_auth_type = auth_type
        r.headers_out['X-Auth-Type'] = nil
    end

    return apache2.OK
end
The hook runs at the start of the logging phase, after the content
handler has produced the response but before mod_log_config writes
the access log entry, so %u records the user that the WSGI
application authenticated.
The WSGI application includes the two headers in the response when it has authenticated the request:
def application(environ, start_response):
    user = authenticate(environ)  # application specific

    headers = [('Content-Type', 'text/html; charset=utf-8')]
    if user is not None:
        headers.append(('X-Remote-User', user))
        headers.append(('X-Auth-Type', 'Bearer'))

    start_response('200 OK', headers)
    return [b'...']
The response headers are stripped only if the Lua hook is wired up
correctly. If the hook is not in place, the application leaks the
authenticated user name to the client, so the application should be
deployed together with the matching LuaHookLog configuration.
This bridge affects only what Apache logs and what downstream modules
observe at log time. It does not reactivate the Apache authorisation
phase, so it is not a substitute for Require user based access
control. Where Apache itself needs to make access decisions based on
the authenticated identity, the Apache authentication provider
mechanism described below should be used instead.
Apache Authentication Provider
Apache implements the concept of authentication providers. That is, Apache implements the handshaking with the client for authentication mechanisms such as Basic and Digest. All that user-supplied server-side code needs to provide is a means of authenticating the actual credentials of the user trying to gain access to the site.
This greatly simplified the implementation of client authentication as the hand shaking for a particular authentication mechanism was implemented only once in Apache and it wasn’t necessary for each authentication module to duplicate it. This was particularly good for the Digest authentication mechanism which was non trivial to implement correctly.
The WSGIAuthUserScript directive can be used to define a Python script file containing code which performs the authenticating of user credentials as outlined.
The required Apache configuration for defining the authentication provider for Basic authentication would be:
AuthType Basic
AuthName "Top Secret"
AuthBasicProvider wsgi
WSGIAuthUserScript /usr/local/wsgi/scripts/auth.wsgi
Require valid-user
The ‘auth.wsgi’ script would then need to contain a ‘check_password()’ function with a sample as shown below:
def check_password(environ, user, password):
    if user == 'spy':
        if password == 'secret':
            return True
        return False
    return None
This function should validate that the user exists in the user database and that the password is correct. If the user does not exist at all, then the result should be ‘None’. If the user does exist, the result should be ‘True’ or ‘False’ depending on whether the password was valid.
As an alternative to returning True, a non-empty string can be returned
to indicate that authentication has succeeded. The returned string is then
used as the authenticated user name in place of the value supplied by the
client — it becomes REMOTE_USER for the WSGI application, and is also
what Apache records for the request in access logs and in any subsequent
group authorisation checks. This is useful where the supplied user name
needs canonicalising, or where an external credential (an email address,
an LDAP attribute, a certificate subject) maps to a different internal
user name.
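A minimal sketch of returning a canonical user name is shown below. The ALIASES and PASSWORDS dictionaries are hypothetical stand-ins for a real user database, and the lower-casing rule is only one possible canonicalisation policy.

```python
# Hypothetical data, for illustration only.
ALIASES = {'spy@example.com': 'spy'}   # external credential -> internal name
PASSWORDS = {'spy': 'secret'}

def check_password(environ, user, password):
    # Normalise case and whitespace, then resolve any alias.
    supplied = user.strip().lower()
    canonical = ALIASES.get(supplied, supplied)

    expected = PASSWORDS.get(canonical)
    if expected is None:
        return None       # user does not exist
    if password != expected:
        return False      # user exists, password is wrong
    return canonical      # success; this string replaces the supplied name
```

With this script a client logging in as 'Spy@Example.com' would be seen by the WSGI application as 'spy'.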
HTTP Digest authentication is uncommon in modern deployments — TLS plus HTTP Basic authentication, or application-level authentication, is generally preferred. Digest support is documented here for completeness.
If wishing to use Digest authentication, the configuration would instead be:
AuthType Digest
AuthName "Top Secret"
AuthDigestProvider wsgi
WSGIAuthUserScript /usr/local/wsgi/scripts/auth.wsgi
Require valid-user
The name of the required authentication function for Digest authentication is ‘get_realm_hash()’. The result of the function must be ‘None’ if the user doesn’t exist, or a hash string encoding the user name, authentication realm and password:
import hashlib

def get_realm_hash(environ, user, realm):
    if user == 'spy':
        # user:realm:password
        data = ('%s:%s:%s' % (user, realm, 'secret')).encode('UTF-8')
        return hashlib.md5(data).hexdigest()
    return None
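Because the function only has to return the hash, the lookup can be driven by a table of precomputed values rather than by plaintext passwords. The sketch below assumes a hypothetical REALM_HASHES table keyed by (user, realm); the make_realm_hash() helper is included only so the example is self-contained, and in a real deployment the stored values would be the hex digests themselves.

```python
import hashlib

def make_realm_hash(user, realm, password):
    # Same user:realm:password construction as used for Digest authentication.
    data = ('%s:%s:%s' % (user, realm, password)).encode('UTF-8')
    return hashlib.md5(data).hexdigest()

# Hypothetical store keyed by (user, realm). Computed inline here for
# illustration; normally only the resulting digests would be kept.
REALM_HASHES = {
    ('spy', 'Top Secret'): make_realm_hash('spy', 'Top Secret', 'secret'),
}

def get_realm_hash(environ, user, realm):
    # Returns None when the user is unknown for this realm.
    return REALM_HASHES.get((user, realm))
```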
By default the auth providers are executed in the context of the main Python interpreter, i.e., ‘%{GLOBAL}’, and always in the Apache child processes, never in a daemon process. The interpreter can be overridden using the ‘application-group’ option to the script directive. The namespace for authentication groups is shared with that for application groups defined by WSGIApplicationGroup.
Because the auth provider is always run in the Apache child processes and never in the context of a mod_wsgi daemon process, if the authentication check makes use of the internals of some Python web framework, it is recommended that the application using that web framework also be run in embedded mode and in the same application group. Python web frameworks often bring in a huge amount of code even when only one small part of them is used, so loading a framework solely for the auth provider would use a lot of memory in the Apache child processes just to support authentication.
An aliased auth provider can also be defined using the
<AuthnProviderAlias> directive (provided by mod_authn_core, which
is loaded by default on Apache 2.4):
<AuthnProviderAlias wsgi django>
    WSGIAuthUserScript /usr/local/django/mysite/apache/auth.wsgi \
        application-group=django
</AuthnProviderAlias>

WSGIScriptAlias / /usr/local/django/mysite/apache/django.wsgi

<Directory /usr/local/django/mysite/apache>
    WSGIApplicationGroup django

    AuthType Basic
    AuthName "Django Site"
    AuthBasicProvider django
    Require valid-user
</Directory>
An authentication script for Django might then be something like:
import os, sys

sys.path.append('/usr/local/django')
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

# Django 1.7 and later must be initialised before models are imported.
import django
django.setup()

from django.contrib.auth.models import User
from django import db

def check_password(environ, user, password):
    db.reset_queries()

    kwargs = {'username': user, 'is_active': True}

    try:
        try:
            user = User.objects.get(**kwargs)
        except User.DoesNotExist:
            # Unknown or inactive user.
            return None

        if user.check_password(password):
            return True
        else:
            return False
    finally:
        # Release the database connection held by this Apache child process.
        db.connection.close()
For both Basic and Digest authentication providers, the ‘environ’ dictionary passed as first argument is a cut down version of what would be supplied to the actual WSGI application. This includes the ‘wsgi.errors’ object for the purposes of logging error messages associated with the request.
Any configuration defined by SetEnv directives is not passed in the ‘environ’ dictionary because doing so would allow users to override the configuration specified in such a way from a ‘.htaccess’ file. Configuration should as a result be placed into the script file itself.
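As a small sketch of both points, the script below keeps its configuration at module level and reports failures through ‘wsgi.errors’. The USERS dictionary is a hypothetical placeholder for real configuration, and the message wording is arbitrary.

```python
# Configuration lives in the script itself, since SetEnv values are not
# passed in. USERS is a hypothetical credential store for illustration.
USERS = {'spy': 'secret'}

def check_password(environ, user, password):
    # File-like object provided by mod_wsgi for request-related logging.
    log = environ['wsgi.errors']

    expected = USERS.get(user)
    if expected is None:
        print('auth: unknown user %r' % user, file=log)
        return None
    if password != expected:
        print('auth: bad password for %r' % user, file=log)
        return False
    return True
```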
The benefit of using the Apache authentication provider mechanism rather than the WSGI application doing it all itself, is that it can be used to control access to a number of WSGI applications at the same time as well as static files or dynamic pages implemented by other Apache modules using other programming languages such as PHP or Perl. The mechanism could even be used to control access to CGI scripts.
Host Access Controls
The authentication provider and group authorisation features help to control access based on the identity of a user. It is also possible to limit access based on the machine which the client is connecting from, using a Python script that decides whether each host is allowed. The path to the script is defined using the WSGIAccessScript directive:
WSGIAccessScript /usr/local/wsgi/script/access.wsgi
The name of the function that must exist in the script file is ‘allow_access()’. It must return True or False:
def allow_access(environ, host):
    return host in ['localhost', '::1']
The ‘environ’ dictionary passed as first argument is a cut down version of what would be supplied to the actual WSGI application. This includes the ‘wsgi.errors’ object for the purposes of logging error messages associated with the request.
Any configuration defined by SetEnv directives is not passed in the ‘environ’ dictionary because doing so would allow users to override the configuration specified in such a way from a ‘.htaccess’ file. Configuration should as a result be placed into the script file itself.
By default the access checking code is executed in the context of the main Python interpreter, i.e., ‘%{GLOBAL}’, and always in the Apache child processes, never in a daemon process. The interpreter used can be overridden using the ‘application-group’ option to the script directive.
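Where whole networks rather than individual hosts need to be allowed, the check could be built on the standard ipaddress module. This is only a sketch: the network list and host name set are hypothetical, and it assumes the host argument may arrive either as an IP address or as a resolved host name depending on how Apache is configured.

```python
import ipaddress

# Hypothetical allow lists, for illustration only.
ALLOWED_NETWORKS = [
    ipaddress.ip_network('127.0.0.0/8'),
    ipaddress.ip_network('::1/128'),
    ipaddress.ip_network('10.0.0.0/8'),
]
ALLOWED_HOSTNAMES = {'localhost'}

def allow_access(environ, host):
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a host name.
        return host in ALLOWED_HOSTNAMES
    # Membership tests across IPv4 and IPv6 simply evaluate to False.
    return any(address in network for network in ALLOWED_NETWORKS)
```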
Using mod_wsgi-express
When running under mod_wsgi-express, the three script
directives covered above are configured through dedicated
command-line options. Each option emits the underlying
directive together with the surrounding AuthType /
AuthBasicProvider / Require block needed to activate
it under a sitewide <Location />, so the only argument you
need to supply is the script path itself.
--host-access-script SCRIPT-PATH
    Emits a WSGIAccessScript directive. The named script must define
    allow_access() as described in Host Access Controls.
--auth-user-script SCRIPT-PATH
    Emits a WSGIAuthUserScript directive together with the AuthType,
    AuthName, Auth<scheme>Provider wsgi and Require valid-user directives
    needed to make it active.
--auth-type TYPE
    Selects the authentication scheme: Basic (the default) or Digest.
    The script function the named script must define is determined by
    this choice (check_password() for Basic, get_realm_hash() for
    Digest), as described in Apache Authentication Provider.
--auth-group-script SCRIPT-PATH
    Emits a WSGIAuthGroupScript directive together with a Require
    wsgi-group directive. The named script must define groups_for_user()
    as described in Apache Group Authorisation. Group authorisation is
    layered on top of user authentication, so this option is only
    meaningful in combination with --auth-user-script.
--auth-group NAME
    Group name used in the generated Require wsgi-group directive.
    Defaults to wsgi as a placeholder; override this to match the actual
    group name returned by your groups_for_user() function.
A typical invocation combining user authentication and group authorisation:
mod_wsgi-express start-server wsgi.py \
--auth-user-script /srv/myapp/auth.wsgi \
--auth-group-script /srv/myapp/auth.wsgi \
--auth-group secret-agents
A single script file can supply both check_password() and
groups_for_user(), in which case --auth-user-script and
--auth-group-script point at the same path. The scripts
run in the Apache child processes, with the same embedded-mode
caveats about framework loading costs described in the
sections above.
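A combined script of that kind might look like the sketch below. The USERS and GROUPS dictionaries are hypothetical, and the groups_for_user(environ, user) signature returning a list of group names is assumed from the mod_wsgi group authorisation feature; unknown users get a list holding an empty string, which matches no real group name.

```python
# Hypothetical data, for illustration only.
USERS = {'spy': 'secret'}
GROUPS = {'spy': ['secret-agents']}

def check_password(environ, user, password):
    expected = USERS.get(user)
    if expected is None:
        return None                  # user does not exist
    return password == expected     # True or False for a known user

def groups_for_user(environ, user):
    # List of group names the user belongs to.
    return GROUPS.get(user, [''])
```

The group name 'secret-agents' here corresponds to the value given to --auth-group in the invocation above.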