

5. Writing new Clarens modules

5.1 Introduction

Writing a Clarens server-side module can range from trivial to complex. The complexity stems mainly from the multi-process architecture of the Apache server, the handling of persistent database connections, and, of course, security considerations.

No specific Python programming experience is required; following and modifying the example should be sufficient for most purposes. For more information on interacting with Apache in this environment, see the mod_python[6] documentation.

5.2 First Example: the echo module

This module has only one method, echo.echo, which simply returns its arguments. The module is a standard Python module script, stored in $clarens-toplevel/echo/__init__.py, where the Clarens server picks it up automatically when any method in the module is invoked.

5.2.1 Starting out

As in most Python modules, some supporting modules need to be imported first:

PYTHON
import sys
from mod_python import apache
import clarens_util


This imports the sys (system) and clarens_util modules, as well as the apache part of mod_python.

5.2.2 Defining a new method

A method inside the module is a Python function:

PYTHON
def echo(req,method_name,args):


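This defines the echo function and its three arguments: req, the mod_python request object; method_name, the name of the method being called; and args, the arguments sent by the client.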

Next, all methods should be documented so that they can be discovered by remote clients:

PYTHON
  """Returns the method argument"""


For our simple method, we construct a response and write it to the client:

PYTHON
  response=clarens_util.build_response(req,method_name,args)
  req.write(response)


Note that, as Python's syntax requires, the body of the function is indented. Finally, return from the function:

PYTHON
  return apache.OK


This lets the Apache server continue with its processing chain (including possibly compressing the output).


5.2.3 Exposing the method

So that functions in the script can be kept hidden, clients can call only those functions that are explicitly exposed to the outside world. A function is published as follows:

PYTHON
methods_list={'echo':echo}
methods_sig= {'echo':['string,string',
                     'int,int',
                     'double,double',
                     'boolean,boolean',
                     'array,array',
                     'struct,struct']}


The methods_list variable is a dictionary whose key, the string 'echo', identifies the method, and whose value is the callable function object defined earlier.

The methods_sig variable is another dictionary, which describes the signatures of the echo method. Its value is a list of the possible combinations of return and argument types. Each list element is a comma-separated string: the first value is the type of the return value, and the following values are the types of the arguments. For example:

['string,string']

is a one-element list for a method that takes a string as its argument and returns a string.

The echo method is polymorphic: in each of its signatures, the argument type is the same as the return type.
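The signature strings map naturally onto the XML-RPC introspection convention, in which each signature is a list whose first element is the return type and whose remaining elements are the argument types. The helper below is a hypothetical sketch (parse_signatures is not part of clarens_util) that splits a methods_sig entry into that form and checks the polymorphic property of echo.

```python
def parse_signatures(sig_strings):
    """Split each 'ret,arg1,arg2,...' string into [ret, arg1, arg2, ...].

    Hypothetical helper: the first type is the return value, the
    remaining types are the arguments, in order.
    """
    parsed = []
    for sig in sig_strings:
        parts = [t.strip() for t in sig.split(",")]
        parsed.append(parts)
    return parsed


methods_sig = {'echo': ['string,string',
                        'int,int',
                        'double,double',
                        'boolean,boolean',
                        'array,array',
                        'struct,struct']}

for sig in parse_signatures(methods_sig['echo']):
    return_type, arg_types = sig[0], sig[1:]
    # For echo, every signature takes one argument of the return type
    assert arg_types == [return_type]
```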

5.2.4 The full example

The full example would then look like this:

PYTHON
from mod_python import apache
import clarens_util

def echo(req,method_name,args):
    """Returns the method argument"""
    response=clarens_util.build_response(req,method_name,args)
    req.write(response)
    return apache.OK

methods_list={'echo':echo}
methods_sig= {'echo':['string,string',
                     'int,int',
                     'double,double',
                     'boolean,boolean',
                     'array,array',
                     'struct,struct']}
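To make the role of methods_list concrete, here is a simplified, hypothetical sketch of how a dispatcher could route a call such as echo.echo. It is not the actual Clarens code in system/__init__.py; it only mirrors the call method_object(req,method,args) visible in the traceback in section 5.3.2. StubRequest, dispatch and the modules registry are illustrative names, and the stand-in echo writes repr(args) instead of an XML-RPC response so that the sketch runs without mod_python or clarens_util.

```python
import types


class StubRequest(object):
    """Hypothetical stand-in for the mod_python request object."""
    def __init__(self):
        self.output = []

    def write(self, data):
        self.output.append(data)


def dispatch(modules, req, method_name, args):
    """Route 'module.function' to a function exposed in methods_list."""
    mod_name, _, func_name = method_name.partition(".")
    exposed = getattr(modules.get(mod_name), "methods_list", {})
    if func_name not in exposed:
        # Functions not listed in methods_list cannot be called by clients
        raise LookupError("unknown method: %s" % method_name)
    method_object = exposed[func_name]
    return method_object(req, method_name, args)


def echo(req, method_name, args):
    """Returns the method argument"""
    req.write(repr(args))  # the real module writes an XML-RPC response
    return 0               # placeholder for apache.OK


def helper(req, method_name, args):
    """Defined in the module but not exposed."""
    return 0


echo_module = types.SimpleNamespace(methods_list={'echo': echo}, helper=helper)
modules = {'echo': echo_module}

req = StubRequest()
dispatch(modules, req, "echo.echo", ["hello"])
print(req.output)  # ["['hello']"]
```

Because the lookup goes through methods_list only, helper stays unreachable even though it is an ordinary module-level function.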



5.3 Debugging

There are two ways of debugging server modules: writing traditional printf-style messages to the log file, or using the command-line Python debugger.


5.3.1 Python debugger

Put the directive
PythonEnablePdb On
in the configuration file
$opkg_root/etc/clarens-config/httpd/clarens-server-default.conf.

Then start the Apache server with only one process:
$opkg_root/sbin/httpd -X -f /opt/openpkg/etc/apache2/httpd.conf
Any request to a mod_python handler will cause the Python debugger prompt to appear on the terminal from which the server was started.

Remember to remove the PythonEnablePdb On directive when debugging is finished; otherwise the server will report the following error:
Handler 'clarens_server' returned invalid return code.

5.3.2 Automatic tracebacks

As of version 0.6.9 of the clarens-server package, any exception raised while executing server-side module code is reported to the client. The amount of information returned depends on the value of the PythonDebug directive in the server configuration. The value of this directive can be set using the clarens-server-config utility described in section 2.3.2.

If debugging is turned on, a traceback of the code is sent back, along with the identity of the caller and the IP address of the client machine. Consider the code snippet below, which uses the standard Python xmlrpclib module:

PYTHON
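import xmlrpclib  # in Python 3 the module is xmlrpc.client

# dbsvr is assumed to be an existing proxy object for the Clarens server
try:
  dbsvr.newservice.amethod("example")
except xmlrpclib.Fault as v:
  print(v.faultString)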


The code tries to call the newservice.amethod server-side method. Suppose that the code implementing the module raises a division-by-zero error. In that case the above code could print something like the following:

output
Error in method call newservice.amethod made by
/DC=org/DC=doegrids/OU=People/CN=Conrad Steenberg 178947 from IP 127.0.0.1
Traceback (most recent call last):
  File "/opt/openpkg/share/apache2/clarens/system/__init__.py", line 1712,
in exec_method
    return method_object(req,method,args)
  File "/opt/openpkg/share/apache2/clarens/newservice/__init__.py", line 33,
in amethod
    sys.stderr.write(1/0)
ZeroDivisionError: integer division or modulo by zero


This gives the service developer enough information to know that there is a problem in line 33 of the code for the newservice module.

The raw XML message looks like this:

XML
<?xml version="1.0"?>
<methodResponse>
        <fault>
                <value>
                        <struct>
                                <member>
                                        <name>faultCode</name>
                                        <value><int>400</int></value>
                                </member>
                                <member>
                                        <name>faultString</name>
                                        <value><string>
Error in method newmodule.amethod made by
/DC=org/DC=doegrids/OU=People/CN=Conrad Steenberg 178947 from IP 127.0.0.1
Traceback (most recent call last):
  File "/opt/openpkg/share/apache2/clarens/system/__init__.py", line 1712,
in exec_method
    return method_object(req,method,args)
  File "/opt/openpkg/share/apache2/clarens/newmodule/__init__.py", line 33,
in amethod
    sys.stderr.write(1/0)
ZeroDivisionError: integer division or modulo by zero
</string></value>
                                </member>
                        </struct>
                </value>
        </fault>
</methodResponse>


If the PythonDebug directive is turned off, the output will look like this:

output
integer division or modulo by zero


This is much less helpful to the developer, but it does hide internal details, such as file paths and source code, from potential attackers.
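The difference between the two modes can be illustrated with a small, hypothetical sketch. format_fault_string is not a Clarens function; it only reproduces the shape of the messages shown above, using the standard traceback module when debugging is on and just the exception message otherwise. Under Python 3 the message for this exception reads "division by zero" rather than the Python 2 wording shown above.

```python
import traceback


def format_fault_string(method_name, identity, remote_ip, exc, debug):
    """Build a fault message; include a traceback only when debug is on.

    Hypothetical sketch: the header mimics the example output above.
    """
    if not debug:
        # Only the exception message: hides file paths and code from attackers
        return str(exc)
    header = "Error in method call %s made by\n%s from IP %s\n" % (
        method_name, identity, remote_ip)
    tb_lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    return header + "".join(tb_lines)


try:
    1 / 0
except ZeroDivisionError as e:
    caught = e  # keep a reference; 'e' is unbound after the except block

dn = "/DC=org/DC=doegrids/OU=People/CN=Conrad Steenberg 178947"
print(format_fault_string("newservice.amethod", dn, "127.0.0.1", caught, debug=False))
print(format_fault_string("newservice.amethod", dn, "127.0.0.1", caught, debug=True))
```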

5.4 Handling exceptions with build_fault

A service can also generate its own fault responses by calling clarens_util.build_fault inside a try/except clause:

PYTHON
    try:
      response=clarens_util.build_response(req,method_name,args)
    except Exception:
      # Send an XML-RPC fault instead of letting Apache return an HTML error page
      response=clarens_util.build_fault(req,method_name,
                apache.HTTP_BAD_REQUEST,
                "Bad request %s: %s"%(method_name,args))
    req.write(response)
    return apache.OK


The signature of the function is build_fault(req, method_name, error_code, error_string). Always return apache.OK from the method; otherwise Apache will generate its own error message in text/html format, which not all clients handle gracefully. If you do not supply your own error handling code, the Clarens server will also catch exceptions and send a generic error message to the client.
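How clarens_util.build_fault encodes the fault is not described here. Assuming it produces a standard XML-RPC fault like the raw message in section 5.3.2, an equivalent response can be generated and parsed with Python's standard xmlrpc.client module (xmlrpclib in Python 2), as in this sketch. The code 400 corresponds to apache.HTTP_BAD_REQUEST; the fault string is a made-up example.

```python
import xmlrpc.client

# Build a methodResponse containing a <fault> struct (faultCode, faultString)
fault = xmlrpc.client.Fault(400, "Bad request echo.echo: ['x']")
fault_xml = xmlrpc.client.dumps(fault, methodresponse=True)
print(fault_xml)

# A client parsing this response gets the fault back as an exception
try:
    xmlrpc.client.loads(fault_xml)
except xmlrpc.client.Fault as f:
    print(f.faultCode, f.faultString)  # 400 Bad request echo.echo: ['x']
```

This round trip is also what the client snippet in section 5.3.2 relies on: a fault in the response surfaces as a Fault exception with faultCode and faultString attributes.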

5.5 Service initialization

Normally, mod_python calls so-called handlers when the server receives a request that matches the requirements set in the configuration file. This may mean that the Python module implementing a handler is not imported until such a request arrives.

The Clarens server augments this behaviour by also importing the service modules at server startup, so that any necessary initialization can be done. Examples include populating a database of known services and methods, starting a process that advertises the offered services via a discovery service, and making sure the file and method access control lists are loaded into the database for quicker access.

The logical way for services to initialize global variables, database connections, etc. would be to put the initialization code in the module's global namespace, so that it is executed when the module is imported. In practice, however, each module may be imported multiple times, e.g. by the main Clarens server as well as by other server modules.

To handle this in a more coordinated fashion, the main Clarens server tries to import all the modules it knows about once per process. It then calls a function named _startup_init in each module with three arguments, named config, modnames and modules in the example below.

5.5.1 Example

An example _startup_init method might look like this:

PYTHON
import os
import clarens_util, clarens_config

def _startup_init(config, modnames, modules):
  # Open the login databases once in this process and register them
  dbdir=os.path.join(clarens_config.config['clarens_path'],".clarens_logins")
  names=["db_file_indiv","db_file_group","md5db","sha1db"]
  clarens_util.register_open_dbs(clarens_util.dbd, names, dbdir)


See section 5.6 below for an explanation of what the above code achieves.
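The once-per-process scheme described in this section can be sketched as follows. run_startup_init is hypothetical and much simpler than the real Clarens server; in particular, it assumes that modnames is a list of module names and modules a list of the corresponding module objects, which the text does not specify.

```python
import importlib
import os

_initialized_pid = None  # PID of the process that already ran the hooks


def run_startup_init(config, modnames):
    """Import each service module and call its _startup_init hook,
    at most once per process. Returns True if the hooks were run."""
    global _initialized_pid
    if _initialized_pid == os.getpid():
        return False  # this process has already been initialized
    # Modules already imported elsewhere are taken from sys.modules
    modules = [importlib.import_module(name) for name in modnames]
    for module in modules:
        hook = getattr(module, "_startup_init", None)
        if callable(hook):
            hook(config, modnames, modules)
    _initialized_pid = os.getpid()
    return True
```

Recording the PID rather than a plain flag means that a child process forked after initialization still runs its own hooks, which matters for per-process resources such as database handles.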

5.5.2 Per server initialization

The above initialization code is called once per process, which is useful for database connections that cannot be shared between processes. In many cases, however, the initialization code needs to be called only once, when the server starts up. The ACL database update is a good example.

In that case it is prudent to protect the initialization code with a global lock that prevents multiple processes from running it. This can be achieved in several ways, one of which is file locking. File locking is fairly portable and has the desirable property that a lock is released when the process holding it terminates, reducing the probability of a deadlock.

The code in the example above may be protected with such a lock as follows:

PYTHON
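import os
import fcntl
import clarens_util, clarens_config

# Global variable: holds the open (and locked) lock file for the
# lifetime of the process
lockfile=None

def _startup_init(config, modnames, modules):
  global lockfile
  dbdir=os.path.join(clarens_config.config['clarens_path'],".clarens_logins")
  names=["db_file_indiv","db_file_group","md5db","sha1db"]
  clarens_util.register_open_dbs(clarens_util.dbd, names, dbdir)

  try:
    lockfile=open(os.path.join(config['clarens_path'],'file','.lockfile'),"w")
    # Exclusive, non-blocking: fails if another process holds the lock
    fcntl.flock(lockfile,fcntl.LOCK_EX|fcntl.LOCK_NB)
  except (IOError, OSError):
    # The lock is held elsewhere, or the lock file could not be opened
    return

  # Now update ACLs once per server startup (update_acls is module-specific)
  update_acls(clarens_util.dbd["db_file_indiv"])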


The ACL initialization code will then be called only once per server startup. Of course, a real implementation would also log a warning in the exception handler for other errors that may occur when an attempt is made to open the lock file.

Note that the code above never closes the lockfile object, and keeps it in a global variable, so that the lock is held for as long as the process runs.
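The behavior of the non-blocking lock can be checked in isolation with the sketch below. try_acquire_once is a hypothetical helper, and the lock file lives in a temporary directory rather than under clarens_path. Because flock() locks belong to an open file description, a second open() of the same file is refused even within a single process (on Linux and other systems with BSD-style flock on a local file system), which lets the sketch simulate two server processes.

```python
import fcntl
import os
import tempfile


def try_acquire_once(path):
    """Return an open file holding an exclusive lock on path, or None
    if the lock is already held through another open file description."""
    f = open(path, "w")
    try:
        # LOCK_NB: fail immediately instead of waiting for the lock
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        f.close()
        return None
    return f  # keep this object alive: closing it releases the lock


lock_path = os.path.join(tempfile.mkdtemp(), ".lockfile")
first = try_acquire_once(lock_path)   # plays the role of the first process
second = try_acquire_once(lock_path)  # a later process: the lock is taken
print(first is not None, second is None)  # True True
first.close()                         # closing (or process exit) releases it
third = try_acquire_once(lock_path)
print(third is not None)              # True
```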


5.6 Persistent data storage using tdb

It is often useful for services to store data in an organized way that does not need the full power of an RDBMS, but would be tedious to implement using flat files.

Clarens provides a high-performance key-value datastore, tdb, for this purpose; it is also used to store most of the server's internal data structures. Each tdb database file stores one key-value mapping, and a single process can have only one open connection to a given file. For this reason, database handles need to be opened and closed in a coordinated fashion to prevent deadlocks and server process crashes.

5.6.1 Opening database handles

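The clarens_util module provides a function for opening tdb databases and maintaining a registry of the open instances. The function register_open_dbs should be called with three arguments: the registry dictionary (clarens_util.dbd), a list of database names, and the directory that contains the database files: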

PYTHON
import os
import clarens_util
import clarens_config

dbdir=os.path.join(clarens_config.config['clarens_path'],".clarens_logins")
names=["db_file_indiv","db_file_group","md5db","sha1db"]
clarens_util.register_open_dbs(clarens_util.dbd, names, dbdir)

# Registered handles are looked up by name
md5keys=clarens_util.dbd["md5db"].keys()


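This example starts by importing the Clarens modules that contain the utility functions and the main configuration, respectively. It then opens the four databases stored in the .clarens_logins directory, registers them in clarens_util.dbd, and reads the keys of the md5db database.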
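The implementation of clarens_util.register_open_dbs is not shown in this manual. The sketch below is a hypothetical analogue that illustrates the idea of a per-process registry: each database is opened at most once, and the handle is reused afterwards. Python's standard dbm module stands in for tdb, and the opener parameter is an illustrative addition that makes the behavior easy to test.

```python
import dbm
import os
import tempfile


def register_open_dbs_sketch(registry, names, dbdir, opener=None):
    """Open each named database in dbdir unless registry already holds a
    handle for it, and return the registry."""
    if opener is None:
        # stdlib dbm stands in for tdb; "c" creates the file if needed
        opener = lambda path: dbm.open(path, "c")
    if not os.path.isdir(dbdir):
        os.makedirs(dbdir)
    for name in names:
        if name not in registry:  # never open the same file twice
            registry[name] = opener(os.path.join(dbdir, name))
    return registry


dbd = {}
dbdir = os.path.join(tempfile.mkdtemp(), ".clarens_logins")
names = ["db_file_indiv", "db_file_group", "md5db", "sha1db"]
register_open_dbs_sketch(dbd, names, dbdir)
register_open_dbs_sketch(dbd, names, dbdir)  # second call reuses the handles

dbd["md5db"][b"somekey"] = b"somevalue"
md5keys = dbd["md5db"].keys()
print(md5keys)

for handle in dbd.values():
    handle.close()
```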

5.6.2 Standard databases

The system module sets up certain standard databases with session data. These databases are available in the req.clarens dictionary:

  1. db_env - the Berkeley db base handle; use this to create new databases
  2. db_logins - key: user_nonce
    value: a list of the server_nonce (server connection ID), the connection start-up time in seconds since 1970 (float), and the remote IP address (req.connection.remote_ip). The data is serialized into the database as follows:

    PYTHON
    import pickle, time
    req.clarens["db_logins"][user_nonce]=pickle.dumps(
        [public_server_nonce, time.time(), req.connection.remote_ip])
    


    The data can be deserialized as follows:

    PYTHON
    public_server_nonce, connection_time, remote_ip = pickle.loads(
        req.clarens["db_logins"][user_nonce])
    


    The user_nonce value is the authentication username, obtainable via

    PYTHON
    user_nonce=req.connection.user
    


  3. db_certs - key: user_nonce, value: the user certificate, in PEM format
  4. db_methods - key: method name, value: a list of [signature, documentation string]
    The signature is the list discussed in section 5.2.3 above. As with db_logins, use pickle.loads() to deserialize the data.
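The serialization convention for these databases can be tried outside the server with the sketch below. Plain dictionaries stand in for req.clarens["db_logins"] and req.clarens["db_methods"], and the values (nonces, IP address, the "echo.echo" key) are made-up examples. Note that under Python 3, pickle.dumps returns bytes.

```python
import pickle
import time

# Plain dicts stand in for req.clarens["db_logins"] and req.clarens["db_methods"]
db_logins = {}
db_methods = {}

# db_logins: user_nonce -> [server_nonce, start time (float), remote IP]
user_nonce = "example-user-nonce"
db_logins[user_nonce] = pickle.dumps(["example-server-nonce",
                                      time.time(),
                                      "127.0.0.1"])
server_nonce, connection_time, remote_ip = pickle.loads(db_logins[user_nonce])

# db_methods: method name -> [signature list, documentation string]
db_methods["echo.echo"] = pickle.dumps([['string,string', 'int,int'],
                                        "Returns the method argument"])
signature, doc = pickle.loads(db_methods["echo.echo"])
print(server_nonce, remote_ip, signature[0], doc)
# example-server-nonce 127.0.0.1 string,string Returns the method argument
```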

5.6.3 Printf-style debugging

To print messages to the log file, import the clarens_util module at the top of the new module's source file. Messages can then be logged with, e.g.,
clarens_util.err_msg("myvalue=%s\n"%myvalue)
The message should appear in the log file (e.g. $opkg_root/var/apache2/log/error_log).

Exceptions are usually handled by the modules themselves, which causes Python to log error messages to the log file. If a module fails to load in the first place, however, Clarens handles the resulting exception and reports only that the module failed to load.

To log the full exception, it must be re-raised in the file
system/__init__.py, at around line 998, after the message "Failed to load the module %s" is printed. Just add the statement
raise
to the exception handler.

A future version will provide a configuration switch to do this automatically.


Conrad Steenberg 2005-07-11