Custom Database Tables: Creating an API

In the first part of this series we looked at the disadvantages of using a custom table. One of the major ones is the lack of an API: so in this article we'll look at how to create one. The API acts a layer between handling data in your plug-in and the actual interaction with the database table – and is primarily intended to ensure such interactions are safe and to provide a ‘human friendly’ wrapper for your table. As such we’ll require wrapper functions for inserting, updating, deleting and querying data.

Why Should I Create an API?

There are several reasons why an API is recommended – but most boil down to two related principles: reducing code duplication and separation of concerns.

It’s Safer

With the above four mentioned wrapper functions you only need to ensure your database queries are safe in four places – you can then forget about sanitisation completely. Once your confident that your wrapper functions handle the database safely, then you do not need to worry about the data you’re giving them. You can also validate the data – returning an error if something isn’t right.

The idea is that without these function you’ll need to make sure each instance of interacting with your database does so safely. This just brings an increased likelihood that in one of these instances you’ll miss something and create a vulnerability in your plug-in.

Reduces Bugs

This is related to the first point (and both are related to code duplication). By duplicating code there is greater scope for bugs to crawl in. Conversely, by using a wrapper functions – if there is a bug with updating or querying the database table – you know exactly where to look.

It’s Easier to Read

This may seem like a ‘soft’ reason – but readability of code is incredibly important. Readability is about making the logic and actions of the code clear to the reader. This isn’t just important when working as a part of a team, or when someone might inherit your work: you may know what your code is meant to do now, but in six months you will probably have forgotten. And if your code is hard to follow, its easier to introduce a bug.

Wrapper functions clean up your code by literally separating the internal workings of some operation (say creating a post) from the context of that operation (say handling a form submission). Just imagine having the entire contents of wp_insert_post() in place of every instance you use wp_insert_post().

Adds a Layer of Abstraction

Adding layers of abstraction is not always a good thing – but here it undoubtedly is. Not only do these wrappers provide a human friendly way of updating or querying the table (imagine having to use SQL to query posts rather than using the far more concise WP_Query() – and all the SQL formulation and sanitisation that goes with it), but also helps protects you and other developers from changes to the underlying database structure.

By using wrapper functions you, as well as third parties, can use these without fear that they are unsafe or will break. If you decide to rename a column, move a column elsewhere or even delete it you can be sure that the rest of your plug-in won’t break, because you just make the necessary changes to your wrapper functions. (Incidentally this is a compelling reason to avoid direct SQL queries of WordPress tables: if they change, and they will, it will break your plug-in.). On the other hand an API help your plug-in to be extended in a stable way.

Consistency

I'm perhaps guilty of splitting one point into two here - but I feel that this an important benefit. There is little worse than inconsistency when developing plug-ins: it just encourages messy code. Wrapper functions provide a consistent interaction with the database: you provide data and it returns true (or an ID) or false (or an WP_Error object, if you prefer).

The API

Hopefully I've now convinced of the need for an API for your table. But before we go any further we’ll first define a helper function that will make sanitisation a bit easy.

The Table Columns

We’ll define a function which returns the columns of table along with the data format they expect. By doing this we can easily whitelist allowed columns and format the input accordingly. Furthermore if we make any changes to the columns, we just need to make the changes here

function wptuts_get_log_table_columns(){
    return array(
        'log_id'=> '%d',
        'user_id'=> '%d',
        'activity'=>'%s',
        'object_id'=>'%d',
        'object_type'=>'%s',
        'activity_date'=>'%s',
    );
}

Inserting Data

The most basic ‘insert’ wrapper function will just take an array of column-value pairs and insert these to the database. This need not be the case: you may decide to provide more ‘human friendly’ keys which you then map to the column names. You may also decide that some values are auto-generated or over-ridden based on the passed values (for example: post status in wp_insert_post()).

It maybe the *values* that need mapping. The format that data is best stored in is not always the most convenient format to use. For example, for dates it maybe easier to handle a DateTime object or a timestamp - and then convert this to the desired date format.

The wrapper function may be simple or complicated – but the minimum it should do is sanitise the input. I’d also recommend whitelisting for the recognised columns, as trying to insert data into a column that doesn’t exist can throw an error.

In this example the user ID is by default that of the current user, and all fields are given by their column name - which the exception of the activity date which is passed as 'date'. The date, in this example, should be a local timestamp, which is converted prior to adding it to the database.

/**
 * Inserts a log into the database
 *
 *@param $data array An array of key => value pairs to be inserted
 *@return int The log ID of the created activity log. Or WP_Error or false on failure.
*/
function wptuts_insert_log( $data=array() ){
    global $wpdb;        

    //Set default values
    $data = wp_parse_args($data, array(
                 'user_id'=> get_current_user_id(),
                 'date'=> current_time('timestamp'),
    ));    

    //Check date validity
    if( !is_float($data['date']) || $data['date'] <= 0 )
        return 0;

    //Convert activity date from local timestamp to GMT mysql format
    $data['activity_date'] = date_i18n( 'Y-m-d H:i:s', $data['date'], true );

    //Initialise column format array
    $column_formats = wptuts_get_log_table_columns();

    //Force fields to lower case
    $data = array_change_key_case ( $data );

    //White list columns
    $data = array_intersect_key($data, $column_formats);

    //Reorder $column_formats to match the order of columns given in $data
    $data_keys = array_keys($data);
    $column_formats = array_merge(array_flip($data_keys), $column_formats);

    $wpdb->insert($wpdb->wptuts_activity_log, $data, $column_formats);

    return $wpdb->insert_id;
}

Tip: It's also a good idea to check the validity of the data. What checks you should perform, and how the API reacts, depends entirely on your context. wp_insert_post(), for instance requires a certain degree of uniqueness to post slugs - if there's clashes, it auto-generates a unique one. wp_insert_term on the other hand returns an error if the term already exists. This is down to a mix between how WordPress handles these objects and semantics.

Updating Data

Updating data usually closely mimics inserting data – with the exception that a row identifier (usually just the primary key) is provided along with data that needs to be updated. In general the arguments should match the insert function (for consistency) - so in this example, 'date' is used instead of 'activity_date'

/**
 * Updates an activity log with supplied data
 *
 *@param $log_id int ID of the activity log to be updated
 *@param $data array An array of column=>value pairs to be updated
 *@return bool Whether the log was successfully updated.
*/
function wptuts_update_log( $log_id, $data=array() ){
    global $wpdb;        

    //Log ID must be positive integer
    $log_id = absint($log_id);     
    if( empty($log_id) )
         return false;

    //Convert activity date from local timestamp to GMT mysql format
    if( isset($data['activity_date']) )
         $data['activity_date'] = date_i18n( 'Y-m-d H:i:s', $data['date'], true );


    //Initialise column format array
    $column_formats = wptuts_get_log_table_columns();

    //Force fields to lower case
    $data = array_change_key_case ( $data );

    //White list columns
    $data = array_intersect_key($data, $column_formats);

    //Reorder $column_formats to match the order of columns given in $data
    $data_keys = array_keys($data);
    $column_formats = array_merge(array_flip($data_keys), $column_formats);

    if ( false === $wpdb->update($wpdb->wptuts_activity_log, $data, array('log_id'=>$log_id), $column_formats) ) {
         return false;
    }

    return true;
}

Querying Data

A wrapper function for querying data will often be fairly complicated – particularly since you may want to support all types of queries which select only certain fields, restrict by AND or OR statements, order by one of several possible columns etc (just see the WP_Query class).

The basic principal of wrapper function for querying the data is that should it should take a ‘query array’, interpret it, and form the corresponding SQL statement.

/**
 * Retrieves activity logs from the database matching $query.
 * $query is an array which can contain the following keys:
 *
 * 'fields' - an array of columns to include in returned roles. Or 'count' to count rows. Default: empty (all fields).
 * 'orderby' - datetime, user_id or log_id. Default: datetime.
 * 'order' - asc or desc
 * 'user_id' - user ID to match, or an array of user IDs
 * 'since' - timestamp. Return only activities after this date. Default false, no restriction.
 * 'until' - timestamp. Return only activities up to this date. Default false, no restriction.
 *
 *@param $query Query array
 *@return array Array of matching logs. False on error.
*/
function wptuts_get_logs( $query=array() ){

     global $wpdb;
     /* Parse defaults */
     $defaults = array(
       'fields'=>array(),'orderby'=>'datetime','order'=>'desc', 'user_id'=>false,
       'since'=>false,'until'=>false,'number'=>10,'offset'=>0
     );

    $query = wp_parse_args($query, $defaults);

    /* Form a cache key from the query */
    $cache_key = 'wptuts_logs:'.md5( serialize($query));
    $cache = wp_cache_get( $cache_key );

    if ( false !== $cache ) {
            $cache = apply_filters('wptuts_get_logs', $cache, $query);
            return $cache;
    }

     extract($query);

    /* SQL Select */
    //Whitelist of allowed fields
    $allowed_fields = wptuts_get_log_table_columns();
	
    if( is_array($fields) ){
	    //Convert fields to lowercase (as our column names are all lower case - see part 1)
	    $fields = array_map('strtolower',$fields);

    	    //Sanitize by white listing
    	   $fields = array_intersect($fields, $allowed_fields);
	}else{
		$fields = strtolower($fields);
	}

    //Return only selected fields. Empty is interpreted as all
    if( empty($fields) ){
        $select_sql = "SELECT* FROM {$wpdb->wptuts_activity_log}";
    }elseif( 'count' == $fields ) {
        $select_sql = "SELECT COUNT(*) FROM {$wpdb->wptuts_activity_log}";
    }else{
        $select_sql = "SELECT ".implode(',',$fields)." FROM {$wpdb->wptuts_activity_log}";
    }

     /*SQL Join */
     //We don't need this, but we'll allow it be filtered (see 'wptuts_logs_clauses' )
     $join_sql='';

    /* SQL Where */
    //Initialise WHERE
    $where_sql = 'WHERE 1=1';

    if( !empty($log_id) )
       $where_sql .=  $wpdb->prepare(' AND log_id=%d', $log_id);

    if( !empty($user_id) ){

       //Force $user_id to be an array
       if( !is_array( $user_id) )
           $user_id = array($user_id);

       $user_id = array_map('absint',$user_id); //Cast as positive integers
       $user_id__in = implode(',',$user_id);
       $where_sql .=  " AND user_id IN($user_id__in)";
    }

    $since = absint($since);
    $until = absint($until);

    if( !empty($since) )
       $where_sql .=  $wpdb->prepare(' AND activity_date >= %s', date_i18n( 'Y-m-d H:i:s', $since, true));

    if( !empty($until) )
       $where_sql .=  $wpdb->prepare(' AND activity_date <= %s', date_i18n( 'Y-m-d H:i:s', $until, true));

    /* SQL Order */
    //Whitelist order
    $order = strtoupper($order);
    $order = ( 'ASC' == $order ? 'ASC' : 'DESC' );

    switch( $orderby ){
       case 'log_id':
            $order_sql = "ORDER BY log_id $order";
       break;
       case 'user_id':
            $order_sql = "ORDER BY user_id $order";
       break;
       case 'datetime':
             $order_sql = "ORDER BY activity_date $order";
       default:
       break;
    }

    /* SQL Limit */
    $offset = absint($offset); //Positive integer
    if( $number == -1 ){
         $limit_sql = "";
    }else{
         $number = absint($number); //Positive integer
         $limit_sql = "LIMIT $offset, $number";
    }

    /* Filter SQL */
    $pieces = array( 'select_sql', 'join_sql', 'where_sql', 'order_sql', 'limit_sql' );
    $clauses = apply_filters( 'wptuts_logs_clauses', compact( $pieces ), $query );
    foreach ( $pieces as $piece )
          $$piece = isset( $clauses[ $piece ] ) ? $clauses[ $piece ] : '';

    /* Form SQL statement */
    $sql = "$select_sql $where_sql $order_sql $limit_sql";

    if( 'count' == $fields ){
        return $wpdb->get_var($sql);
    }

    /* Perform query */
    $logs = $wpdb->get_results($sql);

    /* Add to cache and filter */
    wp_cache_add( $cache_key, $logs, 24*60*60 );
    $logs = apply_filters('wptuts_get_logs', $logs, $query);
    return $logs;
 }

There is a fair bit going in the above example as I’ve tried to include several features that might be considered when developing your wrapper functions, which we cover in the subsequent sections.

Cache

You may consider your queries to be sufficiently complex, or repeated regularly, that it makes sense to cache the results. Since different queries will return different results, we obviously don’t want to use a generic cache key – we need one which is unique to that query. This is exactly what the following does. It serialises the query array, and then hashes it, producing a key unique to $query:

 $cache_key = 'wptuts_logs:'.md5( serialize($query));

Next we check if we have anything stored for that cache key – if so, great, we just return its contents. If not, we generate the SQL, perform the query and then add the results to the cache (for at most 24 hours) and return them. We'll need to remember that records could take up to 24 hours to appear in the results from this function. Usually there are contexts where the cache is automatically cleared - but we would need to implement these.

Filters & Actions

Hooks have been covered extensively on WPTuts+ recently by Tom McFarlin and Pippin Williamson. In his article, Pippin talks about the reasons why you should make your code extensible through hooks, and wrappers such as wptuts_get_logs() serve as excellent examples of where they can be used.

We’ve used two filters in the above function:

wptuts_get_logs – filters the result of the function
wptuts_logs_clauses – filters an array of SQL components

This allows third-party developers, or even ourselves, to build on the provided API. If we avoid using direct SQL in our plug-in and only use these wrapper functions we've built, then it immediately makes it possible to extend our plug-in. The wptuts_logs_clauses filter in particular would allow developers to alter each part of the SQL – and thus perform complex queries. We’ll note that its the job of any plug-in using these filters to make sure what they return is properly sanitised.

Hooks are just as useful when performing the other three main ‘operations’: inserting, updating and deleting data. Actions allow plug-ins to know when these are being performed – so they some action. In our context this might mean sending an email to an admin when a particular user performs a particular action. Filters, in the context of these operations, are useful for altering data before its inserted.

Be careful when naming hooks. A good hook name does several things:

Communicates when the hook is called or what its doing (e.g. you can guess what pre_get_posts and user_has_cap might do.
Be unique. Its recommended you prefix hooks with your plug-in’s name. Unlike functions, there won’t be an error if there is a clash between hook names – instead it will probably just ‘silently’ break one or more plug-ins.
Exhibits some sort of structure. Make your hooks predicable, and avoid naming hooks ‘on the fly’, as this can sometimes lead to seemingly random hook names. Instead plan as far ahead as possible the hooks you will use, and come up with an appropriate naming convention – and stick to it.

Tip: Generally its a good idea to mimic the same conventions as WordPress – as developers will more quickly understand what that hook is doing. Regarding using the plug-in’s name as a prefix: if your plug-in name is generic then this may not be sufficient to ensure uniqueness. Finally, do not give an action and a filter the same name.

Deleting Data

Deleting data is often the simplest of the wrappers – though it maybe required to perform some ‘clean up’ operations as well as simply removing the data. wp_delete_post() for instance, not only deletes the post from the *_posts table but also deletes the appropriate post meta, taxonomy relationships, comments and revisions etc.

In keeping with the comments of the previous section, we'll include two two actions: one triggered before and the other after a log has been deleted from the table. Following the WordPress’ naming convention for such actions:

_delete_ is triggered before deletion
_deleted_ is triggered after deletion

/**
 * Deletes an activity log from the database
 *
 *@param $log_id int ID of the activity log to be deleted
 *@return bool Whether the log was successfully deleted.
*/
function wptuts_delete_log( $log_id ){
    global $wpdb;        

    //Log ID must be positive integer
    $log_id = absint($log_id);     
    if( empty($log_id) )
         return false;

    do_action('wptuts_delete_log',$log_id);
    $sql = $wpdb->prepare("DELETE from {$wpdb->wptuts_activity_log} WHERE log_id = %d", $log_id);

    if( !$wpdb->query( $sql ) )
         return false;

    do_action('wptuts_deleted_log',$log_id);

    return true;
}

Documentation

I've been a bit lazy with the in-source documentation of the above API. In this series Tom McFarlin explains why you shouldn't be. You may have spent a lot of time developing your API functions, but if other developers don't know how to use them - they won't. You'll also be helping yourself, when after 6 months you've forgotten how the data should be given, or what you should expect to be returned.

Summary

Wrappers for your database table can range from the relatively simple (e.g. get_terms()) to extremely complex (e.g. the WP_Query class). Collectively they should seek to serve as the gateway to your table: allowing you to focus on the context in which they are used, and essentially forget what they are actually doing. The API you create is just a small example of the notion of 'separation of concerns', often attributed to Edsger W. Dijkstra in his paper On the role of scientific thought:

It is what I sometimes have called "the separation of concerns", which, even if not perfectly possible, is yet the only available technique for effective ordering of one's thoughts, that I know of. This is what I mean by "focusing one's attention upon some aspect": it does not mean ignoring the other aspects, it is just doing justice to the fact that from this aspect's point of view, the other is irrelevant. It is being one- and multiple-track minded simultaneously.

You can find the code used in this series, in its entirety, on GitHub. In the next part of this series we'll be looking at how you can maintain your database, and handle upgrades.

HIGHLIGHTS OF THE DAY