Custom Database Tables: Creating the Table

In this series we'll be looking at using custom database tables. We'll cover how to create, maintain and remove the table, as well as how to safely, and efficiently, add, remove and query data. In this first article we look at when custom tables might be appropriate, the pros and cons of using them and how to create the table.

Fortunately, WordPress provides a fairly substantial API that makes creating and interacting with custom tables a bit simpler - most notably the $wpdb class and the dbDelta() function, which we'll see more of during the series. Despite that, however, creating a custom table means creating something alien to WordPress - and you lose most of the framework that surrounds the native tables. For that reason, you as the plugin author are responsible for safely and efficiently interacting with it. So before jumping in you need to consider carefully whether or not it's more appropriate to use an existing core table.


The Disadvantages of Using a Custom Table

As mentioned, custom tables sit outside the normal WordPress framework - and for the most part this is the underlying cause of their disadvantages:

  • There are no native add, remove, update or query functions with which to interact with the table.
  • The UI needs to be built from (almost) scratch.
  • Sanitisation and caching are up to you (though WordPress provides a lot of help in this respect).
  • Other plugins, and WordPress itself, do not 'expect' your table to be there. On the other hand, if your data is stored as a custom post type, most well-built third-party plugins will work alongside it.
  • Neither WordPress nor many related plugins will back up or export your table. (In fact quite a few backup plugins do support non-core tables, but exporting and importing is not so straightforward.)
  • You're responsible for setting the structure of your custom table(s) in the most efficient way, including choosing the most appropriate data type for the columns.
  • You're responsible for writing bug-free and efficient SQL queries.

When Is Creating a Custom Table Appropriate?

There is no 'right' answer to this, and sensible judgement of the pros and cons is required. However, the previous section outlines some serious drawbacks to not using the existing WordPress schema - so if you're unsure, it's usually best to avoid creating a table. A custom table approach also requires a lot of work, and offers ample opportunity for bugs to creep in. With that in mind, when might a custom table be appropriate?

The Data Structure

One of the most important arguments for custom tables is when the data needs to be structured in a way that is inappropriate for the native tables. The *_posts table is inherently geared towards posts and pages, which may be totally unsuitable for your data. In fact your data may best be spread across several tables, with relationships between them. It need not even be that complicated: the Posts 2 Posts plugin uses a custom table to store many-to-many relationships between post types. This could be done using the taxonomy API (and originally was) or the meta API - but neither of these is particularly efficient, and while they may be fine for smaller sites, they do not scale well. Scribu moved Posts 2 Posts to a custom table implementation to allow information about a relationship to be stored.

While most cases can be 'squeezed' into the *_posts mould using post meta, this may not provide the most efficient route: the post meta table uses an un-indexed value column to store data. It is incredibly quick at retrieving a post's meta data (WordPress employs caching here too), but complex queries using the meta table can be inefficient or almost impossible.

Complex Queries

Related to the above are complex queries, which the native tables might not be designed to complete efficiently. In Event Organiser, for example, an event is a post with event dates stored in a separate table. Although it would be possible to store those dates as post meta, doing so when events have more than one date would make any date-based queries extremely difficult and inefficient - particularly since the meta value column is not indexed.

Scale

If you use wp_posts and your data is sufficiently large (100,000+ posts) then it may hinder performance, depending on what queries you are running. On its own this argument is fairly weak, as there are a lot of unknowns that will affect its validity. In general, databases are quick at what they do - and the surrounding WordPress framework serves to optimise queries as much as possible. In combination with the other two factors, however, you may find that a custom table presents the most sensible option.


Creating the Table

Once you've decided that a custom table is necessary, you need to create it. Before we do that, we'll store the name of our custom table in $wpdb. This global contains all the information pertaining to the database for the current blog (it will change from site to site when using multisite). Adding our table name to this global is not at all necessary, but it makes the rest of our code slightly neater.
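A minimal sketch of such a registration function (the activity_log table name is an illustrative choice that we'll carry through this series):

    function wptuts_register_activity_log_table() {
        global $wpdb;

        // Store the prefixed table name on $wpdb so the rest of the
        // plugin can refer to it as $wpdb->activity_log.
        $wpdb->activity_log = "{$wpdb->prefix}activity_log";
    }
    // Register early on init. Note that activation misses this hook,
    // so wptuts_create_tables() (below) calls the function directly.
    add_action( 'init', 'wptuts_register_activity_log_table', 1 );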

The above code uses $wpdb->prefix to add a prefix to the table name. The prefix is wp_ by default, but it can be altered by the user in wp-config.php. This is necessary when more than one WordPress install uses the same database, but it may also be changed for other reasons. As such, you cannot assume the prefix is wp_. As with functions, classes, settings and so on, you should ensure your table name is unique.

Throughout this series we'll come back to the following example. We'll imagine that we are creating a table to log user activity (updating or removing posts, changing settings, uploading an image etc).

Column Naming Conventions

There are various conventions for how you name your columns (and your tables, for that matter) - but regardless of how you name them, it's important to be consistent. I'd recommend using only lowercase characters, as in some situations column names can be case sensitive, and imposing that rule makes mistakes less likely and improves readability. As we'll see later on in the series, it's also useful for when you need to whitelist columns. You should separate words in column names (e.g. post_date, post_content) for readability - but you should do this with underscores, and never spaces.

You should also avoid reserved words. If the column refers to a foreign table then it's recommended you use that foreign column's name (such as user_id in our example).

In our example we'll be naming our columns:

  • log_id - the log ID.
  • user_id - the ID of the user to whom the log entry corresponds.
  • activity - the activity that occurred.
  • object_id - the ID of the object (e.g. post ID, user ID, comment ID etc) that was the subject of the user's activity.
  • object_type - the type of object (e.g. 'post', 'user', 'comment' etc).
  • activity_date - the datetime of the activity.

Deciding the Column Types

Before you go any further you'll need to decide the data types of the columns your table is going to have. Column types can be split into three categories: strings, numerics and datetimes. For each of these there are many variants. You can find a full reference here.

It's important to choose the appropriate data type for your table as this will affect the efficiency of your queries. Some data types allow you to set a limit (e.g. varchar(40), which allows you to store up to 40 characters). The limit is optional, but is recommended as it can improve performance - so you'll need to decide, for each column, the maximum number of characters it will require. Note that for numeric data types the bracketed number is a display width, not a cap on the stored value: an unsigned INT, for example, always stores values of up to 10 digits (up to 4,294,967,295), however it is declared.

When storing dates you should almost always use the DATETIME data type (stored as 2012-11-05 14:55:10), and certainly not a human-friendly representation of the date (e.g. 5th November 2012 2:55pm). DATETIME values can easily be formatted into human-readable form using functions like mysql2date(). You should store dates in UTC and, if required, convert them to a different timezone on output.
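For instance (the format string here is just an illustration):

    // Formats the stored DATETIME value for display;
    // 'jS F Y g:ia' produces '5th November 2012 2:55pm'.
    echo mysql2date( 'jS F Y g:ia', '2012-11-05 14:55:10' );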

In our example we'll have:

  • log_id - bigint(20)
  • user_id - bigint(20)
  • activity - varchar(20)
  • object_id - bigint(20)
  • object_type - varchar(20)
  • activity_date - datetime

Indexing Columns

Next you'll need to decide which columns to index - these will be declared as KEYs, one of which will be the PRIMARY KEY. The primary key is a column where each row has a unique entry - usually it's just an auto-incrementing integer, essentially the 'row number'.

The values of the other indexed columns need not be unique, but each value should determine a relatively small set of records. The idea of indexing is to improve read queries. Without an index, a search would have to read through the entire table to find matching rows. If a column is indexed and part of the query, then the database can quickly find the rows matching that column, and then check that smaller subset of rows against the rest of the query (the analogy is an index for a book).

As such, if you never query by a column then indexing it will not help (if you never look up a word in the book's index, it may as well not be there). Nor will indexing help if lots of records share the same value, such as a 'gender' column, as this won't offer much of an improvement on a full table scan (imagine a book index listing a word that appears on every other page).

Indexing is not free either: columns declared as KEYs reduce write performance (to continue the analogy, you would need to update the book's index whenever an indexed word is added or removed) - so you'll need to decide what the right balance is for your set-up. More information can be found here.

Since it's likely we'll want to query by user (to see their recent activity), we'll index the user_id column and use log_id as the primary key.

Creating the Table

We'll place the code for creating the custom table inside the following function:
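Something along these lines (a sketch - the CREATE TABLE statement itself is filled in below):

    function wptuts_create_tables() {
        global $wpdb;
        global $charset_collate;

        // Load dbDelta() and the constants it relies on.
        require_once ABSPATH . 'wp-admin/includes/upgrade.php';

        // Activation misses the init hook, so register the table name manually.
        wptuts_register_activity_log_table();

        // The CREATE TABLE statement goes here (see below).
    }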

This function will need to be called on the plugin's activation hook, as well as any time we wish to make alterations to the table - for instance, adding columns or changing their data type (we'll cover why later in the series).
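Hooking it up might look like this (assuming the code lives in the main plugin file, so that __FILE__ is appropriate):

    // Create (or update) the table when the plugin is activated.
    register_activation_hook( __FILE__, 'wptuts_create_tables' );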

The fact that, by using the activation hook, wptuts_create_tables() could be called when the table already exists is not an oversight - again, we'll cover why later on in the series.

Inside that function we include wp-admin/includes/upgrade.php to set up a few constants and load the dbDelta() function. Note that when a plugin is activated it misses the init hook, so wptuts_register_activity_log_table() must be called manually.

The global $charset_collate contains the character set and collation used by the native WordPress tables. Loosely, these define the encodings of characters and how they are compared - given that WordPress is used in many different languages it's important to use the correct collation for your table.
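As an aside, recent WordPress versions also expose this value directly on $wpdb - a minimal sketch:

    global $wpdb;

    // Equivalent to the $charset_collate global set up by upgrade.php.
    $charset_collate = $wpdb->get_charset_collate();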

Apart from the collation, the SQL statement should declare the table name, each column along with its type and default value, and any KEY columns, including a PRIMARY KEY column. Typically it will be of the form:
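In outline, with placeholder names throughout:

    CREATE TABLE table_name (
        column_1 type [NOT NULL] [default value] [auto_increment],
        column_2 type,
        PRIMARY KEY  (column_1),
        KEY key_name (column_2)
    ) character_set_and_collation;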

To create this table we add the following to our wptuts_create_tables() function:
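A sketch of that statement, following the columns and keys chosen above (the default values are illustrative choices):

    $sql_create_table = "CREATE TABLE {$wpdb->activity_log} (
        log_id bigint(20) unsigned NOT NULL auto_increment,
        user_id bigint(20) unsigned NOT NULL default '0',
        activity varchar(20) NOT NULL default 'updated',
        object_id bigint(20) unsigned NOT NULL default '0',
        object_type varchar(20) NOT NULL default 'post',
        activity_date datetime NOT NULL,
        PRIMARY KEY  (log_id),
        KEY user_id (user_id)
    ) $charset_collate;";

    dbDelta( $sql_create_table );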

The dbDelta() function performs our CREATE TABLE command. It can be quite strict about the SQL statement given to it: for instance, there must be two spaces between PRIMARY KEY and the primary key column, and keys must be given a name.

Debugging

If on activation you get the 'X characters of unexpected output…' error message, it's likely that there is an error in your SQL statement. Sometimes it's due to dbDelta()'s strictness. If you add wp_die(); after the dbDelta() call, this kills the processing and (with `WP_DEBUG` set to true) will reveal any error messages.
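For example, temporarily amending wptuts_create_tables() along these lines:

    dbDelta( $sql_create_table );

    // Temporary debugging aid: halt immediately so any database
    // errors are printed to the screen (requires WP_DEBUG).
    wp_die();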

Summary

In this article we've looked at reasons why you should and shouldn't use custom tables, the details you'll need to consider, and finally how to create a table. The next part of this series will cover sanitisation, looking at SQL injection and how to protect yourself from it. The code in this article is available at this GitHub repository, and will be updated as the series continues.

