SimpleDB is a remote database offered by Amazon Web Services (AWS). The world of data stores is usually divided into SQL and NoSQL, based on the use (or non-use) of the SQL language. NoSQL data stores are usually based on a simpler key/value setup. SimpleDB straddles this line: it is a key/value store, and it can also use a variant of SQL for retrieval. Most SQL databases are built around a schema that lays out the rows and columns of the data, but SimpleDB is schema-less, which makes it a very flexible data store.
In the SimpleDB database model, you have items, attributes, and values. Each row in the database is an item and is identified by a unique, assignable item name. Each item can have up to 256 attribute/value pairs. An unexpected aspect of SimpleDB is that the same attribute can appear in more than one pair per item. I think the best way to picture SimpleDB is as a spreadsheet where each column/row intersection holds an array of values instead of a single value.
This chart represents two items stored in a SimpleDB domain. The term domain is analogous to a “table” in other databases.
The first column is the item name. This is the only column restricted to a single value, and you can think of it as a unique index column.
The other four columns (pets, cars, furniture, and phones) are the attributes currently in this domain. You aren't limited to these; every item can have an entirely different set of attributes. In this data, the attribute pets on the item personInventory1 has three pairs; expressed in JSON, it looks something like this:
[
  { "Name" : "pets", "Value" : "dog" },
  { "Name" : "pets", "Value" : "cat" },
  { "Name" : "pets", "Value" : "fish" }
]
On the other hand, the item personInventory2 has only one pair:
{ "Name" : "pets", "Value" : "cat" }
While you don't have to supply the same attributes for each item, you do need to supply at least one pair, which means you cannot have an 'empty' item. Each value can be up to 1 KB (1,024 bytes) in size, so combined with the 256-pair limit, each item is functionally capped at 256 KB of values.
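The per-item limits above lend themselves to a quick pre-flight check before a write. The sketch below assumes the 256-pair and 1,024-byte limits described in this section and uses a hypothetical helper name, checkItemLimits; it is not part of the AWS SDK, and SimpleDB has further limits (for example on attribute names) that it does not cover.

```javascript
// Hypothetical pre-flight check; limits assumed from the SimpleDB model described above.
const MAX_PAIRS_PER_ITEM = 256;
const MAX_VALUE_BYTES = 1024;

// attrObj maps an attribute name to an array of values, e.g. { pets: ['dog', 'cat'] }
function checkItemLimits(attrObj) {
  const problems = [];
  let pairCount = 0;

  Object.keys(attrObj).forEach(function (name) {
    attrObj[name].forEach(function (value) {
      pairCount += 1;
      // Sizes are measured in bytes, so count UTF-8 bytes rather than characters
      if (Buffer.byteLength(String(value), 'utf8') > MAX_VALUE_BYTES) {
        problems.push('a value of "' + name + '" exceeds ' + MAX_VALUE_BYTES + ' bytes');
      }
    });
  });

  if (pairCount === 0) {
    problems.push('an item needs at least one name/value pair');
  }
  if (pairCount > MAX_PAIRS_PER_ITEM) {
    problems.push('item has ' + pairCount + ' pairs; the limit is ' + MAX_PAIRS_PER_ITEM);
  }
  return problems; // an empty array means the item fits
}

// Upper bound on value data per item: 256 pairs * 1,024 bytes
console.log(MAX_PAIRS_PER_ITEM * MAX_VALUE_BYTES); // 262144 (256 KB)
```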
SimpleDB is distributed, which brings some distinct traits you need to understand and keep in mind as you design your app. Being distributed means a whole group of machines responds to your requests, and your data is replicated across those servers. This distribution is completely transparent to your program, but it does introduce the possibility of consistency issues: immediately after a write, your data is not guaranteed to be present on every server.
Don't panic: it's not as bad as it sounds, for a few reasons. SimpleDB doesn't promise immediate consistency, but in my experience writes reach all nodes quickly. Designing around this isn't hard either; normally you just avoid reading a record immediately after writing it. Finally, SimpleDB offers consistent reads as an option, but they are slower and may consume more resources. If your app requires a consistent read every time, you might want to reconsider using SimpleDB as your data store, but for many applications this can be designed around or safely ignored.
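SimpleDB exposes consistent reads as a ConsistentRead flag on getAttributes (and on select). A minimal sketch, assuming a simpledb client configured the way this tutorial does later; getItemConsistently is a hypothetical helper name:

```javascript
// Hypothetical helper: read one item with SimpleDB's ConsistentRead flag set.
// `simpledb` is assumed to be an aws.SimpleDB client from the aws-sdk package.
function getItemConsistently(simpledb, domainName, itemName, callback) {
  simpledb.getAttributes({
    DomainName: domainName,
    ItemName: itemName,
    // Return data that reflects every write acknowledged before this read,
    // at the cost of a slower, more expensive request
    ConsistentRead: true
  }, callback);
}
```

Leaving the flag off (the default) gives the faster, eventually consistent read used throughout the rest of this tutorial.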
On the upside, the distributed nature also affords SimpleDB a few advantages that mesh nicely with the Node.js environment. Since you don’t have a single server responding to your requests, you don’t need to worry about saturating the service, and you can achieve good performance by making many parallel requests to SimpleDB. Parallel and asynchronous requests are something that Node.js can handle easily.
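To illustrate the parallel-request point, here is a sketch that fires one getAttributes call per item name and waits for all of them together. It assumes the v2 aws-sdk package, whose request objects provide .promise(); fetchItemsInParallel is a hypothetical name, and the rest of this tutorial sticks with callbacks.

```javascript
// Hypothetical helper: fetch several items concurrently instead of one after another.
// Assumes `simpledb` is a v2 aws-sdk SimpleDB client (requests expose .promise()).
function fetchItemsInParallel(simpledb, domainName, itemNames) {
  const requests = itemNames.map(function (itemName) {
    return simpledb
      .getAttributes({ DomainName: domainName, ItemName: itemName })
      .promise()
      .then(function (resp) {
        // An item with no attributes comes back without an Attributes array
        return { itemName: itemName, attributes: resp.Attributes || [] };
      });
  });
  // Every request is already in flight; this resolves once all have answered
  return Promise.all(requests);
}
```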
Unlike many AWS services, SimpleDB has no Amazon-provided management console. Luckily, there is a nice in-browser management console in the form of a Google Chrome extension, SdbNavigator. In SdbNavigator you can add or delete domains, insert, update and delete items, modify attributes, and perform queries.
AWS SDK
Now that we’ve gotten to know the SimpleDB service, let’s start writing our REST server. First, we’ll need to install the AWS SDK. This SDK handles not just SimpleDB but all the AWS services, so you may already be including it in your package.json file. To install the SDK, run the following from the command line:
npm install aws-sdk --save
To use SimpleDB, you’ll also need to get your AWS credentials, which include an Access Key and a Secret Key. SimpleDB is a pay-as-you-go service, but AWS currently includes a generous free allowance for SimpleDB.
Word of warning: As with any pay-as-you-go service, be aware that it’s possible to write code that can rack up big bills, so you’re going to want to keep an eye on your usage and keep your credentials private and safe.
Once you get the AWS SDK installed and have acquired your credentials, you’ll need to set up SimpleDB in your code. In this example, we'll use AWS credentials stored in a JSON file in your home directory. First, you’ll need to include the SDK module, create an AWS object, and finally set up your SimpleDB interface.
var aws = require('aws-sdk'),
  simpledb;

aws.config.loadFromPath(process.env['HOME'] + '/aws.credentials.json');

//We'll use the Northern Virginia datacenter; change the region / endpoint for other datacenters:
//http://docs.aws.amazon.com/general/latest/gr/rande.html#sdb_region
simpledb = new aws.SimpleDB({
  region : 'US-East',
  endpoint : 'https://sdb.amazonaws.com'
});
Notice that we are using a specific endpoint and region. Each datacenter is entirely independent, so if you create a domain named "mysuperawesomedata" in Northern Virginia, it will not be replicated to, nor be present in, the São Paulo datacenter, for example.
The SimpleDB object that you've created with new aws.SimpleDB is the basis for all of your interactions with SimpleDB. The AWS SDK for SimpleDB has only a few methods:
Batch Operations
- batchDeleteAttributes
- batchPutAttributes
Domain Management & Information
- createDomain
- deleteDomain
- domainMetadata
- listDomains
Item/Attribute Manipulation
- deleteAttributes
- getAttributes
- putAttributes
Querying
- select
In this tutorial, we will only be dealing with Item/Attribute Manipulation and Querying; while the other categories are useful, many applications will not have any use for them.
Test Data
In SdbNavigator, enter your access key and secret key, select 'US-East', and click Connect.
Once you’ve successfully connected, let’s create a domain for testing. Click Add domain.
Then enter the domain name ‘sdb-rest-tut’ and click OK.
Now that you’ve created a domain, let’s enter some test data. Click Add property and add a property named “colors”. As a convention, I usually name properties in plural form to reflect the multi-value nature of SimpleDB.
Finally, we'll click Add record to create our first SimpleDB item. In the ItemName() column, enter your unique item name. A quirk of SdbNavigator is that, by default, it only accepts a single value for each property, which obscures the fact that a property can contain multiple values. To enter multiple values, click the S along the right edge of the property column.
In the new box, select Array to enter multiple values. In the Value column, enter “red”, and then click Add value and enter “blue”.
Finally, click Update to save the changes to this row.
Now that we’ve entered some test data, let’s make our first SimpleDB request from Node. We’ll just get everything in the Domain, which, at this point, will be just a single row.
var aws = require('aws-sdk'),
  simpledb;

aws.config.loadFromPath(process.env['HOME'] + '/aws.credentials.json');

simpledb = new aws.SimpleDB({
  region : 'US-East',
  endpoint : 'https://sdb.amazonaws.com'
});

simpledb.select({
  SelectExpression : 'select * from `sdb-rest-tut` limit 100'
}, function(err, resp) {
  if (err) {
    console.error(err);
  } else {
    console.log(JSON.stringify(resp, null, ' '));
  }
});
The response will be logged to the console. Here is the response, annotated for explanation:
{
  "ResponseMetadata": {
    "RequestId": "...", //Every request made to SimpleDB has a request ID
    "BoxUsage": "0.0000228616" //This is how your account is charged; at the time of writing, the US-East region is 14 US cents per hour, so this request costs 0.00032 cents + the transfer cost (if you are currently outside of your free tier)
  },
  "Items": [ //For a Select, your response will be in the "Items" object property
    {
      "Name": "myfirstitem", //this is the itemName()
      "Attributes": [ //these are the attribute pairs
        {
          "Name": "colors", //attribute name
          "Value": "red" //value - note that every Value is a string, regardless of the contents
        },
        {
          "Name": "colors", //Since the attribute name is repeated, we can see that `colors` has more than one value
          "Value": "blue"
        }
      ]
    }
  ]
}
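Since BoxUsage is reported in machine hours, the cost annotation above is just a multiplication. This sketch assumes the 14-cents-per-hour US-East rate quoted in the comment, which has likely changed since, and ignores data transfer; boxUsageCostUSD is a hypothetical helper name.

```javascript
// Hypothetical helper: machine-time cost of one request, in US dollars.
// boxUsage is the string from ResponseMetadata.BoxUsage (machine hours);
// pricePerHourUSD is an assumed rate, not fetched from AWS.
function boxUsageCostUSD(boxUsage, pricePerHourUSD) {
  return parseFloat(boxUsage) * pricePerHourUSD;
}

const cost = boxUsageCostUSD('0.0000228616', 0.14);
console.log((cost * 100).toFixed(5) + ' cents'); // prints "0.00032 cents"
```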
A REST Server
Since we’ll be building a REST Server that stores data in SimpleDB, it’s important to understand what a REST server does. REST stands for REpresentational State Transfer. A REST server is really just a server that uses HTTP standard mechanisms as an interface for your data. Often, REST is used for server-to-server communications, but you can use REST servers with the client through JavaScript libraries such as jQuery or Angular. Generally, however, an end-user won’t interact directly with a REST server.
Interestingly, the AWS SDK itself uses the REST protocol to interact with SimpleDB, so it may seem odd to put a REST server in front of another REST server. You wouldn't want clients to call the SimpleDB REST API directly, though, because every request must be authenticated with your AWS credentials, and handing those out would risk the security of your AWS account. Also, by writing a server, you'll be able to add a layer of both abstraction and validation to your data storage that will make the rest of your application much easier to deal with.
In this tutorial we will be building the basic CRUD+L functions, that is Create, Read, Update, Delete and List. If you think about it, you can break down most applications into CRUD+L. With REST, you will use a limited number of paths and several HTTP methods or verbs to create an intuitive API. Most developers are familiar with a few of the HTTP verbs, namely GET and POST, as they are used most often in web applications, but there are several others.
Operation | HTTP Verb |
---|---|
Create | POST |
Read | GET |
Update | PUT |
Delete | DELETE |
List | GET |
Notice that Read and List both use the same verb; we'll use slightly different paths to tell them apart. We're using POST for Create because creating is not idempotent. Idempotent means that making the same call several times has the same effect, for the user and on your data, as making it once; an update (PUT) is therefore idempotent, since sending the same update twice leaves the data in the same state.
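To make the idempotence distinction concrete, here is a sketch with an in-memory Map standing in for SimpleDB; createItem and updateItem are hypothetical names. Sending the same create twice produces two items, while sending the same update twice leaves the store exactly as one update would.

```javascript
// In-memory stand-in for the data store (not SimpleDB)
const store = new Map();
let nextId = 1;

// Create (POST): each call makes a new item, so repeating it changes the result
function createItem(data) {
  const id = String(nextId++);
  store.set(id, data);
  return id;
}

// Update (PUT): each call replaces the item wholesale, so repeating it changes nothing further
function updateItem(id, data) {
  store.set(id, data);
}

createItem({ pets: ['dog'] });
createItem({ pets: ['dog'] });
console.log(store.size); // 2: the same create sent twice made two items

updateItem('1', { pets: ['cat'] });
updateItem('1', { pets: ['cat'] });
console.log(store.size, JSON.stringify(store.get('1'))); // 2 {"pets":["cat"]}
```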
As our example, we’ll build a personal inventory server—a database to save whatever you own. Here is how the paths will look:
Operation | HTTP Verb | Path |
---|---|---|
Create | POST | /inventory |
Read | GET | /inventory/1234 |
Update | PUT | /inventory/1234 |
Delete | DELETE | /inventory/1234 |
List | GET | /inventory |
1234 is a placeholder for the person identifier (ID). Note that 'create' and 'list' do not have an ID: for create, the ID will be generated, and for list, we'll be getting all the items, so we don't need a specific ID.
Building the Server
To get started, let's install Express, a Node.js HTTP server framework:
npm install express --save
Express manages most of the minutiae in setting up a server, but it doesn't include any facility for handling the HTTP request body, so we'll need to install another module, body-parser, to enable us to read the request body.
npm install body-parser --save
Body-parser has a few different options for parsing the body of the HTTP request. We'll use the json() method for readability, but switching to another format is just a matter of swapping out the method on the bodyParser object. We only need the body-parsing middleware on the create and update routes, so we can include it in just those particular routes.
Create
Since each SimpleDB itemName needs to be unique, we can auto-generate a new itemName for each newly created item. We're going to use the cuid module, which is a lightweight way to generate unique identifiers.
npm install cuid --save
SimpleDB expects attributes to be in the attribute name/value pair format:
[
  { "Name" : "attribute1", "Value" : "value1" },
  { "Name" : "attribute1", "Value" : "value2" },
  { "Name" : "attribute2", "Value" : "value3" },
  { "Name" : "attribute3", "Value" : "value4" }
]
Your server could certainly accept values in this format and pass them directly to SimpleDB, but it runs counter to how data is usually structured, and it's an awkward format to work with. We'll use a more intuitive data structure instead: an object whose keys are attribute names and whose values are arrays of values. The same data as above looks like this:
{
  "attribute1" : ["value1", "value2"],
  "attribute2" : ["value3"],
  "attribute3" : ["value4"]
}
Here is a basic Express-based server with the create operation:
var aws = require('aws-sdk'),
  bodyParser = require('body-parser'),
  cuid = require('cuid'),
  express = require('express'),

  sdbDomain = 'sdb-rest-tut',

  app = express(),
  simpledb;

aws.config.loadFromPath(process.env['HOME'] + '/aws.credentials.json');
simpledb = new aws.SimpleDB({
  region : 'US-East',
  endpoint : 'https://sdb.amazonaws.com'
});

//create
app.post(
  '/inventory',
  bodyParser.json(),
  function(req, res, next) {
    var sdbAttributes = [],
      newItemName = cuid();

    //start with:
    /*
      { attributeN : ['value1','value2',..'valueN'] }
    */
    Object.keys(req.body).forEach(function(anAttributeName) {
      req.body[anAttributeName].forEach(function(aValue) {
        sdbAttributes.push({
          Name : anAttributeName,
          Value : aValue
        });
      });
    });
    //end up with:
    /*
      [
        { Name : 'attributeN', Value : 'value1' },
        { Name : 'attributeN', Value : 'value2' },
        ...
        { Name : 'attributeN', Value : 'valueN' },
      ]
    */

    simpledb.putAttributes({
      DomainName : sdbDomain,
      ItemName : newItemName,
      Attributes : sdbAttributes
    }, function(err, awsResp) {
      if (err) {
        next(err); //server error to user
      } else {
        res.send({
          itemName : newItemName
        });
      }
    });
  }
);

app.listen(3000, function () {
  console.log('SimpleDB-powered REST server started.');
});
Let's start up your server and try it out. A great way to interact with a REST server is to use the cURL tool. This tool allows you to make an HTTP request with any verb right from the command line. To try out creating an item with our REST server, we'll need to activate a few extra options:
curl -H "Content-Type: application/json" -X POST -d '{"pets" : ["dog","cat"], "cars" : ["saab"]}' http://localhost:3000/inventory
Option | Purpose |
---|---|
-H | Add a header line to the HTTP request |
-X | Define which verb will be used |
-d | Data to be sent in the HTTP request body |
After running the command, you'll see a JSON response with your newly created itemName or ID. If you switch over to SdbNavigator, you should see the new data when you query all the items.
Read
Now let's build a basic function to read an item from SimpleDB. For this, we don't need to perform a query, since we'll get the itemName or ID from the path of the request. We can perform a getAttributes request with that itemName or ID.
If we stopped here, we would have a functional but not very friendly form of our data. Let’s transform the Name/Value array into the same form we’re using to accept data (attribute : array of values). To accomplish this, we will need to go through each name/value pair and add it to a new array for each unique name.
Finally, let’s add the itemName and return the results.
//Read
app.get('/inventory/:itemID', function(req, res, next) {
  simpledb.getAttributes({
    DomainName : sdbDomain,
    ItemName : req.params.itemID //this gets the value from :itemID in the path
  }, function(err, awsResp) {
    var attributes = {};

    if (err) {
      next(err); //server error to users
    } else {
      awsResp.Attributes.forEach(function(aPair) {
        // if this is the first time we are seeing the aPair.Name, let's add it to the response object, attributes as an array
        if (!attributes[aPair.Name]) {
          attributes[aPair.Name] = [];
        }
        //push the value into the correct array
        attributes[aPair.Name].push(aPair.Value);
      });
      res.send({
        itemName : req.params.itemID,
        inventory : attributes
      });
    }
  });
});
To test this, we'll use curl again. Replace [cuid] with the itemName or ID returned when you created an item earlier in this tutorial.
curl -D- http://localhost:3000/inventory/[cuid]
Notice that we're using the -D- option. This dumps the HTTP response headers so we can see the response code.
Another aspect of REST is using your response codes meaningfully. In the current example, if you supply a non-existent ID to curl, the server above will crash because it tries to forEach over a non-existent array. We need to account for this and return a meaningful HTTP response code indicating that the item was not found.
To prevent the error, we should test whether awsResp.Attributes exists. If it doesn't, we'll set the status code to 404 and end the HTTP request. If it does, we can serve the response with the attributes.
app.get('/inventory/:itemID', function(req, res, next) {
  simpledb.getAttributes({
    DomainName : sdbDomain,
    ItemName : req.params.itemID
  }, function(err, awsResp) {
    var attributes = {};

    if (err) {
      next(err);
    } else {
      if (!awsResp.Attributes) {
        //set the status response to 404 because we didn't find any attributes then end it
        res.status(404).end();
      } else {
        awsResp.Attributes.forEach(function(aPair) {
          if (!attributes[aPair.Name]) {
            attributes[aPair.Name] = [];
          }
          attributes[aPair.Name].push(aPair.Value);
        });
        res.send({
          itemName : req.params.itemID,
          inventory : attributes
        });
      }
    }
  });
});
Try it out with the new code and a non-existent ID and you'll see that the server returns a 404.
Now that we know how to use status to change the response code, we should also update how we respond to a POST/create. While a 200 response is technically correct, as it means 'OK', a more informative response code is 201, which indicates 'Created'. To make this change, we'll call the status method before sending:
res
  .status(201)
  .send({
    itemName : newItemName
  });
Update
Update is usually the most difficult operation for any system, and this REST server is no exception.
The nature of SimpleDB makes this operation a little more challenging as well. In a REST server, an update replaces the entire piece of stored data; SimpleDB, on the other hand, stores individual attribute/value pairs under an itemName.
To allow for an update to represent a single piece of data rather than a collection of name/value pairs, we need to define a schema for the purposes of our code (even though SimpleDB doesn’t need one). Don’t worry if this is unclear right now—keep reading and I’ll illustrate the requirement.
Compared to many other database systems, our schema will be very simple: just a defined array of attributes. For our example, we have four fields we are concerned with: pets, cars, furniture, and phones:
var schema = ['pets', 'cars', 'furniture', 'phones'];
With SimpleDB you can't store an empty attribute/value pair, nor does SimpleDB have any concept of an item apart from its attributes, so we'll assume that if SimpleDB doesn't return a value, it doesn't exist. Similarly, if we try to update a SimpleDB item with an empty attribute/value pair, SimpleDB will ignore that data. Take, for example, this data:
{
  "itemName": "cil89uvnm00011ma2fykmy79c",
  "inventory": {
    "cars": [],
    "pets": [
      "cat",
      "dog"
    ]
  }
}
Logically, we know that cars, being an empty array, should have no values, and pets should have two values, but what about phones and furniture? What do you do with those? Here is how we translate this update request to work with SimpleDB:
- Put an attribute pets with a value of cat.
- Put an attribute pets with a value of dog.
- Delete attributes for cars.
- Delete attributes for phones.
- Delete attributes for furniture.
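The translation steps above can be sketched as a pure function that splits a request body into the pairs to put and the attributes to delete. planUpdate is a hypothetical helper, and the schema passed in is the four-attribute list defined earlier; the update route in this section folds the same logic directly into the Express handler.

```javascript
// Hypothetical helper: split a PUT body into pairs to put and attributes to delete.
// schema is the list of attribute names the server knows about.
function planUpdate(schema, body) {
  const toPut = [];
  const toDelete = [];

  schema.forEach(function (name) {
    const values = body[name];
    if (!values || values.length === 0) {
      // Missing or empty in the request: remove the attribute from the item
      toDelete.push(name);
    } else {
      // Present: one name/value pair per value
      values.forEach(function (value) {
        toPut.push({ Name: name, Value: value });
      });
    }
  });

  return { toPut: toPut, toDelete: toDelete };
}

const plan = planUpdate(['pets', 'cars', 'furniture', 'phones'], { cars: [], pets: ['cat', 'dog'] });
console.log(JSON.stringify(plan.toPut));    // [{"Name":"pets","Value":"cat"},{"Name":"pets","Value":"dog"}]
console.log(JSON.stringify(plan.toDelete)); // ["cars","furniture","phones"]
```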
Without some form of schema that at least defines the attributes, we wouldn't know that phones and furniture needed to be deleted. Luckily, we can consolidate this update operation into two SimpleDB requests instead of five: one to put the attributes, and one to delete the attributes. This is a good time to pull out the code from the post/create function that transforms the attribute/array-of-values object into the attribute/value pair array.
function attributeObjectToAttributeValuePairs(attrObj, replace) {
  var sdbAttributes = [];

  Object.keys(attrObj).forEach(function(anAttributeName) {
    attrObj[anAttributeName].forEach(function(aValue) {
      sdbAttributes.push({
        Name : anAttributeName,
        Value : aValue,
        Replace : replace //if true, then SimpleDB will overwrite rather than append more values to an attribute
      });
    });
  });

  return sdbAttributes;
}
We’re going to make an important alteration to the create function as well. We’ll be adding a new attribute/value to all items. This attribute will not be added to the schema and is effectively read-only.
We will add an attribute called created and set its value to 1. With SimpleDB, there is limited ability to check whether an item exists before adding attributes and values. On every putAttributes request, you can check for the value and existence of a single attribute; in our case, we'll use created and check for a value of 1. While this may seem like a strange workaround, it provides a very important safeguard: it prevents the update operation from creating new items with an arbitrary ID.
newAttributes.push({ Name : 'created', Value : '1' });
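Putting the create-side pieces together, the sketch below builds the putAttributes parameters for a new item: it converts the attribute/array object to pairs, drops non-schema attributes (the filter discussed under Validation later in this article), and appends the read-only created marker. buildCreateParams is a hypothetical helper, and converting each value with String() is an assumption, since SimpleDB stores every value as a string.

```javascript
// Hypothetical helper: putAttributes parameters for a newly created item.
// body maps attribute names to arrays of values; schema lists the allowed names.
function buildCreateParams(domainName, itemName, body, schema) {
  const attributes = [];

  Object.keys(body)
    .filter(function (name) {
      return schema.indexOf(name) !== -1; // ignore attributes outside the schema
    })
    .forEach(function (name) {
      body[name].forEach(function (value) {
        attributes.push({ Name: name, Value: String(value) });
      });
    });

  // Marker that the update route later checks through the Expected parameter
  attributes.push({ Name: 'created', Value: '1' });

  return { DomainName: domainName, ItemName: itemName, Attributes: attributes };
}
```

The result can be passed straight to simpledb.putAttributes in the create route, with cuid() supplying itemName.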
Since we’ll be doing a couple of asynchronous HTTP requests, let’s install the async module to ease the handling of those callbacks.
npm install async --save
Remember, since SimpleDB is distributed, there is no reason to put our attributes and then delete sequentially. We'll use async.parallel to run these two operations and get a callback when both have completed. The responses from AWS for putAttributes and deleteAttributes don't contain any important information, so we'll just send an empty response with status code 200 if there are no errors.
app.put(
  '/inventory/:itemID',
  bodyParser.json(),
  function(req, res, next) {
    var updateValues = {},
      deleteValues = [],
      tasks = [];

    schema.forEach(function(anAttribute) {
      if ((!req.body[anAttribute]) || (req.body[anAttribute].length === 0)) {
        deleteValues.push({ Name : anAttribute });
      } else {
        updateValues[anAttribute] = req.body[anAttribute];
      }
    });

    if (Object.keys(updateValues).length > 0) {
      tasks.push(function(cb) {
        //update anything that is present
        simpledb.putAttributes({
          DomainName : sdbDomain,
          ItemName : req.params.itemID,
          Attributes : attributeObjectToAttributeValuePairs(updateValues, true),
          Expected : {
            Name : 'created',
            Value : '1',
            Exists : true
          }
        }, cb);
      });
    }

    //deleteAttributes without any attributes removes EVERY attribute of the item (including created),
    //so only send the delete when at least one schema attribute is missing
    if (deleteValues.length > 0) {
      tasks.push(function(cb) {
        //delete any attributes that are not present
        simpledb.deleteAttributes({
          DomainName : sdbDomain,
          ItemName : req.params.itemID,
          Attributes : deleteValues
        }, cb);
      });
    }

    async.parallel(tasks, function(err) {
      if (err) {
        next(err);
      } else {
        res.status(200).end();
      }
    });
  }
);
To take this for a spin, let's update a previously created entry. This time, we will make the inventory only include a "dog", removing all other items. Again, with cURL, run the command, substituting [cuid] with one of your item IDs.
curl -H "Content-Type: application/json" -X PUT -d '{"pets" : ["dog"] }' http://localhost:3000/inventory/[cuid]
Delete
SimpleDB has no concept of deleting an item, but it can delete attributes, as mentioned above. To delete an item, we'll delete all of its attributes, and the 'item' will cease to be.
Since we've defined a list of attributes in our schema, we'll use the deleteAttributes call to remove all of those attributes as well as the created attribute. As per our plan, this operation uses the same path as Update, but with the DELETE verb.
app.delete(
  '/inventory/:itemID',
  function(req, res, next) {
    var attributesToDelete;

    attributesToDelete = schema.map(function(anAttribute) {
      return { Name : anAttribute };
    });

    attributesToDelete.push({ Name : 'created' });

    simpledb.deleteAttributes({
      DomainName : sdbDomain,
      ItemName : req.params.itemID,
      Attributes : attributesToDelete
    }, function(err) {
      if (err) {
        next(err);
      } else {
        res.status(200).end();
      }
    });
  }
);
List
Rounding out our REST verbs is list. To achieve the list operation, we’re going to use the select command and the SQL-like query language. Our list function will be barebones, but will serve as a good basis for more complex retrieval later on. We’re going to make a very simple query:
select * from `sdb-rest-tut` limit 100
As we ran into with the get/read operation, the response from SimpleDB isn't very useful as-is, since it is focused on the attribute/value pairs. To avoid repeating ourselves, we'll refactor that part of the get/read operation into a separate function and use it here. While we're at it, we'll also filter out the created attribute (otherwise it would show up in the get operation).
function attributeValuePairsToAttributeObject(pairs) {
  var attributes = {};

  pairs
    .filter(function(aPair) {
      return aPair.Name !== 'created';
    })
    .forEach(function(aPair) {
      if (!attributes[aPair.Name]) {
        attributes[aPair.Name] = [];
      }
      attributes[aPair.Name].push(aPair.Value);
    });

  return attributes;
}
With a select operation, SimpleDB returns the values in the Items array. Each item is represented by an object that contains the itemName (simply as Name) and the attribute/value pairs.
To simplify this response, let's return everything in a single object. First, we'll convert the attribute/value pairs into an attribute/value array as we did in the read/get operation, and then we'll add the itemName as the property id.
app.get(
  '/inventory',
  function(req, res, next) {
    simpledb.select({
      SelectExpression : 'select * from `sdb-rest-tut` limit 100'
    }, function(err, awsResp) {
      var items = [];
      if (err) {
        next(err);
      } else {
        //guard against a response with no Items array (e.g. nothing matched)
        items = (awsResp.Items || []).map(function(anAwsItem) {
          var anItem;

          anItem = attributeValuePairsToAttributeObject(anAwsItem.Attributes);
          anItem.id = anAwsItem.Name;

          return anItem;
        });
        res.send(items);
      }
    });
  }
);
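Rather than hard-coding the select expression, the list route could build it from the domain name and a requested page size. A sketch, assuming SimpleDB's select syntax (a hyphenated domain name such as sdb-rest-tut must be wrapped in backticks) and a maximum LIMIT of 2500; buildListExpression is a hypothetical name.

```javascript
// Hypothetical helper: build the list query for a domain and page size.
// Assumes SimpleDB accepts LIMIT values from 1 up to 2500.
const MAX_SELECT_LIMIT = 2500;

function buildListExpression(domainName, limit) {
  // Backticks let names containing characters such as '-' be used in the query
  const quotedDomain = '`' + domainName + '`';
  // Clamp the requested size into the accepted range
  const safeLimit = Math.max(1, Math.min(Math.floor(limit), MAX_SELECT_LIMIT));
  return 'select * from ' + quotedDomain + ' limit ' + safeLimit;
}

console.log(buildListExpression('sdb-rest-tut', 100)); // select * from `sdb-rest-tut` limit 100
```

Clamping on the server keeps a client-supplied page size (for example from a query string) from producing an expression SimpleDB would reject.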
To see our results, we can use curl:
curl -D- -X GET http://localhost:3000/inventory
Validation
Validation is a whole subject of its own, but the code we've already written gives us a start on a simple validation system.
For now, all we want to ensure is that a user can't submit anything other than what is in the schema. Looking back at the update/put code, iterating over the schema with forEach already prevents any unauthorized attributes from being added, so we just need to apply something similar to our create/post operation. In this case, we'll filter the attribute/value pairs, eliminating any non-schema attributes.
newAttributes = newAttributes.filter(function(anAttribute) { return schema.indexOf(anAttribute.Name) !== -1; });
In your production code, you will likely want a more robust validation system. I would suggest integrating a JSON schema validator like ajv and building a middleware that sits between bodyParser and your route function on the create and update operations.
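As a stand-in for a JSON schema validator, here is a hand-rolled middleware sketch that slots into the same position. validateInventory and its rules (known attribute names only, each holding an array of strings) are assumptions for illustration; unlike the filter above, it rejects a bad request with a 400 instead of silently dropping attributes.

```javascript
// Hypothetical middleware; would be mounted as, e.g.:
//   app.post('/inventory', bodyParser.json(), validateInventory, function (req, res, next) { ... })
const inventorySchema = ['pets', 'cars', 'furniture', 'phones'];

function validateInventory(req, res, next) {
  const body = req.body;
  const errors = [];

  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    errors.push('body must be a JSON object');
  } else {
    Object.keys(body).forEach(function (name) {
      if (inventorySchema.indexOf(name) === -1) {
        errors.push('unknown attribute: ' + name);
      } else if (!Array.isArray(body[name]) ||
                 !body[name].every(function (v) { return typeof v === 'string'; })) {
        errors.push(name + ' must be an array of strings');
      }
    });
  }

  if (errors.length > 0) {
    res.status(400).send({ errors: errors }); // reject before anything reaches SimpleDB
  } else {
    next();
  }
}
```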
Next Steps
With the code outlined in this article, you have all the operations needed to store, read and modify data, but this is only the start of your journey. In most cases, you’ll need to start thinking about the following topics:
- Authentication
- Pagination
- Complex list/query operations
- Additional output formats (xml, csv, etc.)
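Pagination is likely the first of these you'll need: when more results remain, SimpleDB's select returns a NextToken that can be passed back to fetch the next page. The sketch below, assuming a callback-style aws.SimpleDB client and a hypothetical selectAllPages helper, follows NextToken until the last page. For a REST API, passing the token back to the caller is usually preferable to loading everything into memory.

```javascript
// Hypothetical helper: gather every page of a select by following NextToken.
// `simpledb` is assumed to be an aws.SimpleDB client used with callbacks.
function selectAllPages(simpledb, selectExpression, callback) {
  const allItems = [];

  function fetchPage(nextToken) {
    const params = { SelectExpression: selectExpression };
    if (nextToken) {
      params.NextToken = nextToken; // resume where the previous page stopped
    }
    simpledb.select(params, function (err, resp) {
      if (err) {
        return callback(err);
      }
      Array.prototype.push.apply(allItems, resp.Items || []);
      if (resp.NextToken) {
        fetchPage(resp.NextToken); // more results remain
      } else {
        callback(null, allItems); // last page reached
      }
    });
  }

  fetchPage(null);
}
```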
This basis for a REST server powered by SimpleDB allows you to add middleware and additional logic to build a backbone for your application.
The finalized server code is available at simpledb-rest-api on GitHub.