Mapping Relational Databases and SQL to MongoDB

NoSQL databases have emerged tremendously in the last few years owing to their less constrained structure, scalable schema design, and faster access compared to traditional relational databases (RDBMS/SQL). MongoDB is an open source document-oriented NoSQL database which stores data in the form of JSON-like objects. It has emerged as one of the leading databases due to its dynamic schema, high scalability, optimal query performance, faster indexing and an active user community.

If you are coming from an RDBMS/SQL background, understanding NoSQL and MongoDB concepts can be bit difficult while starting because both the technologies have very different manner of data representation. This article will drive you to understand how the RDBMS/SQL domain, its functionalities, terms and query language map to MongoDB database. By mapping, I mean that if we have a concept in RDBMS/SQL, we will see what its equivalent concept in MongoDB is.

We will start with mapping the basic relational concepts like table, row, column, etc and move to discuss indexing and joins. We will then look over the SQL queries and discuss their corresponding MongoDB database queries. The article assumes that you are aware of the basic relational database concepts and SQL, because throughout the article more stress will be laid on understanding how these concepts map in MongoDB. Let's begin.

Mapping Tables, Rows and Columns

Each database in MongoDB consists of collections which are equivalent to an RDBMS database consisting of SQL tables. Each collection stores data in the form of documents which is equivalent to tables storing data in rows. While a row stores data in its set of columns, a document has a JSON-like structure (known as BSON in MongoDB). Lastly, the way we have rows in an SQL row, we have fields in MongoDB. Following is an example of a document (read row) having some fields (read columns) storing user data:

{
"_id": ObjectId("5146bb52d8524270060001f3"),
"age": 25,
"city": "Los Angeles",
"email": "[email protected]",
"user_name": "Mark Hanks"
}

This document is equivalent to a single row in RDBMS. A collection consists of many such documents just as a table consists of many rows. Note that each document in a collection has a unique _id field, which is a 12-byte field that serves as a primary key for the documents. The field is auto generated on creation of the document and is used for uniquely identifying each document.

To understand the mappings better, let us take an example of an SQL table users and its corresponding structure in MongoDB. As shown in Fig 1, each row in the SQL table transforms to a document and each column to a field in MongoDB.

Figure 1 Mapping Table to Collection (1) — Figure 1

Dynamic Schema

One interesting thing to focus here is that different documents within a collection can have different schemas. So, it is possible in MongoDB for one document to have five fields and the other document to have seven fields. The fields can be easily added, removed and modified anytime. Also, there is no constraint on data types of the fields. Thus, at one instance a field can hold int type data and at the next instance it may hold an array.

These concepts must seem very different to the readers coming from RDBMS background where the table structures, their columns, data types and relations are pre-defined. This functionality to use dynamic schema allows us to generate dynamic documents at run time.

For instance, consider the following two documents inside the same collection but having different schemas (Fig 2):

Figure 2 Documents in a Collection having different structure — Figure 2

The first document contains the fields address and dob which are not present in the second document while the second document contains fields gender and occupation which are not present in the first one. Imagine if we would have designed this thing in SQL, we would have kept four extra columns for address, dob, gender and occupation, some of which would store empty (or null) values, and hence occupying unnecessary space.

This model of dynamic schema is the reason why NosSQL databases are highly scalable in terms of design. Various complex schemas (hierarchical, tree-structured, etc) which would require number of RDBMS tables can be designed efficiently using such documents. A typical example would be to store user posts, their likes, comments and other associated information in the form of documents. An SQL implementation for the same would ideally have separate tables for storing posts, comments and likes while a MongoDB document can store all these information in a single document.

Mapping Joins and Relationships

Relationships in RDBMS are achieved using primary and foreign key relationships and querying those using joins. There is no such straightforward mapping in MongoDB but the relationships here are designed using embedded and linking documents.

Consider an example wherein we need to store user information and corresponding contact information. An ideal SQL design would have two tables, say user_information and contact_information, with primary keys id and contact_id as shown in Fig 3. The contact_information table would also contain a column user_id which would be the foreign key linking to the id field of the user_information table.

Now we will see how we would design such relationships in MongoDB using approaches of Linking documents and Embedded documents. Observe that in the SQL schema, we generally add a column (like id and contact_id in our case) which acts as a primary column for that table. However, in MongoDB, we generally use the auto generated _id field as the primary key to uniquely identify the documents.

Linking Documents

This approach will use two collections, user_information and contact_information both having their unique _id fields. We will have a field user_id in the contact_information document which relates to the _id field of the user_information document showing which user the contact corresponds to. (See Fig 4) Note that in MongoDB, the relations and their corresponding operations have to be taken care manually (for example, through code) as no foreign key constraints and rules apply.

The user_id field in our document is simply a field that holds some data and all the logic associated with it has to be implemented by us. For example, even if you will insert some user_id in the contact_information document that does not exist in the user_information collection, MongoDB is not going to throw any error saying that corresponding user_id was not found in the user_information collection(unlike SQL where this would be an invalid foreign key constraint).

Embedding Documents

The second approach is to embed the contact_information document inside the user_information document like this (Fig 5):

In the above example, we have embedded a small document of contact information inside the user information. In the similar manner, large complex documents and hierarchical data can be embedded like this to relate entities.

Also, which approach to use among Linking and Embedded approach depends on the specific scenario. If the data to be embedded is expected to grow larger in size, it is better to use Linking approach rather than Embedded approach to avoid the document becoming too large. Embedded approach is generally used in cases where a limited amount of information (like address in our example) has to be embedded.

Mapping Chart

To summarize, the following chart (Fig 6) represents the common co-relations we have discussed:

Mapping SQL to MongoDB Queries

Now that we are comfortable with the basic mappings between RDBMS and MongoDB, we will discuss how the query language used to interact with the database differs between them.

For MongoDB queries, let us assume a collection users with document structure as follows:

{
"_id": ObjectId("5146bb52d8524270060001f3"),
"post_text":"This is a sample post" ,
"user_name": "mark",
"post_privacy": "public",
"post_likes_count": 0
}

For SQL queries, we assume the table users having five columns with the following structure:

We will discuss queries related to create and alter collections (or tables), inserting, reading, updating and removing documents (or rows). There are two queries for each point, one for SQL and another for MongoDB. I will be explaining the MongoDB queries only as we are quite familiar with the SQL queries. The MongoDB queries presented here are written in the Mongo JavaScript shell while the SQL queries are written in MySQL.

Create

In MongoDB, there is no need to explicitly create the collection structure (as we do for tables using a CREATE TABLE query). The structure of the document is automatically created when the first insert occurs in the collection. However, you can create an empty collection using createCollection command.

SQL: CREATE TABLE `posts` (`id` int(11) NOT NULL AUTO_INCREMENT,`post_text` varchar(500) NOT NULL,`user_name` varchar(20) NOT NULL,`post_privacy` varchar(10) NOT NULL,`post_likes_count` int(11) NOT NULL,PRIMARY KEY (`id`))

MongoDB: db.createCollection("posts")

Insert

To insert a document in MongoDB, we use the insert method which takes an object with key value pairs as its input. The inserted document will contain the autogenerated _id field. However, you can also explicitly provide a 12 byte value as _id along with the other fields.

SQL: INSERT INTO `posts` (`id` ,`post_text` ,`user_name` ,`post_privacy` ,`post_likes_count`)VALUES (NULL ,  'This is a sample post',  'mark',  'public',  '0');

MongoDB:  db.posts.insert({user_name:"mark", post_text:"This is a sample post", post_privacy:"public", post_likes_count:0})

There is no Alter Table function in MongoDB to change the document structure. As the documents are dynamic in schema, the schema changes as and when any update happens on the document.

Read

MongoDB uses the find method which is equivalent to the SELECT command in SQL. The following statements simply read all the documents from the posts collection.

SQL: SELECT * FROM  `posts`

MongoDB: db.posts.find()

The following query does a conditional search for documents having user_name field as mark. All the criteria for fetching the documents have to be placed in the first braces {} separated by commas.

SQL: SELECT * FROM `posts` WHERE `user_name` =  'mark'

MongoDB: db.posts.find({user_name:"mark"})

The following query fetches specific columns, post_text and post_likes_count as specified in the second set of braces {}.

SQL: SELECT  `post_text` ,  `post_likes_count` FROM  `posts`

MongoDB: db.posts.find({},{post_text:1,post_likes_count:1})

Note that MongoDB by default returns the _id field with each find statement. If we do not want this field in our result set, we have to specify the _id key with a 0 value in the list of columns to be retrieved. The 0 value of the key indicates that we want to exclude this field from the result set.

MongoDB: db.posts.find({},{post_text:1,post_likes_count:1,_id:0})

The following query fetches specific fields based on the criteria that user_name is mark.

SQL: SELECT  `post_text` , `post_likes_count` FROM `posts` WHERE `user_name` =  'mark'

MongoDB: db.posts.find({user_name:"mark"},{post_text:1,post_likes_count:1})

We will now add one more criteria to fetch the posts with privacy type as public. The criteria fields specified using commas represent the logical AND condition. Thus, this statement will look for documents having both user_name as mark and post_privacy as public.

SQL: SELECT  `post_text` ,  `post_likes_count` FROM  `posts` WHERE  `user_name` =  'mark' AND  `post_privacy` =  'public'

MongoDB: db.posts.find({user_name:"mark",post_privacy:"public"},{post_text:1,post_likes_count:1})

To use logical OR between the criteria in the find method, we use the $or operator.

SQL: SELECT  `post_text` ,  `post_likes_count` FROM  `posts` WHERE  `user_name` =  'mark' OR  `post_privacy` =  'public'

MongoDB: db.posts.find({$or:[{user_name:"mark"},{post_privacy:"public"}]},{post_text:1,post_likes_count:1})

Next, we will use the sort method which sorts the result in ascending order of post_likes_count(indicated by 1).

SQL: SELECT *  FROM `posts` WHERE `user_name` = 'mark' order by post_likes_count ASC

MongoDB: db.posts.find({user_name:"mark"}).sort({post_likes_count:1})

To sort the results in descending order, we specify -1 as the value of the field.

SQL: SELECT *  FROM `posts` WHERE `user_name` = 'mark' order by post_likes_count DESC

MongoDB: db.posts.find({user_name:"mark"}).sort({post_likes_count:-1})

To limit the number of documents to be returned, we use the limit method specifying the number of documents.

SQL: SELECT *  FROM `posts` LIMIT 10

MongoDB: db.posts.find().limit(10)

The way we use offset in SQL to skip some number of records, we use skip function in MongoDB. For example, the following statement would fetch ten posts skipping the first five.

SQL: SELECT *  FROM `posts` LIMIT 10 OFFSET  5

MongoDB: db.posts.find().limit(10).skip(5)

Update

The first parameter to the update method specifies the criteria to select the documents. The second parameter specifies the actual update operation to be performed. For example, the following query selects all the documents with user_name as mark and sets their post_privacy as private.

One difference here is that by default, MongoDB update query updates only one (and the first matched) document. To update all the matching documents we have to provide a third parameter specifying multi as true indicating that we want to update multiple documents.

SQL: UPDATE posts SET post_privacy = "private" WHERE user_name='mark'

MongoDB: db.posts.update({user_name:"mark"},{$set:{post_privacy:"private"}},{multi:true})

Remove

Removing documents is quite simple and similar to SQL.

SQL: DELETE FROM posts WHERE user_name='mark'
 
MongoDB:  db.posts.remove({user_name:"mark"})

Indexing

MongoDB has a default index created on the _id field of each collection. To create new indexes on the fields, we use ensureIndex method specifying the fields and associated sort order indicated by 1 or -1(ascending or descending).

SQL: CREATE INDEX index_posts ON posts(user_name,post_likes_count DESC)

MongoDB: db.posts.ensureIndex({user_name:1,post_likes_count:-1})

To see all the indexes present in any collection, we use getIndexes method on the same lines of SHOW INDEX query of SQL.

SQL: SHOW INDEX FROM posts

MongoDB: db.posts.getIndexes()

Conclusion

In this article, we understood how the elementary concepts and terms of RDBMS/SQL relate in MongoDB. We looked upon designing relationships in MongoDB and learnt how the functionality of basic SQL queries map in MongoDB.

After getting a head start with this article, you can go ahead trying out complex queries including aggregation, map reduce and queries involving multiple collections. You can also take help of some online tools to convert SQL queries to MongoDB queries in the beginning. You can play designing a sample MongoDB database schema on your own. One of the best examples to do so would be a database to store user posts, their likes, comments and comment likes. This would give you a practical view of the flexible schema design that MongoDB offers.

Feel free to comment any suggestions, questions or ideas you would like to see further.

HIGHLIGHTS OF THE DAY