Users like blazing fast applications, and then they fall in love with them and make them part of their life. Slow applications, on the other hand, only annoy users and lose revenue. In this tutorial, we are going make sure that we don't lose more money or users, and understand the different ways to improve performance.
Active Records and ORM are very powerful tools in Ruby on Rails, but only if we know how to unleash and use that power. In the beginning, you'll find tons of ways to perform a similar task in RoR, but only when you dig a bit deeper do you actually get to know the costs of using one over another.
It's the same story in the case of ORM and Associations in Rails. They sure make our life lot easier, but in some situations can also act as overkill.
Let's Take an Example
But before that, let's quickly generate a dummy application to play around with.
Step 1
Start up your terminal and type these commands to create a new application:
rails new blog cd blog
Step 2
Generate your application:
rails g scaffold Author name:string rails g scaffold Post title:string body:text author:references
Step 3
Deploy it on your local server:
rake db:migrate rails s
And that was it! Now you should have a running dummy application.
This is how both our models (Author and Post) should look. We have Posts that belong to Author, and we have Authors that can have many posts. This is the very basic association/relation between these two models that we are going to play with.
# Post Model class Post < ActiveRecord::Base belongs_to :author end # Author Model class Author < ActiveRecord::Base has_many :posts end
Take a look at your "Posts Controller"—this is how it should look. Our main focus will be on its index method only.
# Controller class PostsController < ApplicationController def index @posts = Post.order(created_at: :desc) end end
And last but not least, our Posts Index View. Yours may seem to have some extra lines, but these are the ones I want you to focus on, especially the line with post.author.name
.
<tbody> <% @posts.each do |post| %> <tr> <td><%= post.title %></td> <td><%= post.body %></td> <td><%= post.author.name %></td> </tr> <% end %> </tbody>
Let's just create some dummy data before we get started. Go to your rails console and add the following lines. Or you can just go to http://localhost:3000/posts/new
and http://localhost:3000/authors/new
to add some data manually.
authors = Author.create([{ name: 'John' }, { name: 'Doe' }, { name: 'Manish' }]) Post.create(title: 'I love Tuts+', body: '', author: authors.first) Post.create(title: 'Tuts+ is Awesome', body: '', author: authors.second) Post.create(title: 'Long Live Tuts+', body: '', author: authors.last)
Now that you are all set up, let's start the server with rails s
and hit localhost:3000/posts
.
You'll see some results on your screen like this.
So everything seems fine: no errors, and it fetches all the records along with the associated Author names. But if you take a look at your development log, you’ll see tons of queries being executed like below.
Post Load (0.6ms) SELECT "posts".* FROM "posts" ORDER BY "posts"."created_at" DESC Author Load (0.5ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 3]] Author Load (0.1ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 2]] Author Load (0.1ms) SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 1]]
Well, okay, I agree these are just four queries, but imagine you have 3,000 posts in your database instead of just three. In that case, our database is going to be flooded with 3,000+1 queries, which is why this problem is called the N+1
problem.
Why Do We Get This Problem?
So by default in Ruby on Rails, the ORM has lazy loading enabled, which means it delays the loading of data until the point where we actually need it.
In our case, first it's the controller where it's asked to fetch all the posts.
def index @posts = Post.order(created_at: :desc) end
Second is the view, where we loop through the posts fetched by the controller and send a query to get the author name for each post separately. Hence the N+1
problem.
<% @posts.each do |post| %> <tr> . . . <td><%= post.author.name %></td> </tr> <% end %>
How Do We Solve the Problem?
To rescue us from such situations, Rails offers us a feature called eager loading.
Eager loading lets you preload the associated data (authors) for all the posts from the database, improves the overall performance by reducing the number of queries, and provides you with the data that you want to display in your views, but the only catch here is which one to use. Gotcha!
Yes because we have three of them, and all serve the same purpose, but depending on the case, any of them can prove to be reducing or overkilling the performance again.
preload() eager_load() includes()
Now you might ask which one to use in this case? Well, let's start with the first one.
def index @posts = Post.order(created_at: :desc).preload(:author) end
Save it. Hit the URL again localhost:3000/posts
.
So no changes in the results: everything loads exactly the same way, but under the hood in the development log, those tons of queries have been changed to the following two.
SELECT "posts".* FROM "posts" ORDER BY "posts"."created_at" DESC SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (3, 2, 1)
Preload uses two separate queries to load the main data and the associated data. This is actually way better than having a separate query for each author name (the N+1 Problem), but this is not enough for us. Due to its separate queries approach, it will throw an exception in scenarios like:
- Order posts by authors name.
- Find posts from the author "John" only.
Let's Try All Scenarios With eager_load() One by One
1. Order Posts by Author's Name
# Order posts by authors name. def index @posts = Post.order("authors.name").eager_load(:author) end
Resulting Query in the Development Logs:
SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."body" AS t0_r2, "posts"."author_id" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "authors"."id" AS t1_r0, "authors"."name" AS t1_r1, "authors"."created_at" AS t1_r2, "authors"."updated_at" AS t1_r3 FROM "posts" LEFT OUTER JOIN "authors" ON "authors"."id" = "posts"."author_id" ORDER BY authors.name
2. Find Posts From the Author "John" Only
# Find posts from the author "John" only. def index @posts = Post.order(created_at: :desc).eager_load(:author).where("authors.name = ?", "Manish") end
Resulting Query in the Development Logs:
SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."body" AS t0_r2, "posts"."author_id" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "authors"."id" AS t1_r0, "authors"."name" AS t1_r1, "authors"."created_at" AS t1_r2, "authors"."updated_at" AS t1_r3 FROM "posts" LEFT OUTER JOIN "authors" ON "authors"."id" = "posts"."author_id" WHERE (authors.name = 'Manish') ORDER BY "posts"."created_at" DESC
3. N+1 Scenario
def index @posts = Post.order(created_at: :desc).eager_load(:author) end
Resulting Query in the Development Logs:
SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."body" AS t0_r2, "posts"."author_id" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "authors"."id" AS t1_r0, "authors"."name" AS t1_r1, "authors"."created_at" AS t1_r2, "authors"."updated_at" AS t1_r3 FROM "posts" LEFT OUTER JOIN "authors" ON "authors"."id" = "posts"."author_id" ORDER BY "posts"."created_at" DESC
So if you look at the resulting queries of all three scenarios, there are two things in common.
First, eager_load()
always uses the LEFT OUTER JOIN
whatever the case may be. Second, it gets all the associated data in a single query, which sure outruns the preload()
method in situations where we want to use the associated data for extra tasks like ordering and filtering. But a single query and LEFT OUTER JOIN
can also be very expensive in simple scenarios like above, where all you need is to filter the authors needed. It's like using a bazooka to kill a tiny fly.
I understand that these are just two simple examples, and in real-world scenarios out there it can be very hard to decide on the one that's best for your situation. So that's the reason Rails has given us the includes()
method.
With includes()
, Active Record takes care of the tough decision. It’s way smarter than both the preload()
and eager_load()
methods and decides which one to use on its own.
Let's Try All Scenarios With includes()
1. Order Posts by Author's Name
# Order posts by authors name. def index @posts = Post.order("authors.name").includes(:author) end
Resulting Query in the Development Logs:
SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."body" AS t0_r2, "posts"."author_id" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "authors"."id" AS t1_r0, "authors"."name" AS t1_r1, "authors"."created_at" AS t1_r2, "authors"."updated_at" AS t1_r3 FROM "posts" LEFT OUTER JOIN "authors" ON "authors"."id" = "posts"."author_id" ORDER BY authors.name
2. Find Posts From the Author "John" Only
# Find posts from the author "John" only. def index @posts = Post.order(created_at: :desc).includes(:author).where("authors.name = ?", "Manish") # For rails 4 Don't forget to add .references(:author) in the end @posts = Post.order(created_at: :desc).includes(:author).where("authors.name = ?", "Manish").references(:author) end
Resulting Query in the Development Logs:
SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."body" AS t0_r2, "posts"."author_id" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "authors"."id" AS t1_r0, "authors"."name" AS t1_r1, "authors"."created_at" AS t1_r2, "authors"."updated_at" AS t1_r3 FROM "posts" LEFT OUTER JOIN "authors" ON "authors"."id" = "posts"."author_id" WHERE (authors.name = 'Manish') ORDER BY "posts"."created_at" DESC
3. N+1 Scenario
def index @posts = Post.order(created_at: :desc).includes(:author) end
Resulting Query in the Development Logs:
SELECT "posts".* FROM "posts" ORDER BY "posts"."created_at" DESC SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (3, 2, 1)
Now if we compare the results with the eager_load()
method, the first two cases have similar results, but in the last case it smartly decided to shift to the preload()
method for better performance.
Awesome, Right?
No, because in this race of performance, sometimes eager loading can fall short too. I hope some of you have already noticed that whenever eager loading methods use JOINS
, they only use LEFT OUTER JOIN
. Also, in every case they load up too much unnecessary data in the memory—they select every single column from the table, whereas we only need the author's name.
Welcome to the Joins
Even though Active Record lets you specify conditions on the eager loaded associations just likejoins()
, the recommended way is to use joins instead. ~ Rails Documentation.
As recommended in the rails documentation, the joins()
method is one step ahead in these situations. It joins the associated table, but only loading the required model data into memory like posts in our case. Therefore, we are not loading redundant data into memory needlessly—although if we want to we can do that too.
Let's Dive Into Some Examples
1. Order Posts by Author's Name
# Order posts by authors name. def index @posts = Post.order("authors.name").joins(:author) end
Resulting Query in the Development Logs:
SELECT "posts".* FROM "posts" INNER JOIN "authors" ON "authors"."id" = "posts"."author_id" ORDER BY authors.name SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 2]] SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 1]] SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 3]]
2. Find Posts From the Author "John" Only
# Find posts from the author "John" only. def index @posts = Post.order(published_at: :desc).joins(:author).where("authors.name = ?", "John") end
Resulting Query in the Development Logs:
SELECT "posts".* FROM "posts" INNER JOIN "authors" ON "authors"."id" = "posts"."author_id" WHERE (authors.name = 'Manish') ORDER BY "posts"."created_at" DESC SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 3]]
3. N+1 Scenario
def index @posts = Post.order(published_at: :desc).joins(:author) end
Resulting Query in the Development Logs:
SELECT "posts".* FROM "posts" INNER JOIN "authors" ON "authors"."id" = "posts"."author_id" ORDER BY "posts"."created_at" DESC SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 3]] SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 2]] SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 1]]
The first thing you may notice from the results above is that the N+1
problem is back, but let's focus on the good part first.
Let's look into the first query from all the results. All of them look more or less like this.
SELECT "posts".* FROM "posts" INNER JOIN "authors" ON "authors"."id" = "posts"."author_id" ORDER BY authors.name
It fetches all columns from posts. It nicely joins both the tables and sorts or filters the records depending on the condition, but without fetching any data from the associated table. Which is what we wanted in the first place.
But after the first queries, we'll see 1
or 3
or N
number of queries depending on the data in your database, like this:
SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 2]] SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 1]] SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT 1 [["id", 3]]
Now you might ask: why is this N+1
problem back? It's because of this line in our view post.author.name
.
<tbody> <% @posts.each do |post| %> <tr> <td><%= post.title %></td> <td><%= post.body %></td> <td><%= post.author.name %></td> </tr> <% end %> </tbody>
This line triggers all those queries. So in the example where we only had to order our posts, we don't need to display the author name in our views. In that case we can fix this issue by removing the line post.author.name
from the view.
But then you might ask, "Hey MK, what about the examples where we want to display the author's name in the view?"
Well, in that case, the joins()
method is not going to fix it by itself. We'll have to tell joins()
to select the author's name, or any other column from the table for that matter. And we can do it by adding a select()
statement at the end, like this:
def index @posts = Post.order(published_at: :desc).joins(:author).select("posts.*, authors.name as author_name") end
I created an alias "author_name" for authors.name. We'll see why in just a sec.
Resulting Query in the Development Logs:
SELECT posts.*, authors.name as author_name FROM "posts" INNER JOIN "authors" ON "authors"."id" = "posts"."author_id" ORDER BY "posts"."created_at" DESC
Here we go: finally a clean SQL query with no N+1
problem, with no needless data, with just the things we need. The only thing left is to use that alias in your view and change post.author.name
to post.author_name
. This is because the author_name is now an attribute of our Post model, and after this change this is how the page looks:
Everything exactly the same, but under the hood lots of things were changed. If I put everything in a nutshell, to solve the N+1
you should go for eager loading, but at times, depending on the situation, you should take things in your control and use joins for better options. You can also supply raw SQL queries to the joins()
method for more customization.
Joins and eager loading also allow the loading of multiple associations, but in the beginning things can get very complicated and difficult to decide the best option. In such situations, I recommend that you read these two very nice Envato Tuts+ tutorials to get a better understanding of joins and be able to decide the least expensive approach in terms of performance:
Last but not least, it can be complicated to find out areas in your pre-build application where you should improve the performance generally or find the N+1
problems. In those cases I recommend a nice gem called Bullet. It can notify you when you should add eager loading for N+1
queries, and when you're using eager loading unnecessarily.
Comments