As developers, we want the applications we build to be resilient when it comes to failure, but how do you achieve this goal? If you believe the hype, microservices and a clever communication protocol are the answer to all your problems, or maybe automatic DNS failover. While that kind of stuff has its place and makes for an interesting conference presentation, the somewhat less glamorous truth is that making a robust application begins with your code. But even well-designed, well-tested applications often lack a vital component of resilient code: exception handling.
This content was commissioned by Engine Yard and was written and/or edited by the Tuts+ team. Our aim with sponsored content is to publish relevant and objective tutorials, case studies, and inspirational interviews that offer genuine educational value to our readers and enable us to fund the creation of more useful content.
I never fail to be amazed by just how underused exception handling tends to be, even within mature codebases. Let's look at an example.
What Can Possibly Go Wrong?
Say we have a Rails app, and one of the things we can do using this app is fetch a list of the latest tweets for a user, given their handle. Our TweetsController might look like this:
```ruby
class TweetsController < ApplicationController
  def show
    person = Person.find_or_create_by(handle: params[:handle])
    if person.persisted?
      @tweets = person.fetch_tweets
    else
      flash[:error] = "Unable to create person with handle: #{person.handle}"
    end
  end
end
```
And the Person model that we used might be similar to the following:
```ruby
class Person < ActiveRecord::Base
  def fetch_tweets
    client = Twitter::REST::Client.new do |config|
      config.consumer_key        = configatron.twitter.consumer_key
      config.consumer_secret     = configatron.twitter.consumer_secret
      config.access_token        = configatron.twitter.access_token
      config.access_token_secret = configatron.twitter.access_token_secret
    end
    client.user_timeline(handle).map { |tweet| tweet.text }
  end
end
```
This code seems perfectly reasonable; there are dozens of apps with code just like this sitting in production. But let's look a little more closely.
- find_or_create_by is a Rails method. It's not a 'bang' method, so it shouldn't throw exceptions, but if we look at the documentation we can see that, due to the way this method works, it can raise an ActiveRecord::RecordNotUnique error. This won't happen often, but if our application has a decent amount of traffic, it's more likely than you might expect (I've seen it happen many times).
- While we're on the subject, any library you use can throw unexpected errors due to bugs within the library itself, and Rails is no exception. Depending on our level of paranoia, we might expect our find_or_create_by to throw any kind of unexpected error at any time (a healthy level of paranoia is a good thing when it comes to building robust software). If we have no global way of handling unexpected errors (we'll discuss this below), we might want to handle these individually.
- Then there is person.fetch_tweets, which instantiates a Twitter client and tries to fetch some tweets. This is a network call and is prone to all sorts of failure. We may want to read the documentation to figure out which errors we might expect, but we know that errors are not only possible here but quite likely (for example, the Twitter API might be down, or a person with that handle might not exist). Not putting some exception handling logic around network calls is asking for trouble.
Our tiny amount of code has some serious issues. Let's try to make it better.
The Right Amount of Exception Handling
We'll wrap our find_or_create_by and push it down into the Person model:
```ruby
class Person < ActiveRecord::Base
  class << self
    def find_or_create_by_handle(handle)
      begin
        Person.find_or_create_by(handle: handle)
      rescue ActiveRecord::RecordNotUnique
        Rails.logger.warn { "Encountered a non-fatal RecordNotUnique error for: #{handle}" }
        retry
      rescue => e
        Rails.logger.error { "Encountered an error when trying to find or create Person for: #{handle}, #{e.message} #{e.backtrace.join("\n")}" }
        nil
      end
    end
  end
end
```
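One caveat about the code above: retry re-runs the begin block unconditionally, so if the RecordNotUnique keeps recurring (say, because a unique index on some other column is being violated), the method loops forever. Below is a minimal plain-Ruby sketch of a bounded retry. RecordNotUnique here is a local stand-in for ActiveRecord::RecordNotUnique so the sketch runs without Rails, and with_unique_retry and MAX_ATTEMPTS are hypothetical names, not part of Rails.

```ruby
# Local stand-in for ActiveRecord::RecordNotUnique so the sketch runs without Rails.
class RecordNotUnique < StandardError; end

MAX_ATTEMPTS = 3 # hypothetical limit; pick what suits your traffic

# Runs the block, retrying when a uniqueness race is detected.
# After max_attempts failures the error is re-raised instead of looping forever.
def with_unique_retry(max_attempts: MAX_ATTEMPTS)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue RecordNotUnique
    retry if attempts < max_attempts
    raise
  end
end

# Usage: the first two attempts lose the race, the third succeeds.
calls = 0
person = with_unique_retry do
  calls += 1
  raise RecordNotUnique, "duplicate handle" if calls < 3
  { handle: "yukihiro_matz" }
end
```

In the model, the body of find_or_create_by_handle could then become with_unique_retry { Person.find_or_create_by(handle: handle) }, with the helper's rescue clause naming ActiveRecord::RecordNotUnique.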
We've handled the ActiveRecord::RecordNotUnique according to the documentation, and now we know for a fact that we'll either get a Person object or nil if something goes wrong. This code is now solid, but what about fetching our tweets:
```ruby
class Person < ActiveRecord::Base
  def fetch_tweets
    client.user_timeline(handle).map { |tweet| tweet.text }
  rescue => e
    Rails.logger.error { "Error while fetching tweets for: #{handle}, #{e.message} #{e.backtrace.join("\n")}" }
    nil
  end

  private

  def client
    @client ||= Twitter::REST::Client.new do |config|
      config.consumer_key        = configatron.twitter.consumer_key
      config.consumer_secret     = configatron.twitter.consumer_secret
      config.access_token        = configatron.twitter.access_token
      config.access_token_secret = configatron.twitter.access_token_secret
    end
  end
end
```
We move the Twitter client instantiation into its own private method, and since we don't know what could go wrong when we fetch the tweets, we rescue everything.
You may have heard somewhere that you should always catch specific errors. This is a laudable goal, but people often misinterpret it as "if I can't catch something specific, I won't catch anything". In reality, if you can't catch something specific, you should catch everything! This way you at least have an opportunity to do something, even if it's only to log and re-raise the error.
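In Ruby terms, "catch everything" normally means a bare rescue (or rescue => e), which catches StandardError and its subclasses rather than every Exception, so signals such as Interrupt still get through. The sketch below rescues the one error it understands first and falls back to everything else, logging and re-raising. TimelineUnavailable and fetch_timeline are hypothetical, and the client is assumed to be any callable.

```ruby
require "logger"

# Hypothetical error a client might raise when the timeline is unavailable.
class TimelineUnavailable < StandardError; end

# Rescue the specific error you understand first, then everything else.
def fetch_timeline(client, handle, logger)
  client.call(handle)
rescue TimelineUnavailable => e
  # Expected failure: log it and degrade to an empty timeline.
  logger.warn("Timeline unavailable for #{handle}: #{e.message}")
  []
rescue => e
  # Unexpected failure (any StandardError): record it, then re-raise.
  logger.error("Unexpected #{e.class} for #{handle}: #{e.message}")
  raise
end
```

The order of the rescue clauses matters: Ruby tries them top to bottom, so the specific handler must come before the catch-all.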
An Aside on OO Design
In order to make our code more robust, we were forced to refactor, and now our code is arguably better than it was before. You can use your desire for more resilient code to inform your design decisions.
An Aside on Testing
Every time you add some exception handling logic to a method, you also add an extra path through that method, and it needs to be tested. It's vital that you test the exceptional path, perhaps more so than the happy path. If something goes wrong on the happy path, you now have the extra insurance of the rescue block to prevent your app from falling over. However, any logic inside the rescue block itself has no such insurance. Test your exceptional path well, so that silly things like mistyping a variable name inside the rescue block don't cause your application to blow up (this has happened to me so many times; seriously, just test your rescue blocks).
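One way to exercise a rescue block is to inject a collaborator that always fails. The sketch below assumes a hypothetical TweetFetcher that receives its client and logger from outside (unlike the Person model above, which builds its own client), plus a stub FailingClient; it uses plain assertions rather than any particular test framework.

```ruby
require "logger"
require "stringio"

# Hypothetical fetcher; the client is injected so a test can make it fail.
class TweetFetcher
  def initialize(client, logger)
    @client = client
    @logger = logger
  end

  def latest(handle)
    @client.user_timeline(handle).map(&:text)
  rescue => e
    @logger.error("Error while fetching tweets for #{handle}: #{e.message}")
    nil
  end
end

# Stub client that always fails, forcing execution into the rescue block.
class FailingClient
  def user_timeline(_handle)
    raise IOError, "connection reset"
  end
end

# Exercise the exceptional path. A typo inside the rescue block
# (say, an undefined variable) would raise NameError right here.
log = StringIO.new
result = TweetFetcher.new(FailingClient.new, Logger.new(log)).latest("yukihiro_matz")
raise "rescue path should return nil" unless result.nil?
raise "rescue path should log the error" unless log.string.include?("connection reset")
```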
What to Do With the Errors We Catch
I've seen this kind of code countless times through the years:
```ruby
begin
  widgetron.create
rescue
  # don't need to do anything
end
```
We rescue an exception and don't do anything with it. This is almost always a bad idea. When you're debugging a production issue six months from now, trying to figure out why your 'widgetron' isn't showing up in the database, you won't remember that innocent comment, and hours of frustration will follow.
Don't swallow exceptions! At the very least you should log any exception that you catch, for example:
```ruby
begin
  foo.bar
rescue => e
  Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
end
```
This way we can trawl the logs and we'll have the cause and stack trace of the error to look at.
Better yet, you can use an error monitoring service such as Rollbar, which is pretty nice. There are many advantages to this:
- Your error messages aren't interspersed with other log messages
- You will get stats on how often the same error has happened (so you can figure out if it's a serious issue or not)
- You can send extra information along with the error to help you diagnose the problem
- You can get notifications (via email, PagerDuty, etc.) when errors occur in your app
- You can track deploys to see when particular errors were introduced or fixed
- etc.
```ruby
begin
  foo.bar
rescue => e
  Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
  Rollbar.report_exception(e)
end
```
You can, of course, both log and use a monitoring service as above.
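If you do both in many places, it can help to put the logging and reporting behind one helper so every rescue block behaves the same way. Below is a minimal sketch built around a hypothetical ErrorReporter class; the reporters are plain callables standing in for a monitoring client, and no real monitoring API is assumed.

```ruby
require "logger"

# Hypothetical helper: logs an error with its backtrace, then forwards it to
# each configured reporter (callables standing in for monitoring clients).
class ErrorReporter
  def initialize(logger, reporters = [])
    @logger = logger
    @reporters = reporters
  end

  def report(error, context = {})
    backtrace = (error.backtrace || []).join("\n")
    @logger.error("#{error.class}: #{error.message} #{context.inspect}\n#{backtrace}")
    @reporters.each do |reporter|
      reporter.call(error, context)
    rescue => e
      # A broken reporter must never hide the original error.
      @logger.warn("Error reporter failed: #{e.message}")
    end
    nil
  end
end

# Usage: log to stderr and collect reports in memory.
sent = []
reporter = ErrorReporter.new(Logger.new($stderr), [->(e, ctx) { sent << [e.message, ctx] }])
begin
  raise ArgumentError, "bad widget"
rescue => e
  reporter.report(e, handle: "yukihiro_matz")
end
```

Returning nil explicitly from report keeps callers from accidentally depending on whatever the last reporter returned.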
If your rescue block is the last thing in a method, I recommend having an explicit return:
```ruby
def my_method
  begin
    foo.bar
  rescue => e
    Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
    Rollbar.report_exception(e)
    nil
  end
end
```
You may not always want to return nil; sometimes you might be better off with a null object or whatever else makes sense in the context of your application. Consistently using explicit return values will save everyone a lot of confusion.
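As an illustration of the null object alternative, here is a minimal sketch, assuming hypothetical Timeline and NullTimeline classes and a timeline_for helper whose fetcher is any callable returning tweet texts. Because the null object answers the same messages as the real one, callers don't need nil checks.

```ruby
# A fetched timeline: wraps the tweet texts.
Timeline = Struct.new(:tweets) do
  def available?
    true
  end
end

# Null object: answers the same messages, so callers need no nil checks.
class NullTimeline
  def tweets
    []
  end

  def available?
    false
  end
end

# Hypothetical lookup; `fetcher` is any callable that returns tweet texts.
def timeline_for(handle, fetcher)
  Timeline.new(fetcher.call(handle))
rescue => e
  warn "Could not fetch timeline for #{handle}: #{e.message}"
  NullTimeline.new
end

# Usage: the view can iterate without first checking for nil.
timeline = timeline_for("yukihiro_matz", ->(_handle) { raise "API down" })
timeline.tweets.each { |text| puts text } # prints nothing for the null object
```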
You can also re-raise the same error or raise a different one inside your rescue block. One pattern that I often find useful is to wrap the existing exception in a new one and raise that, so as not to lose the original stack trace. (I even wrote a gem for this when Ruby didn't provide the functionality out of the box; since Ruby 2.1, an exception raised inside a rescue block records the original one in Exception#cause automatically.) Later on in the article, when we talk about external services, I will show you why this can be useful.
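A minimal sketch of that wrapping pattern, relying only on Ruby's built-in Exception#cause; ProfileSyncError and sync_profile are hypothetical names used for illustration.

```ruby
# Hypothetical application-level error.
class ProfileSyncError < StandardError; end

# Converts low-level failures into one meaningful error. Because the new
# error is raised inside a rescue block, Ruby sets its #cause to the original.
def sync_profile(payload)
  Integer(payload.fetch(:followers))
rescue KeyError, ArgumentError => e
  raise ProfileSyncError, "could not sync profile: #{e.message}"
end

begin
  sync_profile({ followers: "lots" })
rescue ProfileSyncError => error
  puts error.message               # the wrapper's message
  puts error.cause.class           # ArgumentError (the original exception)
  puts error.cause.backtrace.first # where the original was raised
end
```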
Handling Errors Globally
Rails lets you specify how to handle requests for resources of a certain format (HTML, XML, JSON) by using respond_to and respond_with (since Rails 4.2, respond_with and the class-level respond_to live in the separate responders gem). I rarely see apps that use this functionality correctly; after all, if you don't use a respond_to block, everything works fine and Rails renders your template correctly. We hit our tweets controller via /tweets/yukihiro_matz and get an HTML page full of Matz's latest tweets. What people often forget is that it's very easy to request a different format of the same resource, e.g. /tweets/yukihiro_matz.json. At this point Rails will valiantly try to return a JSON representation of Matz's tweets, but it won't go well, since the view for it doesn't exist. An ActionView::MissingTemplate error will get raised, and our app blows up in spectacular fashion. And JSON is a legitimate format; in a high-traffic application you're just as likely to get a request for /tweets/yukihiro_matz.foobar. Tuts+ gets these kinds of requests all the time (likely from bots trying to be clever).
The lesson is this: if you're not planning to return a legitimate response for a particular format, restrict your controllers from trying to fulfill requests for those formats. In the case of our TweetsController:
```ruby
class TweetsController < ApplicationController
  respond_to :html

  def show
    ...
    respond_to do |format|
      format.html
    end
  end
end
```
Now when we get requests for spurious formats, we'll get a more relevant ActionController::UnknownFormat error. Our controllers feel somewhat tighter, which is a great thing when it comes to making them more robust.
Handling Errors the Rails Way
The problem we have now is that, despite our semantically pleasing error, our application is still blowing up in our users' faces. This is where global exception handling comes in. Sometimes our application will produce errors that we want to respond to consistently, no matter where they come from (like our ActionController::UnknownFormat). There are also errors that can get raised by the framework before any of our code comes into play. A perfect example of this is ActionController::RoutingError. When someone requests a URL that doesn't exist, like /tweets2/yukihiro_matz, there is nowhere for us to hook in to rescue this error using traditional exception handling. This is where Rails' exceptions_app comes in.
You can configure a Rack app in application.rb to be called when an error that we haven't handled is produced (like our ActionController::RoutingError or ActionController::UnknownFormat). The way you will normally see this used is to configure your routes app as the exceptions_app, then define the various routes for the errors you want to handle and route them to a special errors controller that you create. So our application.rb would look like this:
```ruby
...
config.exceptions_app = self.routes
...
```
Our routes.rb will then contain the following:
```ruby
...
match '/404' => 'errors#not_found', via: :all
match '/406' => 'errors#not_acceptable', via: :all
match '/500' => 'errors#internal_server_error', via: :all
...
```
In this case, our ActionController::RoutingError would be picked up by the 404 route, and the ActionController::UnknownFormat will be picked up by the 406 route. There are many possible errors that can crop up, but as long as you handle the common ones (404, 500, 422, etc.) to start with, you can add others if and when they happen.
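Conceptually, Rails picks one of those routes by mapping the exception's class name to a status code (the table is configurable through config.action_dispatch.rescue_responses) and then asking the exceptions_app to handle a path such as /404. The plain-Ruby sketch below models that lookup; it is a simplification, not Rails' actual code, and the table lists only a few entries resembling Rails' defaults. RESCUE_RESPONSES, status_for and error_path_for are hypothetical names.

```ruby
# Simplified model of mapping an unhandled exception to an error route.
# Keys are exception class names, as in Rails' rescue_responses table.
RESCUE_RESPONSES = {
  "ActionController::RoutingError"  => 404,
  "ActiveRecord::RecordNotFound"    => 404,
  "ActionController::UnknownFormat" => 406,
  "ActiveRecord::RecordInvalid"     => 422
}.freeze

# Anything not in the table is treated as an internal server error.
def status_for(exception)
  RESCUE_RESPONSES.fetch(exception.class.name, 500)
end

# The path the exceptions_app would be asked to render, e.g. "/404".
def error_path_for(exception)
  "/#{status_for(exception)}"
end
```

This is also why adding a new route such as /422 is usually all it takes to give another family of errors its own page.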
Within our errors controller, we can now render the relevant templates for each kind of error along with our layout (if it's not a 500) to maintain the branding. We can also log the errors and send them to our monitoring service, although most monitoring services will hook in to this process automatically, so you don't have to send the errors yourself. Now when our application blows up, it does so gently, with the right status code depending on the error and a page where we can give the user some idea of what happened and what they can do (contact support), which is an infinitely better experience. More importantly, our app will seem (and will actually be) much more solid.
Multiple Errors of the Same Type in a Controller
In any Rails controller, we can define specific errors to be handled globally within that controller (no matter which action they get produced in); we do this via rescue_from. The question is when to use rescue_from. I usually find that a good pattern is to use it for errors that can occur in multiple actions. If an error will only be produced by one action, handle it via the traditional begin...rescue...end mechanism, but if we're likely to get the same error in multiple places and we want to handle it the same way, it's a good candidate for rescue_from. Let's say our TweetsController also has a create action:
```ruby
class TweetsController < ApplicationController
  respond_to :html

  def show
    ...
    respond_to do |format|
      format.html
    end
  end

  def create
    ...
  end
end
```
Let's also say that both of these actions can encounter a TwitterError, and if they do, we want to tell the user that something is wrong with Twitter. This is where rescue_from can be really handy:
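```ruby
class TweetsController < ApplicationController
  respond_to :html

  # Pass the handler method's name as a symbol; a bare twitter_error
  # would be called at class-load time and raise a NameError.
  rescue_from TwitterError, with: :twitter_error

  private

  def twitter_error
    render :twitter_error
  end
end
```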
Now we don't need to worry about handling this in our actions, which will look much cleaner, and we can (and of course should) log our error and/or notify our error monitoring service within the twitter_error method. Used correctly, rescue_from can not only help make your application more robust but also make your controller code cleaner. This makes your code easier to maintain and test, making your application that little bit more resilient yet again.
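To see why a single declaration covers every action, here is a toy, framework-free model of the rescue_from idea. It is not Rails' implementation (in Rails the behaviour comes from ActiveSupport::Rescuable); the RescueHandlers module, its process method and ToyTweetsController are hypothetical names for illustration only.

```ruby
# Toy version of the rescue_from idea (NOT Rails' implementation):
# handlers are declared once per class and apply to every action.
module RescueHandlers
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def rescue_handlers
      @rescue_handlers ||= []
    end

    # Register a handler method (given as a symbol) for an exception class.
    def rescue_from(klass, with:)
      rescue_handlers << [klass, with]
    end
  end

  # Run an action; a registered error is passed to its handler,
  # anything else propagates unchanged.
  def process(action)
    public_send(action)
  rescue StandardError => e
    _klass, handler = self.class.rescue_handlers.reverse.find { |k, _| e.is_a?(k) }
    raise unless handler
    send(handler, e)
  end
end

class TwitterError < StandardError; end

class ToyTweetsController
  include RescueHandlers
  rescue_from TwitterError, with: :twitter_error

  def show
    raise TwitterError, "timeline unavailable"
  end

  def create
    raise TwitterError, "could not post"
  end

  private

  def twitter_error(error)
    "Something is wrong with Twitter: #{error.message}"
  end
end

controller = ToyTweetsController.new
puts controller.process(:show)
puts controller.process(:create)
```

Neither action contains any rescue logic; both failures are routed through the one handler declared at the top of the class.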
Using External Services in Your Application
It's difficult to write a significant application these days without using a number of external services/APIs. In the case of our TweetsController, Twitter came into play via a Ruby gem that wraps the Twitter API. Ideally we would make all our external API calls asynchronously, but we're not covering asynchronous processing in this article, and there are plenty of applications out there that make at least some API/network calls in-process.
Making network calls is an extremely error-prone task, and good exception handling is a must. You can get authentication errors, configuration problems, and connectivity errors. The library you use can produce any number of code errors, and then there is the matter of slow connections. I am glossing over this point, but it's crucial, since you can't deal with slow connections via exception handling alone. You need to configure appropriate timeouts in your network library, or, if you're using an API wrapper, make sure it provides hooks to configure timeouts. There is no worse experience for a user than having to sit there waiting without your application giving any indication of what's happening. Just about everyone forgets to configure timeouts appropriately (I know I have), so take heed.
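What configuring timeouts looks like depends on your library. As an example with Ruby's standard Net::HTTP (not the Twitter gem, whose options you should look up in its own documentation), here is a minimal sketch. The timeout values and the build_http and fetch_body names are hypothetical; once the timeouts are set, slowness surfaces as Net::OpenTimeout or Net::ReadTimeout, which you can rescue like any other error.

```ruby
require "net/http"
require "uri"

# Hypothetical limits; choose values your users can tolerate.
OPEN_TIMEOUT = 2 # seconds allowed to open the connection
READ_TIMEOUT = 5 # seconds allowed for each read from the socket

def build_http(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = OPEN_TIMEOUT
  http.read_timeout = READ_TIMEOUT
  http
end

# Returns the body on success, nil on a timeout or connection failure.
def fetch_body(url)
  uri = URI(url)
  response = build_http(uri).request(Net::HTTP::Get.new(uri))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
rescue Net::OpenTimeout, Net::ReadTimeout => e
  warn "Timed out fetching #{url}: #{e.class}"
  nil
rescue SocketError, SystemCallError => e
  warn "Connection failed for #{url}: #{e.message}"
  nil
end
```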
If you're using an external service in multiple places within your application (multiple models, for example), you expose large parts of your application to the full landscape of errors that can be produced. This is not a good situation. What we want to do is limit our exposure, and one way to do this is by putting all access to our external services behind a facade, rescuing all errors there and re-raising one semantically appropriate error (raise that TwitterError we talked about if any errors occur when we try to hit the Twitter API). We can then easily use techniques like rescue_from to deal with these errors, and we don't expose large parts of our application to an unknown number of errors from external sources.
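A minimal sketch of such a facade, assuming a hypothetical TwitterGateway class that receives a stand-in client (any object with user_timeline and update methods). Every call goes through one guard method, so callers only ever see TwitterError, while the original exception stays reachable through Exception#cause.

```ruby
# One semantically meaningful error for everything Twitter-related.
class TwitterError < StandardError; end

# Hypothetical facade: the only class that talks to the underlying client.
class TwitterGateway
  def initialize(client)
    @client = client
  end

  def timeline(handle)
    guard { @client.user_timeline(handle) }
  end

  def post(text)
    guard { @client.update(text) }
  end

  private

  # Whatever the client raises (network, auth, parsing, bugs), callers only
  # ever see TwitterError; the original stays reachable through #cause.
  def guard
    yield
  rescue => e
    raise TwitterError, "#{e.class}: #{e.message}"
  end
end
```

With this in place, a single rescue_from TwitterError in the controller covers every way the external service can fail.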
An even better idea might be to make your facade an error-free API. Return all successful responses as is, and return nils or null objects when you rescue any sort of error (we still need to log/notify ourselves of the errors via some of the methods we discussed above). This way we don't need to mix different types of control flow (exception control flow vs. if...else), which may give us significantly cleaner code. For example, let's wrap our Twitter API access in a TwitterClient object:
```ruby
class TwitterClient
  attr_reader :client

  def initialize
    @client = Twitter::REST::Client.new do |config|
      config.consumer_key        = configatron.twitter.consumer_key
      config.consumer_secret     = configatron.twitter.consumer_secret
      config.access_token        = configatron.twitter.access_token
      config.access_token_secret = configatron.twitter.access_token_secret
    end
  end

  def latest_tweets(handle)
    client.user_timeline(handle).map { |tweet| tweet.text }
  rescue => e
    Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
    nil
  end
end
```
We can now call TwitterClient.new.latest_tweets('yukihiro_matz') anywhere in our code, and we know that it will never produce an error, or rather, it will never propagate the error beyond TwitterClient. We've isolated an external system to make sure that glitches in that system won't bring down our main application.
But What if I Have Excellent Test Coverage?
If you do have well-tested code, I commend you on your diligence; it will take you a long way towards having a more robust application. But a good test suite can often provide a false sense of security. Good tests can help you refactor with confidence and protect you against regressions, but you can only write tests for things you expect to happen. Bugs are, by their very nature, unexpected. To use our tweets example, until we choose to write a test for our fetch_tweets method where client.user_timeline(handle) raises an error, thereby forcing us to wrap a rescue block around the code, all our tests will have been green and our code would have remained failure-prone.
Writing tests doesn't absolve us of the responsibility of casting a critical eye over our code to figure out how it can potentially break. On the other hand, doing this kind of evaluation can definitely help us write better, more complete test suites.
Conclusion
Resilient systems don't spring forth fully formed from a weekend hack session. Making an application robust is an ongoing process. You discover bugs, fix them, and write tests to make sure they don't come back. When your application goes down due to an external system failure, you isolate that system to make sure the failure can't snowball again. Exception handling is your best friend when it comes to doing this. Even the most failure-prone application can be turned into a robust one if you apply good exception handling practices consistently, over time.
Of course, exception handling is not the only tool in your arsenal when it comes to making applications more resilient. In subsequent articles we will talk about asynchronous processing, how and when to apply it, and what it can do in terms of making your application fault tolerant. We will also look at some deployment and infrastructure tips that can have a significant impact without breaking the bank in terms of either money or time. Stay tuned.