Writing Robust Web Applications - The Lost Art of Exception Handling

As developers, we want the applications we build to be resilient when it comes to failure, but how do you achieve this goal? If you believe the hype, micro-services and a clever communication protocol are the answer to all your problems, or maybe automatic DNS failover. While that kind of stuff has its place and makes for an interesting conference presentation, the somewhat less glamorous truth is that making a robust application begins with your code. But, even well designed and well tested applications are often lacking a vital component of resilient code - exception handling.

Sponsored Content

This content was commissioned by Engine Yard and was written and/or edited by the Tuts+ team. Our aim with sponsored content is to publish relevant and objective tutorials, case studies, and inspirational interviews that offer genuine educational value to our readers and enable us to fund the creation of more useful content.

I never fail to be amazed by just how under-used exception handling tends to be even within mature codebases. Let's look at an example.

What Can Possibly Go Wrong?

Say we have a Rails app, and one of the things we can do using this app is fetch a list of the latest tweets for a user, given their handle. Our TweetsController might look like this:

class TweetsController < ApplicationController
  def show
    person = Person.find_or_create_by(handle: params[:handle])
    if person.persisted?
      @tweets = person.fetch_tweets
    else
      flash[:error] = "Unable to create person with handle: #{person.handle}"
    end
  end
end

And the Person model that we used might be similar to the following:

class Person < ActiveRecord::Base
  def fetch_tweets
    client = Twitter::REST::Client.new do |config|
      config.consumer_key        = configatron.twitter.consumer_key
      config.consumer_secret     = configatron.twitter.consumer_secret
      config.access_token        = configatron.twitter.access_token
      config.access_token_secret = configatron.twitter.access_token_secret
    end
    client.user_timeline(handle).map{|tweet| tweet.text}
  end
end

This code seems perfectly reasonable, and there are dozens of apps with code just like this sitting in production, but let's look a little more closely.

find_or_create_by is a Rails method. It's not a 'bang' method, so it shouldn't throw exceptions, but if we look at the documentation we can see that, due to the way this method works, it can raise an ActiveRecord::RecordNotUnique error. This won't happen often, but if our application has a decent amount of traffic it's more likely to occur than you might expect (I've seen it happen many times).
While we're on the subject, any library you use can throw unexpected errors due to bugs within the library itself and Rails is no exception. Depending on our level of paranoia we might expect our find_or_create_by to throw any kind of unexpected error at any time (a healthy level of paranoia is a good thing when it comes to building robust software). If we have no global way of handling unexpected errors (we'll discuss this below), we might want to handle these individually.
Then there is person.fetch_tweets which instantiates a Twitter client and tries to fetch some tweets. This will be a network call and is prone to all sorts of failure. We may want to read the documentation to figure out what the possible errors we might expect are, but we know that errors are not only possible here, but quite likely (for example, the Twitter API might be down, a person with that handle might not exist etc.). Not putting some exception handling logic around network calls is asking for trouble.

Our tiny amount of code has some serious issues. Let's try to make it better.

The Right Amount of Exception Handling

We'll wrap our find_or_create_by and push it down into the Person model:

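class Person < ActiveRecord::Base
  class << self
    def find_or_create_by_handle(handle)
      attempts = 0 # kept outside the begin block so it survives each retry
      begin
        Person.find_or_create_by(handle: handle)
      rescue ActiveRecord::RecordNotUnique
        Rails.logger.warn { "Encountered a non-fatal RecordNotUnique error for: #{handle}" }
        attempts += 1
        # Another request created the record first, so retrying should find it.
        # The cap guards against looping forever if something else is wrong.
        retry if attempts < 3
        nil
      rescue => e
        Rails.logger.error { "Encountered an error when trying to find or create Person for: #{handle}, #{e.message} #{e.backtrace.join("\n")}" }
        nil
      end
    end
  end
end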

We've handled the ActiveRecord::RecordNotUnique according to the documentation and now we know for a fact that we'll either get a Person object or nil if something goes wrong. This code is now solid, but what about fetching our tweets:

class Person < ActiveRecord::Base
  def fetch_tweets
    client.user_timeline(handle).map{|tweet| tweet.text}
  rescue => e
    Rails.logger.error { "Error while fetching tweets for: #{handle}, #{e.message} #{e.backtrace.join("\n")}" }
    nil
  end

  private

  def client
    @client ||= Twitter::REST::Client.new do |config|
      config.consumer_key        = configatron.twitter.consumer_key
      config.consumer_secret     = configatron.twitter.consumer_secret
      config.access_token        = configatron.twitter.access_token
      config.access_token_secret = configatron.twitter.access_token_secret
    end
  end
end

We push instantiating the Twitter client down into its own private method and since we didn't know what could go wrong when we fetch the tweets, we rescue everything.

You may have heard somewhere that you should always catch specific errors. This is a laudable goal, but people often misinterpret it as, "if I can't catch something specific, I won't catch anything". In reality, if you can't catch something specific you should catch everything! This way at least you have an opportunity to do something even if it's only to log and re-raise the error.

An Aside on OO Design

In order to make our code more robust, we were forced to refactor and now our code is arguably better than it was before. You can use your desire for more resilient code to inform your design decisions.

An Aside on Testing

Every time you add some exception handling logic to a method, it's also an extra path through that method and it needs to be tested. It's vital you test the exceptional path, perhaps more so than testing the happy path. If something goes wrong on the happy path you now have the extra insurance of the rescue block to prevent your app from falling over. However, any logic inside the rescue block itself has no such insurance. Test your exceptional path well, so that silly things like mistyping a variable name inside the rescue block don't cause your application to blow up (this has happened to me so many times - seriously, just test your rescue blocks).
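To make that concrete, here is a minimal sketch of exercising a rescue path in plain Ruby. TweetFetcher and FailingClient are hypothetical stand-ins (they are not part of the app above), with the client and logger injected so the failure can be simulated without Rails or Twitter. In a real suite you would express the same checks in your test framework of choice.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for the Person#fetch_tweets logic, with the client and
# logger injected so the rescue path can be exercised in isolation.
class TweetFetcher
  def initialize(client:, logger:)
    @client = client
    @logger = logger
  end

  def fetch_tweets(handle)
    @client.user_timeline(handle).map { |tweet| tweet.text }
  rescue => e
    @logger.error { "Error while fetching tweets for: #{handle}, #{e.message}" }
    nil
  end
end

# Test double whose only job is to fail the way a network call might.
class FailingClient
  def user_timeline(_handle)
    raise IOError, "connection reset"
  end
end

log_output = StringIO.new
fetcher = TweetFetcher.new(client: FailingClient.new, logger: Logger.new(log_output))

result = fetcher.fetch_tweets("yukihiro_matz")

# The exceptional path should return nil and leave a trace in the log.
raise "expected nil from the rescue path" unless result.nil?
raise "expected the error to be logged" unless log_output.string.include?("connection reset")
```

If the rescue block contained a mistyped variable name, the first check would blow up with a NameError instead of passing, which is exactly the kind of mistake we want to catch before production does.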

What to Do With the Errors We Catch

I've seen this kind of code countless times through the years:

begin
  widgetron.create
rescue
  # don't need to do anything
end

We rescue an exception and don't do anything with it. This is almost always a bad idea. When you're debugging a production issue six months from now, trying to figure out why your 'widgetron' isn't showing up in the database, you won't remember that innocent comment and hours of frustration will follow.

Don't swallow exceptions! At the very least you should log any exception that you catch, for example:

begin
  foo.bar
rescue => e
  Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
end

This way we can trawl the logs and we'll have the cause and stack trace of the error to look at.

Better yet, you may use an error monitoring service, such as Rollbar which is pretty nice. There are many advantages to this:

Your error messages aren't interspersed with other log messages
You will get stats on how often the same error has happened (so you can figure out if it's a serious issue or not)
You can send extra information along with the error to help you diagnose the problem
You can get notifications (via email, pagerduty etc.) when errors occur in your app
You can track deploys to see when particular errors were introduced or fixed
etc.

begin
  foo.bar
rescue => e
  Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
  Rollbar.report_exception(e)
end

You can, of course, both log and use a monitoring service as above.

If your rescue block is the last thing in a method, I recommend having an explicit return:

def my_method
  begin
    foo.bar
  rescue => e
    Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
    Rollbar.report_exception(e)
    nil
  end
end

You may not always want to return nil, sometimes you might be better off with a null object or whatever else makes sense in the context of your application. Consistently using explicit return values will save everyone a lot of confusion.
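As a rough illustration of the null object option, the sketch below returns an object with the same interface as a successful result instead of nil. TweetList, NoTweets and the latest_tweets helper are hypothetical names invented for this example, and the failing client is a stand-in for a real API wrapper.

```ruby
# Hypothetical value object for a successful fetch.
class TweetList
  attr_reader :texts

  def initialize(texts)
    @texts = texts
  end

  def available?
    true
  end
end

# Null object: same interface as TweetList, but always empty.
class NoTweets
  def texts
    []
  end

  def available?
    false
  end
end

def latest_tweets(client, handle)
  TweetList.new(client.user_timeline(handle).map(&:text))
rescue => e
  warn "Error while fetching tweets for #{handle}: #{e.message}"
  NoTweets.new
end

# Stand-in client that always fails, to show the rescue path.
broken_client = Object.new
def broken_client.user_timeline(_handle)
  raise IOError, "service unavailable"
end

tweets = latest_tweets(broken_client, "yukihiro_matz")
tweets.texts.each { |text| puts text } # prints nothing, and needs no nil check
```

The caller can iterate over the result either way, and can ask available? when it needs to tell the user that something went wrong.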

You can also re-raise the same error or raise a different one inside your rescue block. One pattern that I often find useful is to wrap the existing exception in a new one and raise that one so as not to lose the original stack trace (I even wrote a gem for this, since older versions of Ruby don't provide this functionality out of the box; Ruby 2.1 and later record the original error in Exception#cause). Later on in the article when we talk about external services, I will show you why this can be useful.
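Here is a small sketch of the wrapping pattern, assuming Ruby 2.1 or later, where raising inside a rescue block automatically records the rescued exception as the new exception's cause. WidgetError and create_widget are hypothetical names used only for illustration.

```ruby
# Hypothetical application-level error used to wrap lower-level failures.
class WidgetError < StandardError; end

def create_widget
  raise ArgumentError, "name is missing" # stands in for a low-level failure
rescue ArgumentError
  # Raising inside a rescue block records the rescued error as #cause,
  # so the original message and backtrace are not lost.
  raise WidgetError, "could not create widget"
end

begin
  create_widget
rescue WidgetError => e
  wrapped = e
end

wrapped.message       # "could not create widget"
wrapped.cause.class   # ArgumentError
wrapped.cause.message # "name is missing"
```

When logging the wrapped error, it is worth logging the cause's message and backtrace as well, since that is where the original failure lives.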

Handling Errors Globally

Rails lets you specify how to handle requests for resources of a certain format (HTML, XML, JSON) by using respond_to and respond_with. I rarely see apps that use this functionality correctly. After all, if you don't use a respond_to block everything works fine and Rails renders your template correctly. We hit our tweets controller via /tweets/yukihiro_matz and get an HTML page full of Matz's latest tweets. What people often forget is that it's very easy to request a different format of the same resource, e.g. /tweets/yukihiro_matz.json. At this point Rails will valiantly try to return a JSON representation of Matz's tweets, but it won't go well since the view for it doesn't exist. An ActionView::MissingTemplate error will get raised and our app blows up in spectacular fashion. And JSON is a legitimate format; in a high traffic application you're just as likely to get a request for /tweets/yukihiro_matz.foobar. Tuts+ gets these kinds of requests all the time (likely from bots trying to be clever).

The lesson is this: if you're not planning to return a legitimate response for a particular format, restrict your controllers from trying to fulfill requests for those formats. In the case of our TweetsController:

class TweetsController < ApplicationController
  respond_to :html

  def show
    ...
    respond_to do |format|
      format.html
    end
  end
end

Now when we get requests for spurious formats we'll get a more relevant ActionController::UnknownFormat error. Our controllers feel somewhat tighter which is a great thing when it comes to making them more robust.

Handling Errors the Rails Way

The problem we have now is that, despite our semantically pleasing error, our application is still blowing up in our users' faces. This is where global exception handling comes in. Sometimes our application will produce errors that we want to respond to consistently, no matter where they come from (like our ActionController::UnknownFormat). There are also errors that can get raised by the framework before any of our code comes into play. A perfect example of this is ActionController::RoutingError. When someone requests a URL that doesn't exist, like /tweets2/yukihiro_matz, there is nowhere for us to hook in to rescue this error using traditional exception handling. This is where Rails' exceptions_app comes in.

You can configure a Rack app in application.rb to be called when an error that we haven't handled is produced (like our ActionController::RoutingError or ActionController::UnknownFormat). The way you will normally see this used is to configure your routes app as the exceptions_app, then define the various routes for the errors you want to handle and route them to a special errors controller that you create. So our application.rb would look like this:

...
config.exceptions_app = self.routes
...

Our routes.rb will then contain the following:

...
match '/404' => 'errors#not_found', via: :all
match '/406' => 'errors#not_acceptable', via: :all
match '/500' => 'errors#internal_server_error', via: :all
...

In this case our ActionController::RoutingError would be picked up by the 404 route and the ActionController::UnknownFormat will be picked up by the 406 route. There are many possible errors that can crop up. But as long as you handle the common ones (404, 500, 422 etc.) to start with, you can add others if and when they happen.

Within our errors controller we can now render the relevant templates for each kind of error along with our layout (if it's not a 500) to maintain the branding. We can also log the errors and send them to our monitoring service, although most monitoring services will hook in to this process automatically so you don't have to send the errors yourself. Now when our application blows up it does so gently, with the right status code depending on the error and a page where we can give the user some idea regarding what happened and what they can do (contact support) - an infinitely better experience. More importantly, our app will seem (and will actually be) much more solid.
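Since exceptions_app accepts any Rack app, the mechanism can also be illustrated without routes or a controller. The sketch below is a deliberately bare alternative to the errors controller described above, not a replacement for it. It relies on Rails calling the exceptions app with PATH_INFO set to the status code (which is why routes like '/404' work), and the ExceptionsApp name and the messages are assumptions made for this example.

```ruby
# Illustrative messages; a real app would render branded templates instead.
ERROR_MESSAGES = {
  404 => "Page not found",
  406 => "Format not supported",
  500 => "Something went wrong"
}.freeze

# Minimal Rack-compatible app. Rails invokes the exceptions app with
# PATH_INFO set to "/<status>", e.g. "/404" for a routing error.
ExceptionsApp = lambda do |env|
  status = env["PATH_INFO"].to_s.delete_prefix("/").to_i
  status = 500 unless ERROR_MESSAGES.key?(status) # fall back for anything unexpected
  body = ERROR_MESSAGES.fetch(status)

  headers = {
    "content-type" => "text/plain",
    "content-length" => body.bytesize.to_s
  }
  [status, headers, [body]]
end

# Assumed wiring in application.rb:
#   config.exceptions_app = ExceptionsApp
```

Calling ExceptionsApp.call("PATH_INFO" => "/404") by hand returns a 404 response triple, which makes this kind of app easy to test in isolation.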

Multiple Errors of the Same Type in a Controller

In any Rails controller we can define specific errors to be handled globally within that controller (no matter which action they get produced in) - we do this via rescue_from. The question is when to use rescue_from? I usually find that a good pattern is to use it for errors that can occur in multiple actions. If an error will only be produced by one action, handle it via the traditional begin...rescue...end mechanism, but if we're likely to get the same error in multiple places and we want to handle it the same way, it's a good candidate for a rescue_from. Let's say our TweetsController also has a create action:

class TweetsController < ApplicationController
  respond_to :html

  def show
    ...
    respond_to do |format|
      format.html
    end
  end

  def create
    ...
  end
end

Let's also say that both of these actions can encounter a TwitterError and if they do we want to tell the user that something is wrong with Twitter. This is where rescue_from can be really handy:

class TweetsController < ApplicationController
  respond_to :html

  rescue_from TwitterError, with: :twitter_error

  private

  def twitter_error
    render :twitter_error
  end
end

Now we don't need to worry about handling this in our actions, and they will look much cleaner. We can (and should) still log our error and/or notify our error monitoring service within the twitter_error method. If you use rescue_from correctly, it can not only help you make your application more robust, but also make your controller code cleaner. This will make your code easier to maintain and test, making your application that little bit more resilient yet again.

Using External Services in Your Application

It's difficult to write a significant application these days without using a number of external services/APIs. In the case of our TweetsController, Twitter came into play via a Ruby gem that wraps the Twitter API. Ideally we would make all our external API calls asynchronously, but we're not covering asynchronous processing in this article and there are plenty of applications out there that make at least some API/network calls in-process.

Making network calls is an extremely error prone task and good exception handling is a must. You can get authentication errors, configuration problems, and connectivity errors. The library you use can produce any number of code errors, and then there is the matter of slow connections. I am glossing over this point, but it's oh so crucial since you can't deal with slow connections via exception handling alone. You need to appropriately configure timeouts in your network library, or if you're using an API wrapper make sure it provides hooks to configure timeouts. There is no worse experience for a user than having to sit there waiting without your application giving any indication of what's happening. Just about everyone forgets to configure timeouts appropriately (I know I have), so take heed.
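As a rough sketch of what configuring timeouts looks like with Ruby's standard Net::HTTP library: the build_http and fetch_body helpers, the host name and the timeout values are all assumptions made for illustration. An API wrapper gem will have its own way of exposing these settings, so check its documentation.

```ruby
require "net/http"

# Build an HTTP client with explicit timeouts (the values are illustrative).
def build_http(host, port = 443, open_timeout: 2, read_timeout: 5)
  http = Net::HTTP.new(host, port)
  http.use_ssl = (port == 443)
  http.open_timeout = open_timeout # seconds allowed to establish the connection
  http.read_timeout = read_timeout # seconds allowed to wait for each read
  http
end

# Returns the response body, or nil if the service was too slow to respond.
def fetch_body(http, path)
  http.get(path).body
rescue Net::OpenTimeout, Net::ReadTimeout => e
  warn "Timed out calling #{http.address}#{path}: #{e.class}"
  nil
end

# No connection is opened until a request is actually made.
http = build_http("api.example.com")
```

Once the timeouts are set, a slow connection turns into an exception after a bounded wait, and at that point the exception handling techniques from the rest of this article apply.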

If you're using an external service in multiple places within your application (multiple models for example), you expose large parts of your application to the full landscape of errors that can be produced. This is not a good situation. What we want to do is limit our exposure and one way we can do this is putting all access to our external services behind a facade, rescuing all errors there and re-raising one semantically appropriate error (raise that TwitterError that we talked about if any errors occur when we try to hit the Twitter API). We can then easily use techniques like rescue_from to deal with these errors and we don't expose large parts of our application to an unknown number of errors from external sources.
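A minimal sketch of such a facade might look like the following. TwitterFacade is a hypothetical name, the injected client is assumed to respond to user_timeline, and the failing client is a stand-in used to show the behaviour.

```ruby
# The single error type the rest of the application needs to know about.
class TwitterError < StandardError; end

class TwitterFacade
  # `client` is assumed to be anything that responds to #user_timeline.
  def initialize(client)
    @client = client
  end

  def latest_tweets(handle)
    @client.user_timeline(handle).map(&:text)
  rescue => e
    # Translate whatever went wrong into the one error we have chosen to expose.
    raise TwitterError, "Twitter request failed for #{handle}: #{e.message}"
  end
end

# Stand-in client that fails the way a network call might.
flaky_client = Object.new
def flaky_client.user_timeline(_handle)
  raise IOError, "connection reset"
end

begin
  TwitterFacade.new(flaky_client).latest_tweets("yukihiro_matz")
rescue TwitterError => e
  outcome = "Twitter is having problems: #{e.message}"
end
```

Callers now only have to know about TwitterError, which is what makes a single rescue_from in a controller sufficient.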

An even better idea might be to make your facade an error free API. Return all successful responses as is and return nils or null objects when you rescue any sort of error (we do still need to log/notify ourselves of the errors via some of the methods we discussed above). This way we don't need to mix different types of control flow (exception control flow vs if...else) which may gain us significantly cleaner code. For example, let's wrap our Twitter API access in a TwitterClient object:

class TwitterClient
  attr_reader :client

  def initialize
    @client = Twitter::REST::Client.new do |config|
      config.consumer_key        = configatron.twitter.consumer_key
      config.consumer_secret     = configatron.twitter.consumer_secret
      config.access_token        = configatron.twitter.access_token
      config.access_token_secret = configatron.twitter.access_token_secret
    end
  end

  def latest_tweets(handle)
    client.user_timeline(handle).map{|tweet| tweet.text}
  rescue => e
    Rails.logger.error { "#{e.message} #{e.backtrace.join("\n")}" }
    nil
  end
end

We can now do this: TwitterClient.new.latest_tweets('yukihiro_matz'), anywhere in our code and we know that it will never produce an error, or rather it will never propagate the error beyond TwitterClient. We've isolated an external system to make sure that glitches in that system won't bring down our main application.

But What if I Have Excellent Test Coverage?

If you do have well-tested code, I commend you on your diligence, it will take you a long way towards having a more robust application. But a good test suite can often provide a false sense of security. Good tests can help you refactor with confidence and protect you against regression. But, you can only write tests for things you expect to happen. Bugs are, by their very nature, unexpected. To use our tweets example, until we choose to write a test for our fetch_tweets method where client.user_timeline(handle) raises an error thereby forcing us to wrap a rescue block around the code, all our tests will have been green and our code would have remained failure-prone.

Writing tests doesn't absolve us of the responsibility of casting a critical eye over our code to figure out how this code can potentially break. On the other hand, doing this kind of evaluation can definitely help us write better, more complete test suites.

Conclusion

Resilient systems don't spring forth fully formed from a weekend hack session. Making an application robust is an ongoing process. You discover bugs, fix them, and write tests to make sure they don't come back. When your application goes down due to an external system failure, you isolate that system to make sure the failure can't snowball again. Exception handling is your best friend when it comes to doing this. Even the most failure-prone application can be turned into a robust one if you apply good exception handling practices consistently, over time.

Of course, exception handling is not the only tool in your arsenal when it comes to making applications more resilient. In subsequent articles we will talk about asynchronous processing, how and when to apply it and what it can do in terms of making your application fault tolerant. We will also look at some deployment and infrastructure tips that can have a significant impact without breaking the bank in terms of both money and time - stay tuned.
