Managing Your RabbitMQ Cluster

RabbitMQ is a great distributed message broker, but it is not so easy to administer programmatically. In this tutorial I’ll show you how to create a cluster, add and remove nodes, and start and stop them. As a bonus I’ll share a Fabric file that lets you take total control. The code is available on GitHub.

Quick Introduction to RabbitMQ

RabbitMQ is a very popular message broker. Multiple producers can send messages, and consumers can consume them in a totally decoupled way. It is popular for several reasons:

  1. It’s fast and robust.
  2. It’s open source, but there is commercial support if you want it.
  3. It runs on all major operating systems.
  4. It is actively developed.
  5. It is battle tested.

RabbitMQ is implemented in Erlang, which is a bit unusual, but it is one of the reasons RabbitMQ is so reliable.

Prerequisites

For the purpose of this tutorial I’ll use a local Vagrant cluster of three nodes. If you already have three available machines (virtual or not), you may use them instead. Pay attention to the ports and networking.

Install VirtualBox

Follow the instructions to install VirtualBox.

Install Vagrant

Follow the instructions to install Vagrant.

Create a RabbitMQ Cluster

Here is a Vagrantfile that will create a local three-node cluster on your machine. The OS is Ubuntu 14.04 (Trusty).

To create an empty cluster, type: vagrant up.

Configuring SSH

To make it easy to ssh into the cluster nodes, type: vagrant ssh-config >> ~/.ssh/config.

If you type: cat ~/.ssh/config, you should see entries for rabbit-1, rabbit-2 and rabbit-3.

Now you can ssh into each virtual machine by name: ssh rabbit-1.

Make Sure the Nodes Are Reachable by Name

The easiest way is to edit the /etc/hosts file. For example, for rabbit-1 add the addresses of rabbit-2 and rabbit-3.

Repeat the process for all nodes.
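
If you would rather generate the entries than type them, here is a minimal Python 3 sketch. The name-to-address map is an assumption: only 192.168.77.10 appears in this tutorial, and the other two addresses (and which node owns which address) are hypothetical. Substitute the values from your own Vagrantfile.

```python
# Hypothetical name-to-address map; replace with your cluster's real addresses.
NODES = {
    "rabbit-1": "192.168.77.10",  # assumed to be rabbit-1
    "rabbit-2": "192.168.77.11",  # assumed
    "rabbit-3": "192.168.77.12",  # assumed
}


def hosts_entries(current_node, nodes=NODES):
    """Return the /etc/hosts lines that current_node needs for its peers."""
    return [
        "{0} {1}".format(address, name)
        for name, address in sorted(nodes.items())
        if name != current_node
    ]


# Print the lines to append to /etc/hosts on rabbit-1.
for line in hosts_entries("rabbit-1"):
    print(line)
```

Calling hosts_entries with each node name in turn gives the lines for that node's /etc/hosts file.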

Install RabbitMQ

I will use apt-get here for Debian/Ubuntu operating systems. If your cluster runs on a different OS, please follow the instructions on the RabbitMQ installation page.

Note that sometimes a rather out-of-date version of RabbitMQ is available by default. If you want to install the latest and greatest, you may download a .deb package directly or add RabbitMQ’s apt-repository, using these instructions.

The current version of RabbitMQ on Ubuntu 14.04 is 3.2, which is good enough for our purposes. Verify for yourself by typing: apt-cache show rabbitmq-server.

Let’s go ahead and install it on each machine:

Feel free to use your favorite configuration management tool like Chef or Ansible if you prefer.

Note that Erlang will be installed first as a prerequisite.
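
If you don't want to log in to each machine, the install can be driven from your workstation. The following is only a sketch in Python 3: it assumes the ssh aliases configured earlier and passwordless sudo on the nodes (the Vagrant default), and the function names are made up for illustration.

```python
import subprocess

HOSTS = ["rabbit-1", "rabbit-2", "rabbit-3"]


def install_commands(host):
    """Build the ssh invocations that install RabbitMQ on one host."""
    return [
        ["ssh", host, "sudo apt-get update"],
        ["ssh", host, "sudo apt-get install -y rabbitmq-server"],
    ]


def install_everywhere(hosts=HOSTS, runner=subprocess.check_call):
    """Run the install commands on every host, stopping at the first failure."""
    for host in hosts:
        for command in install_commands(host):
            runner(command)
```

The runner parameter exists so you can pass in a function that only prints or records the commands, which is handy for a dry run.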

Enable the RabbitMQ Management Plugin

The management plugin is really cool. It gives you an HTTP-based API as well as a web GUI and a command-line tool to manage the cluster. Here is how to enable it:
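
A minimal Python 3 sketch of that step, meant to run on each node. It uses the rabbitmq-plugins tool that ships with the server. The restart is an assumption on my part: as far as I know, older releases such as 3.2 only load newly enabled plugins after the broker restarts.

```python
import subprocess


def enable_management_commands():
    """Commands that enable the plugin and restart the broker to load it."""
    return [
        ["sudo", "rabbitmq-plugins", "enable", "rabbitmq_management"],
        # Assumed to be needed on older releases; harmless on newer ones.
        ["sudo", "service", "rabbitmq-server", "restart"],
    ]


def enable_management(runner=subprocess.check_call):
    """Run the commands in order, raising if one of them fails."""
    for command in enable_management_commands():
        runner(command)
```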

Get the Management Command-Line Tool

Download it from http://192.168.77.10:15672/cli/rabbitmqadmin. Note that the RabbitMQ documentation is incorrect and tells you to download from http://:15672/cli/.

This is a Python-based HTTP client for the RabbitMQ management HTTP API. It is very convenient for scripting RabbitMQ clusters.
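
Fetching the tool can be scripted as well. This Python 3 sketch builds the URL shown above for any node and saves the script as an executable file. The function names and the default destination are illustrative choices, not part of RabbitMQ.

```python
import os
import stat
import urllib.request


def rabbitmqadmin_url(host, port=15672):
    """Return the download URL served by the management plugin on host."""
    return "http://{0}:{1}/cli/rabbitmqadmin".format(host, port)


def download_rabbitmqadmin(host, destination="rabbitmqadmin"):
    """Fetch the script from a node and mark it executable for the owner."""
    urllib.request.urlretrieve(rabbitmqadmin_url(host), destination)
    mode = os.stat(destination).st_mode
    os.chmod(destination, mode | stat.S_IXUSR)
    return destination
```

For example, download_rabbitmqadmin("192.168.77.10") saves the script in the current directory.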

Basic RabbitMQ Concepts

RabbitMQ implements the AMQP 0.9.1 standard (Advanced Message Queuing Protocol). Note that there is already an AMQP 1.0 standard and RabbitMQ has a plugin to support it, but it is considered a prototype due to insufficient real-world use.

In the AMQP model, publishers send messages to a message broker (RabbitMQ is the message broker in this case) via an exchange. The message broker distributes the messages to queues based on metadata associated with the message. Consumers consume messages from queues. Messages may or may not be acknowledged. RabbitMQ supports a variety of programming models on top of these concepts such as work queues, publish-subscribe and RPC.
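
To make the flow concrete, here is a toy in-memory model of a single direct exchange, written in Python 3. It is not AMQP and not how RabbitMQ is implemented. The class and method names are invented, and it only illustrates the idea that the broker copies a message to every queue whose binding matches the message's routing key.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory imitation of one direct exchange; not real AMQP."""

    def __init__(self):
        self.queues = defaultdict(deque)   # queue name -> pending messages
        self.bindings = defaultdict(set)   # routing key -> bound queue names

    def bind(self, queue, routing_key):
        """Create the queue if needed and bind it to routing_key."""
        self.queues[queue]  # touching the key creates an empty queue
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key, body):
        """Copy the message to every queue bound with a matching key."""
        for queue in self.bindings.get(routing_key, ()):
            self.queues[queue].append(body)

    def consume(self, queue):
        """Return the oldest message in the queue, or None if it is empty."""
        pending = self.queues[queue]
        return pending.popleft() if pending else None
```

Binding two queues to the same key gives publish-subscribe behavior (each queue gets its own copy), while several consumers reading from one queue gives the work-queue pattern.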

Managing Your Cluster

There are three scripts used to manage the cluster. The rabbitmq-server script launches a RabbitMQ server. The rabbitmqctl script controls the cluster (stop, reset, cluster nodes together and get status). The rabbitmqadmin script, which you downloaded earlier, configures and administers the cluster (declare vhosts, users, exchanges and queues). Creating a cluster involves just rabbitmq-server and rabbitmqctl.

First, let’s start the rabbitmq-server as a service (daemon) on each of our hosts rabbit-1, rabbit-2 and rabbit-3.

This will start both the Erlang VM and the RabbitMQ application if the node is down. To verify it is running properly, type:

The output should be (for rabbit-1):

This means the node is not clustered with any other nodes yet and that it is a disc node. It is also running, as you can see from its presence in the running_nodes list.

To stop the server, issue the following command:

Then if you check the cluster status:

The output should be:

No more running nodes.

You can repeat the process for the other nodes (rabbit-2 and rabbit-3) and see that they know only themselves.
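
The start, status and stop steps above can be wrapped in small Python 3 helpers that run over ssh. This is a sketch under two assumptions: the service is started with the service command, and "stop" means rabbitmqctl stop_app, which stops the RabbitMQ application but leaves the Erlang VM up so that cluster_status can still answer. The helper names are invented.

```python
import subprocess


def on_node(host, *command, runner=subprocess.check_output):
    """Run a command on host over ssh and return whatever the runner returns."""
    return runner(["ssh", host] + list(command))


def start(host, **options):
    """Start the RabbitMQ service (Erlang VM plus application)."""
    return on_node(host, "sudo", "service", "rabbitmq-server", "start", **options)


def cluster_status(host, **options):
    """Ask the node which cluster members it knows and which are running."""
    return on_node(host, "sudo", "rabbitmqctl", "cluster_status", **options)


def stop_app(host, **options):
    """Stop the RabbitMQ application but keep the Erlang VM alive."""
    return on_node(host, "sudo", "rabbitmqctl", "stop_app", **options)
```

For example, print(cluster_status("rabbit-2").decode()) shows the status as seen by rabbit-2.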

The Erlang Cookie

Before you can create a cluster, all the nodes in the cluster must have the same cookie. The cookie is a file holding a shared secret that the Erlang runtime uses to decide whether two nodes are allowed to talk to each other. It is located in /var/lib/rabbitmq/.erlang.cookie. Just copy the contents from rabbit-1 to rabbit-2 and rabbit-3.
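
Copying the cookie can be scripted too. In this Python 3 sketch the helper names are invented, and stopping the service before writing the file is my assumption: a running Erlang VM keeps using the cookie it read at startup, so the new one only takes effect after a restart.

```python
import shlex
import subprocess

COOKIE_PATH = "/var/lib/rabbitmq/.erlang.cookie"


def read_cookie(host, runner=subprocess.check_output):
    """Return the cookie string stored on host."""
    output = runner(["ssh", host, "sudo cat " + COOKIE_PATH])
    if isinstance(output, bytes):
        output = output.decode("ascii")
    return output.strip()


def write_cookie_commands(host, cookie):
    """Build the ssh commands that install cookie on host."""
    # tee runs under sudo because the file is readable only by its owner.
    write = "printf %s {0} | sudo tee {1} > /dev/null".format(
        shlex.quote(cookie), COOKIE_PATH
    )
    return [
        ["ssh", host, "sudo service rabbitmq-server stop"],
        ["ssh", host, write],
        ["ssh", host, "sudo service rabbitmq-server start"],
    ]
```

A typical run reads the cookie once with read_cookie("rabbit-1") and then executes write_cookie_commands for rabbit-2 and rabbit-3.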

Clustering Nodes Together

To group these separate nodes into a cohesive cluster takes some work. Here is the procedure:

  • Have a single node running (e.g. rabbit-1).
  • Stop another node (e.g. rabbit-2).
  • Reset the stopped node (rabbit-2).
  • Cluster the other node to the root node.
  • Start the stopped node.
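
As a sketch, the procedure above boils down to four rabbitmqctl calls on the joining node. The Python 3 function below only builds the commands. It assumes RabbitMQ 3.x, where the subcommand is join_cluster, and the default node name of rabbit@ followed by the host name. The function name is invented.

```python
def join_cluster_commands(root_node, ram=False):
    """Return the rabbitmqctl calls to run on the node that is joining."""
    join = ["sudo", "rabbitmqctl", "join_cluster"]
    if ram:
        join.append("--ram")  # join as a RAM node instead of a disc node
    join.append("rabbit@" + root_node)
    return [
        ["sudo", "rabbitmqctl", "stop_app"],
        ["sudo", "rabbitmqctl", "reset"],
        join,
        ["sudo", "rabbitmqctl", "start_app"],
    ]


# Print the commands for joining rabbit-1's cluster as a disc node.
for command in join_cluster_commands("rabbit-1"):
    print(" ".join(command))
```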

Let’s do this. ssh into rabbit-2 and run the following commands:

Now type: sudo rabbitmqctl cluster_status.

The output should be:

Now, you can start rabbit-2.

If you check the status again, both nodes will be running:

Note that both nodes are disc nodes, which means they store their metadata on disc. Let’s add rabbit-3 as a RAM node. ssh to rabbit-3 and issue the following commands:

Checking the status shows:

All cluster nodes are running. The Disc nodes are rabbit-1 and rabbit-2, and the RAM node is rabbit-3.

Congratulations! You have a working RabbitMQ cluster.

Real-World Complications

What happens if you want to change your cluster configuration? You’ll have to use surgical precision when adding and removing nodes from the cluster.

What happens if a node is not restarted yet, but you try to go on with stop_app, reset and start_app? Well, the stop_app command will ostensibly succeed, returning “done.” even if the target node is down. However, the subsequent reset command will fail with a nasty message. I spent a lot of time scratching my head trying to figure it out, because I assumed the problem was some configuration option that affected only reset.

Another gotcha is that if you want to reset the last disc node, you have to use force_reset. Trying to figure out in the general case which node was the last disc node is not trivial.

RabbitMQ also supports clustering via configuration files. This is great when your disc nodes are up, because restarted RAM nodes will just cluster based on the config file without you having to cluster them explicitly. Again, it doesn’t fly when you try to recover a broken cluster.

Reliable RabbitMQ Clustering

It comes down to this: You don’t know which was the last disc node to go down. You don’t know the clustering metadata of each node (maybe it went down while doing reset). To start all the nodes, I use the following algorithm:

  • Start all nodes (at least the last disc node should be able to start).
  • If not even a single node can start, you’re hosed. Just bail out.
  • Keep track of all nodes that failed to start.
  • Try to start all the failed nodes.
  • If some nodes failed to start the second time, you’re hosed. Just bail out.

This algorithm will work as long as your last disc node is physically OK.
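
Here is a minimal Python 3 sketch of that start-up algorithm. How a node is actually started is left to a start_node callable that you supply (for example, one that runs the service command over ssh and returns True on success). Both names are hypothetical.

```python
def start_all(nodes, start_node):
    """Start every node, retrying the failures once.

    start_node is a caller-supplied callable that takes a node name and
    returns True on success. Returns the list of nodes, all of them up,
    or raises RuntimeError when the cluster cannot be brought up.
    """
    # First pass: try every node and remember the ones that failed.
    failed = [node for node in nodes if not start_node(node)]
    if nodes and len(failed) == len(nodes):
        raise RuntimeError("no node could start")

    # Second pass: nodes that were waiting for the last disc node get
    # another chance now that it is up.
    still_failed = [node for node in failed if not start_node(node)]
    if still_failed:
        raise RuntimeError("nodes failed twice: " + ", ".join(still_failed))
    return list(nodes)
```

The second pass is what makes the order of the first pass irrelevant: you don't need to know which node was the last disc node, only that it is among the nodes you try.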

Once all the cluster nodes are up, you can reconfigure them (remember that you can't be sure what clustering metadata each node holds). The key is to force_reset every node. This ensures that any trace of previous cluster configuration is erased from all nodes. First do it for one disc node:

Then for every other node (either disc or RAM):
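
Put together, the rebuild can be expressed as an ordered plan of rabbitmqctl calls. This Python 3 sketch reflects my reading of the steps: stop_app, force_reset and start_app on the first disc node, then stop_app, force_reset, join_cluster and start_app on every other node. The function name and the plan format are invented for illustration.

```python
def rebuild_plan(disc_nodes, ram_nodes=()):
    """Return (host, rabbitmqctl arguments) steps that rebuild the cluster."""
    root = disc_nodes[0]
    # The first disc node is wiped and restarted on its own.
    plan = [
        (root, ["stop_app"]),
        (root, ["force_reset"]),
        (root, ["start_app"]),
    ]
    others = [(node, False) for node in disc_nodes[1:]]
    others += [(node, True) for node in ram_nodes]
    # Every other node is wiped and then joined to the first disc node.
    for node, is_ram in others:
        join = ["join_cluster"] + (["--ram"] if is_ram else []) + ["rabbit@" + root]
        plan += [
            (node, ["stop_app"]),
            (node, ["force_reset"]),
            (node, join),
            (node, ["start_app"]),
        ]
    return plan
```

Each step can then be executed as sudo rabbitmqctl followed by the arguments on the named host.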

Controlling a Cluster Remotely

You can SSH into every box and perform the above-mentioned steps on each box manually. That works, but it gets old really fast. Also, it is impractical if you want to build and tear down a cluster as part of an automated test.

One solution is to use Fabric. One serious gotcha I ran into is that when I performed the build cluster algorithm manually it worked perfectly, but when I used Fabric it failed mysteriously. After some debugging I noticed that the nodes started successfully, but by the time I tried to stop_app, the nodes were down. This turned out to be a Fabric newbie mistake on my part. When you issue a remote command using Fabric, it starts a new shell on the remote machine. When the command is finished, the shell is closed, sending a SIGHUP (Hang up signal) to all its sub-processes, including the Erlang node. Using nohup takes care of that. Another more robust option is to run RabbitMQ as a service (daemon).
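
For the nohup route, the command string has to detach the process from the remote shell completely. A small Python 3 sketch of building such a string follows. The helper name is invented, and redirecting all three standard streams is an assumption based on common practice for background commands over ssh.

```python
import shlex


def detached(command, log_file="/dev/null"):
    """Wrap a shell command so it keeps running after the remote shell exits."""
    # nohup ignores SIGHUP; the redirects cut every tie to the closing shell.
    return "nohup {0} > {1} 2>&1 < /dev/null &".format(
        command, shlex.quote(log_file)
    )


print(detached("rabbitmq-server"))
```

With Fabric 1.x you would pass the resulting string to run or sudo, and passing pty=False as well is usually advised for background commands.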

Administering a Cluster Programmatically

Administration means creating virtual hosts, users, exchanges and queues, setting permissions, and binding queues to exchanges. The first thing you should do, if you haven't already, is enable the management plugin. I’m not sure why you have to enable it yourself. It should be enabled by default.

The web UI is fantastic and you should definitely familiarize yourself with it. However, to administer a cluster remotely there is a RESTful management API you can use. There is also a Python command-line tool called rabbitmqadmin that requires Python 2.6+. Using rabbitmqadmin is pretty simple. The only issue I found is that I could use only the default guest account to administer the cluster. I created another administrator user called ‘admin’, set its permissions to all (configure/read/write) and gave it a tag of “administrator” (additional requirement of the management API), but I kept getting permission errors.
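
For reference, here is roughly what creating such an administrator looks like against the management HTTP API, as a Python 3 sketch. It builds the two PUT requests (users and permissions) without sending them. The helper names are invented, and the default guest credentials and host address are assumptions. It does not explain or fix the permission errors described above.

```python
import base64
import json
import urllib.parse
import urllib.request


def api_request(host, path, body, user="guest", password="guest", port=15672):
    """Build (but do not send) an authenticated PUT for the management API."""
    url = "http://{0}:{1}/api/{2}".format(host, port, path)
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="PUT"
    )
    token = base64.b64encode("{0}:{1}".format(user, password).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("Content-Type", "application/json")
    return request


def create_admin_requests(host, name, password, vhost="/"):
    """Requests that create an administrator with full permissions on vhost."""
    quoted_vhost = urllib.parse.quote(vhost, safe="")  # "/" becomes "%2F"
    return [
        api_request(host, "users/" + name,
                    {"password": password, "tags": "administrator"}),
        api_request(host, "permissions/{0}/{1}".format(quoted_vhost, name),
                    {"configure": ".*", "write": ".*", "read": ".*"}),
    ]
```

To send a request, pass it to urllib.request.urlopen. Note that the default virtual host "/" has to be percent-encoded in the URL.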

The Elmer project allows you to specify a cluster configuration as a Python data structure (see the sample_config.py) and will set up everything for you.

Take-Home Points

  1. RabbitMQ is cool.
  2. The cluster admin story is not air-tight.
  3. Programmatic administration is key.
  4. Fabric is an awesome tool to remotely control multiple Unix boxes.