Rake 201

In this second article about Rake, we dive a little deeper and cover slightly advanced topics like file tasks, rules, multitasks and more that will improve your Rake chops significantly.

Topics

  • Disclaimer
  • Default Task
  • The Task Object
  • File Tasks
  • Directory Method
  • File Utilities
  • Rules
  • Trace Flag
  • Parallel Tasks

Disclaimer

I want to approach this topic from a more general point of view. This is not an article that shows you a list of Rake tasks with clever solutions that are ready to be copied and pasted without much afterthought. It’s more intended to be a look under the hood while being newbie-friendly and also interesting to people who just haven’t yet played much with Rake besides the obvious Rake tasks in Rails. 

It’s the understanding of the tool and what it offers that yields a higher return, I think. I hope you don’t mind. To me, covering the principles is more valuable—and more approachable to beginners—and it’s up to you what you do with it in your own applications.

Default Task

I’m sure you have at least heard the term previously somewhere. But what is a default task really? It’s nothing magic, but let’s get this one out of the way quickly. When you run rake without any additional name for a rake task, the default task gets executed.

Shell

Some Rakefile

In Rails, the default task is supposed to run your tests. Your guess is as good as mine, but I suppose it was a result of tests needing to be run more often than any other task. When you redefine the default Rake task in Rails, it just adds up to the task defined by Rails—it won’t redefine it. It’s actually how Rake works. When you redefine a Rake task, you add up to the previous definitions.

File Tasks

As the name might suggest, they are tasks that you execute on files (and directories). They have a few tricks up their sleeves, though. Rake will, of course, work a lot of the time with files. No surprise that somebody recognized that pattern and created specialized file tasks for your convenience—especially for the simple reason to avoid duplication or wasting processing capabilities.

Transforming files from one type to another is a very common task. The sources are your dependencies and the task names are what follows the file keyword. Converting Markdown to HTML files, converting HTML files into ebook formats, JPG images to PNG images, compiling source code, building static pages or just changing file extensions and many more options are at your disposal. We could do all of this manually, but this is tedious and ineffective, of course. Writing code for this is much more elegant and scalable.

Using file tasks is not much different from “regular” tasks. They will also show up if you ask for a list of Rake tasks via rake -T. In fact, Rake treats all tasks equally—except multitask a bit. Adding descriptions and prerequisites is no problem for file tasks to handle as well. 

Actually, prerequisites are a necessity for mentioning source files before they get processed. We need the source to exist for this to work—which makes sense as a dependency, of course. Without it, Rake wouldn’t know how to continue—it can’t create the new file out of thin air, after all.

Some Rakefile

The name for your file task is basically your target file, the file that you want to have created. The prerequisite is the source file that is needed for the task. Inside the block, you are telling Rake how to create the desired output—how to build it using the prerequisite file(s) that already exist. Input-output. For example, this could be a shell command using the pandoc tool that transforms Markdown files into HTML files. The applications for file tasks are more than plenty. The syntax, though, might feel a little weird at first. I get it.

Rake first checks if the target file exists and, if so, checks if the timestamp is older than the prerequisite files—a time-based dependency. Rake will run the file task if the timestamp is older than the prerequisites or if the file does not exist yet. That is very handy if you need to handle more than a couple of files—which is especially cool because you won’t need to rebuild a ton of files just because you changed a single one in a collection, for example. In contrast to that, regular Rake tasks are always run—they don’t check any timestamps or other changes, unless you make them so, of course.

File Utilities

Some Rakefile

Shell

In case you are wondering about the cp method in the previous example or the above mv command, let’s talk about file utilities. We could have used sh mv ... to execute a Shell command from within a Rake task. Luckily for us, we can use a module that makes Shell command stuff like this a lot less verbose and platform independent. FileUtils is a Ruby module with lots of unixy commands for file operations: 

  • rm 
  • cp 
  • mv
  • mkdir
  • and so on…

If reinventing the wheel is not your thing, FileUtils will be a useful companion dealing with files. Often Rake is all you need, but every once in a while, you’ll be really glad this handy module has got your back. RakeUtils extended this module slightly for your convenience. 

Let’s have a look at a list of what’s at your disposal and then zoom in on a few particular ones that might be of interest to you:

Although I assume that you are a newbie, I also assume that you have played with Rails before and that you know the very basic Unix utilities—stuff like mv, cd, pwd, mkdir and stuff. If not, do your homework and come back. 

In your Rakefiles, you can use these methods right out of the box. And to avoid misunderstandings, this is a Ruby layer that ‘imitates’ these Unix commands and which you can use in your Rakefiles without any prefixes like sh—for executing a Shell command. By the way, the options you see in the list above mean a hash {} of options. Let’s look at a few interesting commands that might come in handy writing file tasks:

  • sh

This lets you execute shell commands from within your Ruby files.

  • cd

This is a very basic one, but there is something cool about this command. If you provide cd with a block, it changes the current directory to its destination, does its business as defined in the block, and then returns to the previous working directory to continue. Neat, actually!

  • cp_r

Lets you copy files and directories recursively in bulk.

  • mkdir_p

Creates a target directory and all its specified parents. Luckily for us, we have the directory method in Rake, which is even more convenient, and therefore we don’t need it.

  • touch

This updates the timestamp of a file if it exists—if not, it gets created.

  • identical?

Lets you check if two files are the same.

Directory Method

In Rake, you have a handy way to define directories without using mkdir or mkdir_p. It’s especially handy when you need to build up nested directories. A folder tree can be a pain if you need to build up a directory structure via multiple file tasks that have tons of prerequisites for the directory structure. Think of the directory method as a folder task.

Some Rakefile

This creates the directories in question without much fuss. What might not be obvious right away is the fact that you can depend on it like any other rake task—as a prerequisite. Just make sure the name of the file task, its name, includes the directory that you depend on. If multiple tasks depend on it, it will still get created only once.

As you can see here, Rake is very consistent and thinks about all the things to build as tasks. Thanks, Jim, that makes life easy!

Rules

Rules can help us to reduce duplications when we deal with tasks—file tasks, actually. Instead of instructing Rake to execute tasks on particular files like somefile.markdown, we can teach Rake to execute these tasks on a certain kind of file—like a pattern or blueprint. Transforming a set of files instead of single ones is a much more versatile and DRY approach. Tasks like these scale much better when we define a pattern for files that share similar characteristics.

Some Rakefile

As you can see, having a bunch of files would be tedious to maintain that way. Yes, we can write our own script where we keep a list of files in an array and iterate over it, but we can do better—much better. 

Another unwanted side effect would be that every time we run such a script, all HTML files get rebuilt—even if they haven’t changed at all. A large list of files would make you wait a lot longer or take up a lot more resources than necessary. We don’t need any extra code to take care of multiple files. Rake does a better, more efficient job in that department since it only executes its file tasks or rules when the file was touched in the meantime.

Some Rakefile

When we define a rule like the above, we have a mechanism in place for transforming any file with a .markdown extension into an .html file. With rules, Rake is first looking for a task for a specific file like quartermaster_gadgets.html. But when it can’t find one, it is using the plain .html rule to look for a source that could achieve successful execution. That way you don’t have to create a long list of files but only use a general “rule” that defines how to handle certain file tasks. Pretty awesome!

Task Object

In the rule above, we were making use of the task object—in this case a rule object, to be even more precise. We can pass it as a block argument into the closure and call methods on it. As with file tasks, rules are all about task sources, its dependencies—a markdown file, for example—and its task name. 

From within rules’ body in the block (and file tasks), we have access to the rules’ name and source. We can extract information from that argument passed—the name through rule.name and its source (aka file source) via rule.source. Above, we could avoid duplicating the names of the files and generalize a pattern instead. Similarly, we could get the list of prerequisites or dependencies with rules.prerequisites. For file tasks or any other task, the same applies, of course.

Talking of dependencies, they can function like a list to be iterated. There's no need to create a separate each loop if you play your cards right. 

As you can see, we didn’t need to manually iterate over the list of articles. We simply put Rake to work and used the dependencies—which is a lot simpler and cleaner.

What is even cooler for DRYing stuff up is that rules can take a proc object—an anonymous function object, a lambda basically—as a prerequisite. That means that instead of only a single pattern as a prerequisite, we can pass something more dynamic that lets us cast a net of patterns that catches more than a single fish. For example, rules for .markdown and .md files. 

They would have the same body of the rule but only a different pattern as prerequisite. It’s like defining a new File task for every object returned by the proc object. Another way to work with rules is regular expressions, of course. You pass a pattern as a dependency and if you got a match, the file task can be executed. Sweet options, no?

The Difference Between Procs and Lambdas

If you are new to lambda land or haven’t figured it out completely yet, here is a small refresher. Procs are objects you can pass around that can be executed later—so are lambdas. Both are Proc objects, by the way. The difference is subtle and comes down to the arguments that are passed into them. Lambdas check the number of arguments and can blow up with an ArgumentError for that reason—procs don’t care. The other difference is in regards to their handling of return statements. Procs get out of the scope where the proc object was being executed. Lambdas just exit the lambda scope and continue to trigger the next code that is in line, so to speak. Not super important here, but I thought for the newbies among you, it can’t hurt either.

Useful Flags

This is a short list of flags that you can pass to rake tasks.

  • --rules

Shows you how Rake tries to apply rules—a trace for rules. Invaluable if you deal with a couple of rules and run into bugs.

Shell

  • -t

Remember the solve_bonnie_situation task from article one? Let’s add this flag to this Rake task and turn on tracing. We also get a backtrace if we run into errors. This is certainly handy for debugging.

Shell

Tracing Rules

Setting Rake.application.options.trace_rules = true in a Rakefile itself tells Rake to show us trace information about rules when we run a task. This is cool because when we run a trace via rake -t, with a single flag, we get all the debugging information we need. We not only get a list of task invocations but can also see which rules were applied—or attempted.

  • -P

Shows a list of prerequisites for all tasks. Here we use again the solve_bonnie_situation task. Omitting output for other tasks, this would be its singled out output:

Shell

If you are curious, run rake -P. Pretty interesting output.

  • -m

Runs tasks as multitasks.

Parallel Tasks

Let me introduce you to the multitask method. This can help you to speed things up a bit—after all, we have multiple cores on most modern computers, so let’s make use of them. Of course, you can always achieve the speed boosts by writing solid code that lacks any fat, but running tasks in parallel can certainly give you something extra in that regard. There are pitfalls, though, which we will also cover.

The tasks we executed so far run all tasks in sequence, one after the other. It’s a safe bet if your code is in order, but it is also slower. If speed is important for some task, we can help a little by multi-threading tasks. Keep in mind, though, that the sequential approach is the better option under some circumstances.

Let’s say we have three Rake tasks that need to run as a prerequisite in order to execute a fourth one. These are four threads, basically. In the bigger picture of things, when you run multiple applications—or to be more specific, processes—at once, the same idea is at work.

Using multitask, the dependencies in our prerequisite array are now not executed in this order anymore. Instead, they are spread and run in parallel—but before the shoot_bond_movie task, of course. A Ruby thread for each task will be run at the same time. Once they are finished, shoot_bond_movie will do its business. The way tasks are acting here is similar to randomization, but in fact they are simply executed at the same time.

The tricky part is just to make sure that certain dependencies are processed in an order that suits your needs. Because of that, we need to take care of race conditions. This basically means that some task runs into problems because the order of execution had unintended side effects. It’s a bug. 

If we can avoid that, we achieve thread safety. In regards to common prerequisites, interestingly, these prerequisites will be run only once because the multitask prerequisites wait for their completion first.

Both shoot_car_chase and shoot_final_confrontation tasks depend on prepare_italy_set to finish first—which is only run once, by the way. We can use that mechanism to predict the order when running tasks in parallel. Don’t just trust the order of execution if it’s somehow important to your task.

Final Thoughts

Well, I guess you are now fully equipped to write some serious Rake business. Making proper use of this tool will hopefully make your life as a Ruby developer even more joyful. In this second article, I hope I could convey what a simple but marvelous tool Rake really is. It was created by a true master of his craft.  

We all owe Jim Weirich tremendous respect for coming up with this elegant build tool. The Ruby community is certainly not quite the same since he passed away. Jim’s legacy is clearly here to stay, though. Another giant we are privileged to build upon.

Tags:

Comments

Related Articles