Ruby is one of the most popular languages used on the web. We’ve started a new Session here on Nettuts+ that will introduce you to Ruby, as well as the great frameworks and tools that go along with Ruby development. In this lesson, we’ll look at using regular expressions in Ruby.
Preface: Regular Expression Syntax
If you’re familiar with regular expressions, you’ll be glad to know that most of the syntax for writing the actual regular expressions is very similar to what you know from PHP, JavaScript, or [your language here].
If you’re not familiar with regular expressions, you’ll want to check out our Regex tutorials here on Nettuts+ to get up to speed.
Regular Expression Matching
Just like everything else in Ruby, regular expressions are regular objects: they’re instances of the Regexp class. However, you’ll usually create a regular expression with the standard, literal syntax:
/myregex/
/\(\d{3}\) \d{3}-\d{4}/
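Since regexps are ordinary Regexp instances, the literal syntax is not the only way to get one. Here is a minimal sketch that builds the same pattern with Regexp.new; the constant names LITERAL and BUILT are made up for this illustration.

```ruby
# Two ways to build the same pattern (LITERAL and BUILT are illustrative names).
LITERAL = /\(\d{3}\) \d{3}-\d{4}/
BUILT   = Regexp.new('\(\d{3}\) \d{3}-\d{4}') # single quotes keep the backslashes intact

LITERAL.class                  # => Regexp
LITERAL.source == BUILT.source # => true
BUILT.match("(123) 456-7890")  # returns a MatchData object, just like the literal would
```

The literal form is usually easier to read; Regexp.new is mostly handy when the pattern arrives as a string.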
To start, the simplest way to use a regexp is to apply it to a string and see if there’s a match. Both strings and regexp objects have a match method that does this:
"(123) 456-7890".match /\(\d{3}\) \d{3}-\d{4}/
/\(\d{3}\) \d{3}-\d{4}/.match "(123) 456-7890"
Both of these examples match, so we get a MatchData instance back (we’ll look at MatchData objects soon). If there’s no match, match returns nil. Because a MatchData object is truthy and nil is falsy, you can use the match method in conditional statements (like an if-statement) and simply ignore the details of the return value.
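As a rough sketch of using match in a conditional, here is a small helper. The names PHONE and phone_number? are hypothetical and exist only for this example.

```ruby
# PHONE and phone_number? are illustrative names, not part of Ruby itself.
PHONE = /\(\d{3}\) \d{3}-\d{4}/

def phone_number?(text)
  if text.match(PHONE) # a MatchData object is truthy, nil is falsy
    true
  else
    false
  end
end

phone_number?("(123) 456-7890") # => true
phone_number?("no digits here") # => false
```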
There’s another way to match a regexp against a string: the =~ operator (the equals-tilde operator). Remember that operators are methods in Ruby. Like match, this method returns nil when there is no match. However, if there is a match, it returns the index in the string where the match starts. Also like match, both strings and regexps have =~.
"Ruby For Newbies: Regular Expressions" =~ /New/ # => 9
Regular expressions get more useful when we use them to extract data. This is usually done with groupings: wrapping certain parts of the regular expression in parentheses. Let’s say we want to match a first name, last name, and occupation in a string, where the string is formatted like this:
str1 = "Joe Schmo, Plumber"
str2 = "Stephen Harper, Prime Minister"
To get the three fields, we’ll create this regexp:
re = /(\w*)\s(\w*),\s?([\w\s]*)/
This matches any number of word characters, a single whitespace character, any number of word characters, a comma, an optional whitespace character, and any number of word characters or whitespace. As you might guess, the parts that match word characters are the names and occupation we’re looking for, so they are wrapped in parentheses.
So, let’s execute this:
match1 = str1.match re
match2 = str2.match re
MatchData Objects
Now, our match1 and match2 variables hold MatchData objects (because both our matches were successful). So, let’s see how we can use one of these MatchData objects.
As we go through this, you’ll notice that there are a few different ways to get the same data out of our MatchData object. We’ll start with the matched string: if you want to see the original string that was matched against the regexp, use the string method. To get just the portion of the string that matched, use the [] (square brackets) method and pass the parameter 0. In our case the two are identical, because the regexp matches the entire string:
match1.string # => "Joe Schmo, Plumber"
match1[0] # (this is the same as match1.[] 0 ) => "Joe Schmo, Plumber"
What about the regular expression itself? You can find that with the regexp method.
match1.regexp # => /(\w*)\s(\w*),\s?([\w\s]*)/ (IRB may display the regular expression in its own notation; it will still work normally)
Now, how about getting those matched groups that were the point of this exercise? Firstly, we can get them with numbered indices on the MatchData object itself; of course, they are in the order we matched them in:
match1[1] # => "Joe"
match1[2] # => "Schmo"
match1[3] # => "Plumber"
match2[1] # => "Stephen"
match2[2] # => "Harper"
match2[3] # => "Prime Minister"
There’s actually another way to get these captures: the captures method, which returns them as an array; since this is an array, it’s zero-based.
match1.captures[0] # => "Joe"
match2.captures[2] # => "Prime Minister"
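Because captures returns a plain array, it works nicely with multiple assignment. The sketch below wraps that idea in a helper; PERSON_RE and parse_person are hypothetical names for this example, and the pattern is the one from earlier in the lesson.

```ruby
# PERSON_RE and parse_person are illustrative names only.
PERSON_RE = /(\w*)\s(\w*),\s?([\w\s]*)/

def parse_person(line)
  m = line.match(PERSON_RE)
  return nil unless m               # no match, nothing to unpack

  first, last, job = m.captures     # unpack the three groups in order
  { first: first, last: last, job: job }
end

person = parse_person("Stephen Harper, Prime Minister")
person[:first] # => "Stephen"
person[:last]  # => "Harper"
person[:job]   # => "Prime Minister"
```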
Believe it or not, there’s actually a third way to get your captures. When you execute match or =~, Ruby fills in a series of global variables, one for each of the captured groups in your regexp:
"Andrew Burgess".match /(\w*)\s(\w*)/ # returns a MatchData object, but we're ignoring that
$1 # => "Andrew"
$2 # => "Burgess"
Back to MatchData objects. If you want to find out the string index where a given capture starts, pass the capture’s number to the begin method (here, you want the capture’s number as you’d use it with the [] method, not as an index into captures). Alternatively, you can use end to see where that capture ends; it returns the index just past the capture’s last character.
m = "Nettuts+ is the best".match /(is) (the)/
m[1] # => "is"
m.begin 1 # => 9
m[2] # => "the"
m.end 2 # => 15
There are also the pre_match and post_match methods, which are pretty neat: they show you what part of the string came before and after the match, respectively.
# m from above
m.pre_match # => "Nettuts+ "
m.post_match # => " best"
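One way to picture pre_match and post_match is that, together with the match itself, they cut the string into three pieces. The sketch below assumes a hypothetical helper named split_around and a constant SENTENCE chosen for this example.

```ruby
SENTENCE = "Nettuts+ is the best"

# split_around is an illustrative helper: text before, inside, and after the match.
def split_around(text, pattern)
  m = text.match(pattern)
  return nil unless m

  [m.pre_match, m[0], m.post_match]
end

before, matched, after = split_around(SENTENCE, /(is) (the)/)
before  # => "Nettuts+ "
matched # => "is the"
after   # => " best"

# The three pieces add back up to the original string.
before + matched + after == SENTENCE # => true
```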
That pretty much covers the basics of working with regular expressions in Ruby.
Regular Expression Use
Since regular expressions are so useful when manipulating strings, you’ll find several string methods that take advantage of them. The most useful ones are probably the substitution methods. These include
sub
sub!
gsub
gsub!
These are for substitution (sub) and global substitution (gsub), respectively. The difference is that gsub replaces all the instances of our pattern, while sub replaces only the first instance in the string.
Here’s how we use them:
"some string".sub /string/, "message" # => "some message"
# the i flag ignores case, so the capitalized "The" is replaced too
"The man in the park".gsub /the/i, "a" # => "a man in a park"
As you might know, the bang methods (ones ending with an exclamation mark!) are destructive methods: these change the actual string objects, instead of returning new ones. For example:
original = "My name is Andrew."
new = original.sub /My name is/, "Hi, I'm"
original # => "My name is Andrew."
new # => "Hi, I'm Andrew."

original = "Who are you?"
original.sub! /Who are/, "And"
original # => "And you?"
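One detail worth knowing about the bang versions: sub! and gsub! return nil when nothing was replaced, so their return value tells you whether the string changed. A small sketch, where replace_first! is a hypothetical helper name:

```ruby
# replace_first! is an illustrative helper that reports whether sub! changed the string.
def replace_first!(text, pattern, replacement)
  text.sub!(pattern, replacement) ? :changed : :unchanged
end

question = String.new("Who are you?")      # String.new gives us a mutable string
replace_first!(question, /Who are/, "And") # => :changed
question                                   # => "And you?"
replace_first!(question, /missing/, "x")   # => :unchanged
question                                   # => "And you?"
```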
Besides these simple examples, you can do more complex things, like this:
"1234567890".sub /(\d{3})(\d{3})(\d{4})/, '(\1) \2-\3' # => "(123) 456-7890"
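The substitution methods return strings rather than MatchData objects, and the global variables aren’t usable inside the replacement string, because that string is built before the match runs. However, we can use the “backslash-number” pattern in the replacement string; wrapping it in single quotes keeps the backslashes literal (in double quotes you would have to write "\\1"). If you want to further manipulate the captured string, you can pass a block instead of the second parameter: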
"WHAT'S GOING ON?".gsub(/\S*/) {|s| s.downcase } # => "what's going on?"
There are many other methods that use regular expressions; if you’re interested, you should check out String#scan and String#split, for starters.
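As a quick taste of those two methods, here is a short sketch. The sample strings are just illustrations.

```ruby
# scan returns every match of the pattern as an array.
"(123) 456-7890".scan(/\d+/)       # => ["123", "456", "7890"]

# With groups in the pattern, scan returns one array of captures per match.
"k1=v1;k2=v2".scan(/(\w+)=(\w+)/)  # => [["k1", "v1"], ["k2", "v2"]]

# split cuts the string wherever the pattern matches.
"Joe Schmo, Plumber".split(/,\s*/) # => ["Joe Schmo", "Plumber"]
```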
Conclusion
Well, that’s regular expressions in Ruby for you. If you have any questions, let’s hear them in the comments.