Suppose someone accustomed to British spelling decides to complete his degree in the US and is asked to write a paper about Python for a class. He is well versed in Python and has no trouble writing the paper. In the part of the paper that discusses images, however, he repeatedly writes grey (British spelling) instead of gray (US spelling), and neighbourhood (British spelling) instead of neighborhood (US spelling). Since he is now in the US, he has to go through the paper and replace every British spelling with its US counterpart.
This is one of many scenarios in which we need to correct a spelling or some other mistake in multiple places.
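As a rough illustration of the spelling scenario, the sketch below replaces a couple of British spellings with their US forms in a string. The mapping `UK_TO_US` and the helper `to_us_spelling` are hypothetical names made up for this example, and the word list is deliberately tiny.

```python
import re

# Hypothetical mapping of British to US spellings (illustration only)
UK_TO_US = {
    'grey': 'gray',
    'neighbourhood': 'neighborhood',
}

# One pattern that matches any of the British words as a whole word
PATTERN = re.compile(r'\b(' + '|'.join(map(re.escape, UK_TO_US)) + r')\b')

def to_us_spelling(text):
    """Replace every British spelling in the mapping with its US form."""
    return PATTERN.sub(lambda match: UK_TO_US[match.group(1)], text)

print(to_us_spelling('A grey image of the neighbourhood.'))
# prints: A gray image of the neighborhood.
```

The match is case-sensitive, so a capitalized Grey at the start of a sentence would be left alone unless the mapping is extended.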
In this quick tip, I will show you an example in which five text files misspell my name: Adber is written instead of Abder. The example will show you how we can use Python to correct the spelling of my name in all the text files in a directory.
Let's get started!
Data Preparation
Before we move forward with the example, let's prepare the data (text files) we want to work with. Go ahead and download the directory with its files. Unzip the directory and you are now all set.
As you can see, we have a directory named Abder which contains five different files named 1, 2, 3, 4, and 5.
Implementation
Let's get to the fun part. The first thing we need to do is read the content of the directory Abder. For this, we can use the os.listdir() function, as follows:
import os
directory = os.listdir('/Users/DrAbder/Desktop/Abder')
If we try to see what's inside the directory, we can do the following:
print(directory)
In which case, we will get:
['.DS_Store', '1.rtf', '2.rtf', '3.rtf', '4.rtf', '5.rtf']
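This shows that we have five rtf files inside the directory. The listing also includes .DS_Store, a hidden file that macOS creates, which is not one of our text files.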
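Since only the rtf files are of interest, one option is to filter the listing by extension before doing anything else. The following is a minimal sketch; `rtf_files` is a hypothetical helper, and the list is the output shown above rather than a live call to os.listdir().

```python
def rtf_files(names):
    """Keep only the names that end in .rtf (hypothetical helper)."""
    return [name for name in names if name.endswith('.rtf')]

# The listing printed above, copied here so the example runs anywhere
listing = ['.DS_Store', '1.rtf', '2.rtf', '3.rtf', '4.rtf', '5.rtf']

print(rtf_files(listing))
# prints: ['1.rtf', '2.rtf', '3.rtf', '4.rtf', '5.rtf']
```

Note that os.listdir() returns entries in arbitrary order, so wrapping the result in sorted() may be worthwhile if the order matters.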
To make sure we are working in the directory of interest, we can use chdir() to make it the current working directory, as follows:
os.chdir('/Users/DrAbder/Desktop/Abder')
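Changing the working directory affects the rest of the program, so an alternative worth knowing is to leave it alone and build full paths with os.path.join(). This is only a sketch of that alternative, not what the rest of this tip uses; `full_paths` is a hypothetical helper.

```python
import os

def full_paths(folder, names):
    """Join each file name onto the folder so no chdir() is needed."""
    return [os.path.join(folder, name) for name in names]

# The folder is the one used in this tip; the names are two of its files
print(full_paths('/Users/DrAbder/Desktop/Abder', ['1.rtf', '2.rtf']))
```

Each resulting path can be passed straight to open(), regardless of the current working directory.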
The next thing we need to do is loop through all the files in the directory Abder. We can use a for loop as follows:
for file in directory:
Since we want to look for Adber in each of the five files in the directory, the natural thing to do at this stage is to open each file and read its content:
    with open(file, 'r') as open_file:
        read_file = open_file.read()
Now comes a vital step, especially when it comes to pattern matching (in our case, searching for Adber): the use of regular expressions. In Python, regular expressions are provided by the re module.
We will be using two main functions from this module. The first is compile():
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods.
And the second is sub(), for substituting the wrong spelling with the correct one. We will thus do the following:
    regex = re.compile('Adber')
    read_file = regex.sub('Abder', read_file)
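One thing to keep in mind is that the pattern 'Adber' matches anywhere in the text, including inside a longer word. If only whole words should be replaced, the pattern can be wrapped in \b word boundaries, and subn() can be used in place of sub() to also learn how many replacements were made. The sample text below is made up for illustration.

```python
import re

# \b limits the match to whole words
regex = re.compile(r'\bAdber\b')

# Made-up sample text, not taken from the five files
text = 'Adber wrote this. Thanks, Adber!'

# subn() returns the new string and the number of replacements
fixed, count = regex.subn('Abder', text)

print(fixed)  # prints: Abder wrote this. Thanks, Abder!
print(count)  # prints: 2
```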
Finally, we want to write the new text after substitution to our files, as follows:
    with open(file, 'w') as write_file:
        write_file.write(read_file)
Putting It All Together
In this section, let's see what the whole Python script, which looks for Adber in each file and replaces it with Abder, looks like:
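import os, re

directory = os.listdir('/Users/DrAbder/Desktop/Abder')
os.chdir('/Users/DrAbder/Desktop/Abder')

for file in directory:
    if not file.endswith('.rtf'):
        continue  # skip entries such as .DS_Store
    # Read the file's current content
    with open(file, 'r') as open_file:
        read_file = open_file.read()
    # Replace the misspelled name
    regex = re.compile('Adber')
    read_file = regex.sub('Abder', read_file)
    # Write the corrected text back to the same file
    with open(file, 'w') as write_file:
        write_file.write(read_file)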
As we can see, Python makes it very easy to carry out modifications across multiple files using a for loop. Another important point to remember here is the use of regular expressions for pattern matching.
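To reuse the same idea for other words or directories, the loop can be wrapped in a function. The following is a minimal sketch under a few assumptions: `fix_spelling` and its parameters are hypothetical names, the wrong spelling is treated as literal text rather than a pattern, and the demonstration runs in a temporary folder with made-up files instead of the Abder directory.

```python
import os
import re
import tempfile

def fix_spelling(folder, wrong, right, extension='.rtf'):
    """Replace `wrong` with `right` in matching files; return the names changed."""
    regex = re.compile(re.escape(wrong))  # treat `wrong` as literal text
    changed = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extension):
            continue  # skip entries such as .DS_Store
        path = os.path.join(folder, name)
        with open(path, 'r') as handle:
            text = handle.read()
        new_text, count = regex.subn(right, text)
        if count:  # rewrite the file only when something was replaced
            with open(path, 'w') as handle:
                handle.write(new_text)
            changed.append(name)
    return changed

# Demonstration on throwaway files in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    samples = [('1.rtf', 'By Adber'), ('2.rtf', 'By Abder'), ('notes.txt', 'Adber')]
    for name, body in samples:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write(body)
    changed = fix_spelling(folder, 'Adber', 'Abder')
    with open(os.path.join(folder, '1.rtf')) as handle:
        result = handle.read()

print(changed)  # prints: ['1.rtf']
print(result)   # prints: By Abder
```

Returning the list of changed files makes it easy to check what the function touched before trusting it on real documents.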
If you want to know more about Python's loops, check A Smooth Refresher on Python's Loops. And, for more information about regular expressions, check Regular Expressions in Python.