In this tutorial series, you'll see how to build a code review scheduler using Python. Along the way, you'll work with some basic concepts such as reading and sending emails, executing terminal commands from a Python program, and processing Git logs.
In the first part, you'll start by setting up the basic configuration files, reading git logs, and processing them for sending the code review request.
Getting Started
Start by creating a project folder called CodeReviewer. Inside the CodeReviewer folder, create a file called scheduler.py.
Assuming the code review scheduler will be run against multiple projects, you'll need to specify the project name against which the scheduler will run and the number of days for which the log needs to be processed. So first, read these two parameters as command-line arguments to the code review program.
Let's make use of the argparse Python module for reading the program parameters. Import the library and add the program arguments. You can use the ArgumentParser class of the argparse module to create the parser. Once it's created, you can add the arguments to the parser. Here is the code for reading the arguments from the program:
import argparse

parser = argparse.ArgumentParser(description="Code Review Scheduler Program")
parser.add_argument("-n", nargs="?", type=int, default=1, help="Number of (d)ays to look for log. ")
parser.add_argument("-p", nargs="?", type=str, default="em", help="Project name.")
args = parser.parse_args()

no_days = args.n
project = args.p

print('Processing the scheduler against project ' + project + '....')
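To see how these two options behave, here is a minimal sketch that passes an explicit argument list to parse_args instead of reading the real command line. The build_parser helper is a name introduced only for this illustration; the option definitions mirror the ones above.

```python
import argparse


def build_parser():
    # Hypothetical helper: wraps the same option definitions used in scheduler.py.
    parser = argparse.ArgumentParser(description="Code Review Scheduler Program")
    parser.add_argument("-n", nargs="?", type=int, default=1,
                        help="Number of (d)ays to look for log.")
    parser.add_argument("-p", nargs="?", type=str, default="em",
                        help="Project name.")
    return parser


# Passing a list to parse_args() makes the parser easy to try out
# without touching the real command line.
explicit = build_parser().parse_args(["-n", "10", "-p", "project_x"])
defaults = build_parser().parse_args([])

print(explicit.n, explicit.p)
print(defaults.n, defaults.p)
```

When an option is left out, the parser falls back to the default given in add_argument, so running the scheduler with no arguments looks at one day of history for the project named em.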
Setting Up Project Configurations
Let's maintain a separate config file that will be processed by the code reviewer. Create a file called config.json inside the project directory CodeReviewer. Inside the config file, there will be information about each project that will be processed. Here is how the project config file would look:
[{
    "name": "project_x",
    "git_url": "https://github.com/royagasthyan/project_x"
}, {
    "name": "project_y",
    "git_url": "https://github.com/royagasthyan/project_y"
}]
A few more options will be added to the project configurations in later parts.
Let's read the JSON configuration file into the Python program. Import the json module and load the JSON data read from the config file.
import json

#
# Read the scheduler config file
#
with open('config.json') as cfg_file:
    main_config = json.load(cfg_file)
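A config file edited by hand can easily lose a key. As a rough sketch, the loading step could also check that every entry carries the two fields the scheduler relies on. The load_projects helper and the inline SAMPLE_CONFIG text are assumptions made for this example, so that it runs without a file on disk.

```python
import json

# Sample text in the same shape as config.json above.
SAMPLE_CONFIG = """
[{"name": "project_x", "git_url": "https://github.com/royagasthyan/project_x"},
 {"name": "project_y", "git_url": "https://github.com/royagasthyan/project_y"}]
"""


def load_projects(text):
    # Hypothetical helper: parse the JSON text and check that each entry
    # has the two keys the scheduler relies on.
    projects = json.loads(text)
    for entry in projects:
        for key in ("name", "git_url"):
            if key not in entry:
                raise ValueError("config entry is missing '" + key + "': " + repr(entry))
    return projects


main_config = load_projects(SAMPLE_CONFIG)
print([p["name"] for p in main_config])
```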
Read Commit Info From the Repository
When the reviewer script is run, the project name is specified as a parameter. Based on the project name specified, check if its configurations are available and clone the repository.
First, you need to find the project URL from the configurations. Iterate over the project entries and find the matching project URL as shown:
for p in main_config:
    if p['name'] == project:
        project_url = p['git_url']
        break
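Note that the loop above leaves project_url undefined when no entry matches the given name. One possible way to make that case explicit, sketched here with a hypothetical find_project_url helper, is to return None for an unknown project so the caller can stop early:

```python
def find_project_url(config, name):
    # Hypothetical helper: return the git_url of the entry whose name
    # matches, or None when there is no such project.
    for entry in config:
        if entry["name"] == name:
            return entry["git_url"]
    return None


main_config = [
    {"name": "project_x", "git_url": "https://github.com/royagasthyan/project_x"},
    {"name": "project_y", "git_url": "https://github.com/royagasthyan/project_y"},
]

project_url = find_project_url(main_config, "project_y")
missing = find_project_url(main_config, "project_z")
print(project_url)
print(missing)
```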
Once you have the project URL, check if the project is already cloned. If not, clone the project URL. If it already exists, navigate to the existing project directory and pull the latest changes.
# Clone the repository if it does not already exist
print("********* Doing project checkout **********")
if os.path.isdir("./" + project):
    execute_cmd("cd " + project + "; git pull")
else:
    execute_cmd("git clone " + project_url + " " + project)
print("*** Done *******")
print(" ")
To execute system commands, you'll be making use of the Python os module. Create a method to execute system commands since you'll be using it frequently. Here is the execute_cmd method:
import os

def execute_cmd(cmd):
    print("***** Executing command '" + cmd + "'")
    response = os.popen(cmd).read()
    return response
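os.popen hands the whole string to the shell, so a project name containing spaces or shell characters would change the command. As an alternative sketch, assuming Python 3.5 or later, the subprocess module can run a command given as a list of arguments, with an optional working directory in place of cd. The name run_cmd is hypothetical and is not part of the tutorial's code.

```python
import subprocess
import sys


def run_cmd(args, cwd=None):
    # Hypothetical alternative to execute_cmd: args is a list, so no shell
    # parsing or quoting is involved.
    print("***** Executing command '" + " ".join(args) + "'")
    result = subprocess.run(args, cwd=cwd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


# Demonstration with the current Python interpreter, which is always available.
output = run_cmd([sys.executable, "-c", "print('hello from a child process')"])
print(output.strip())
```

With a helper like this, the pull step could be written as run_cmd(["git", "pull"], cwd=project). Because check=True is set, a failing command raises subprocess.CalledProcessError instead of silently returning an empty string.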
Processing the Git Log
After fetching the commit log from the Git repository, you'll analyze the log. Create a new Python method called process_commits to process the Git logs.
def process_commits():
    # code would be here
    pass
Git provides us with the commands to get the commit log. To get all logs from a repository, the command would be:
git log --all
The response would be:
commit 04d11e21fb625215c5e672a93d955f4a176e16e4
Author: royagasthyan <[email protected]>
Date:   Wed Feb 8 21:41:20 2017 +0530

    Create README.md
You can also limit the log to a number of days counted back from the time the command is executed. To get logs from the last n days, the command would be:
git log --all --since=n.days
You can narrow it down further to see whether each file in a commit was added, modified, or deleted. Execute the above command with --name-status:
git log --all --since=10.days --name-status
The above command would have the following output:
commit 04d11e21fb625215c5e672a93d955f4a176e16e4
Author: royagasthyan <[email protected]>
Date:   Wed Feb 8 21:41:20 2017 +0530

    Create README.md

A       README.md
The letter A on the left side of the README.md file indicates addition. Similarly, M would indicate modification and D would indicate deletion.
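In the raw output, Git separates the status letter from the file path with a tab character. As a small sketch of how such a line could be recognized, the hypothetical parse_status_line helper below handles only the three letters described above; other status codes that Git can emit, such as renames, are ignored here.

```python
STATUS_NAMES = {"A": "added", "M": "modified", "D": "deleted"}


def parse_status_line(line):
    # Hypothetical helper: return (status, path) for a --name-status line
    # such as "A\tREADME.md", or None for any other line.
    parts = line.split("\t", 1)
    if len(parts) != 2 or parts[0] not in STATUS_NAMES:
        return None
    return STATUS_NAMES[parts[0]], parts[1]


print(parse_status_line("A\tREADME.md"))
print(parse_status_line("M\tsrc/app.py"))
print(parse_status_line("Date:   Wed Feb 8 21:41:20 2017 +0530"))
```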
Inside the process_commits method, let's define the Git command to be executed to get the log history.
cmd = "cd " + project + "; git log --all --since=" + str(no_days) + ".day --name-status"
Pass the above command cmd to the execute_cmd method.
response = execute_cmd(cmd)
Read the response, iterate over each line, and print it.
def process_commits():
    cmd = "cd " + project + "; git log --all --since=" + str(no_days) + ".day --name-status"
    response = execute_cmd(cmd)

    for line in response.splitlines():
        print(line)
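The command string above is built by concatenation, which works for simple project names. As a hedged alternative, the same query can be expressed as an argument list, using Git's -C option to run the command inside the project directory instead of cd. The git_log_args helper and its validation of the day count are assumptions made for this sketch.

```python
def git_log_args(project_dir, no_days):
    # Hypothetical helper: build the same log query as an argument list.
    # "git -C <dir>" runs the command inside <dir>, replacing "cd".
    days = int(no_days)
    if days < 1:
        raise ValueError("no_days must be at least 1")
    return ["git", "-C", project_dir, "log", "--all",
            "--since=" + str(days) + ".days", "--name-status"]


print(git_log_args("project_x", 10))
```

A list in this form can be passed directly to subprocess.run, which avoids shell quoting issues altogether.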
Make a call to the process_commits method after the configurations have been read.
print('Processing the scheduler against project ' + project + '....')
process_commits()
Save the above changes and try to execute the code reviewer using the following command:
python scheduler.py -n 10 -p "project_x"
As you can see, we have started the code reviewer with the number of days and the project name to process. You should be able to see the following output:
********* Doing project checkout **********
***** Executing command 'cd project_x; git pull'
*** Done *******

Processing the scheduler against project project_x....
***** Executing command 'cd project_x; git log --all --since=10.day --name-status'
commit 04d11e21fb625215c5e672a93d955f4a176e16e4
Author: royagasthyan <[email protected]>
Date:   Wed Feb 8 21:41:20 2017 +0530

    Create README.md

A       README.md
So when you execute the code reviewer, you can see that the repository is cloned if it doesn't already exist, or else it is updated. After that, based on the number of days provided, it fetches the commit log history to process.
Now let's analyze the commit log to find out the commit ID, commit date, and commit author.
As seen in the logs, the commit ID line starts with the keyword commit, the author line starts with the keyword Author:, and the date line starts with the keyword Date:. You'll be using these keywords to identify the commit ID, author, and date for a commit.
Let's try to get the commit ID from the Git log lines. This is quite straightforward. You only need to check if the line starts with the keyword commit.
for line in response.splitlines():
    if line.startswith('commit '):
        print(line[7:])
Save the changes and execute the scheduler, and you should be able to see the commit ID.
The next task is to extract the author name. To check if the line contains the author info, you'll first check if the line starts with the Author keyword. If it does, you'll make use of a regular expression to get the user.
As you can see, the user's email address is enclosed in angle brackets. We'll use a regular expression to read the email address between < and >. The regular expression will be like this:
r'<(.*?)>'
Import the Python re module to use regular expressions in Python.
import re
Now check if the line starts with the Author keyword. If it does, extract the user email address using the regular expression above. Here is how it would look:
if line.startswith('Author:'):
    match = re.search(r'<(.*?)>', line)
    if match:
        print(match.group(1))
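To try the expression in isolation, here is a short sketch that applies it to a made-up author line. The extract_email helper and the sample address are illustrative assumptions, not output from the tutorial's repository.

```python
import re

# Non-greedy capture of whatever sits between the first "<" and the next ">".
EMAIL_PATTERN = re.compile(r'<(.*?)>')


def extract_email(line):
    # Hypothetical helper: return the address from an "Author:" line, or None.
    if not line.startswith('Author:'):
        return None
    match = EMAIL_PATTERN.search(line)
    return match.group(1) if match else None


print(extract_email('Author: Jane Doe <jane@example.com>'))
print(extract_email('Author: no address here'))
```

The question mark after .* keeps the match non-greedy, so the capture stops at the first closing bracket rather than the last one on the line.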
To extract the commit date from the log, you need to check if the line starts with the Date keyword. Here is how it would look:
if line.startswith('Date:'):
    print(line[5:])
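Slicing with line[5:] keeps the padding spaces that follow the Date: keyword and leaves the date as plain text. If a real date object is more convenient, one option is to parse it with datetime.strptime. This is a sketch that assumes Python 3, Git's default log date format as shown in the output above, and an English locale; parse_commit_date is a hypothetical helper name.

```python
from datetime import datetime

# Matches Git's default log date, e.g. "Wed Feb 8 21:41:20 2017 +0530".
# Assumes an English locale for the day and month names.
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_commit_date(line):
    # Hypothetical helper: strip the "Date:" keyword and its padding,
    # then parse the rest into a timezone-aware datetime.
    text = line[len("Date:"):].strip()
    return datetime.strptime(text, GIT_DATE_FORMAT)


commit_date = parse_commit_date("Date:   Wed Feb 8 21:41:20 2017 +0530")
print(commit_date.isoformat())
```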
Here is the final process_commits method:
def process_commits():
    cmd = "cd " + project + "; git log --all --since=" + str(no_days) + ".day --name-status"
    response = execute_cmd(cmd)

    for line in response.splitlines():
        if line.startswith('commit '):
            print(line[7:])
        if line.startswith('Author:'):
            match = re.search(r'<(.*?)>', line)
            if match:
                print(match.group(1))
        if line.startswith('Date:'):
            print(line[5:])
Save the above changes and start the code reviewer.
python scheduler.py -n 10 -p "project_x"
You should have each commit's details, with the commit ID, author, and commit date, printed on the terminal.
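Printing the fields is enough to check the parsing, but later processing will need them grouped per commit. As one possible sketch (not necessarily how the series goes on to do it), the same three checks can fill a list of dictionaries. The collect_commits helper, the dictionary keys, and the sample author in SAMPLE_LOG are all assumptions made for this example.

```python
import re

# Sample log text in the format shown earlier; the author is made up.
SAMPLE_LOG = """commit 04d11e21fb625215c5e672a93d955f4a176e16e4
Author: Jane Doe <jane@example.com>
Date:   Wed Feb 8 21:41:20 2017 +0530

    Create README.md

A\tREADME.md
"""


def collect_commits(log_text):
    # Hypothetical helper: group the id, author email and date of each
    # commit into one dictionary instead of printing them.
    commits = []
    for line in log_text.splitlines():
        if line.startswith('commit '):
            # A "commit" line opens a new record.
            commits.append({'id': line[7:], 'author': None, 'date': None})
        elif commits and line.startswith('Author:'):
            match = re.search(r'<(.*?)>', line)
            if match:
                commits[-1]['author'] = match.group(1)
        elif commits and line.startswith('Date:'):
            commits[-1]['date'] = line[5:].strip()
    return commits


for commit in collect_commits(SAMPLE_LOG):
    print(commit['id'], commit['author'], commit['date'])
```

Commit message lines are indented in the log output, so a message that happens to begin with the word commit does not start a new record.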
Wrapping It Up
In this first part of the Python Code Review Scheduler, you saw how to set up the project. You read the input parameters required by the scheduler to process the project. In the next part of this tutorial series, we'll collect the commit details from the process_commits method and send the commits to random developers for code review.
I hope you enjoyed the first part. Do let us know your thoughts or any suggestions in the comments below.
Source code from this tutorial is available on GitHub.