Data Visualization App Using GAE Python, D3.js and Google BigQuery

Imagine that you have a large set of data with millions of rows and you're faced with the task of extracting information from the data. How do you make sense of a large set of data? The simple answer to this question is that we do so via data visualization, but what is that, exactly?

Data visualization is the pictorial representation of data to figure out what is hidden inside the information. In this three-part series, we'll create a visualization app to extract information and make sense of a large set of data.

Querying large chunks of data without proper hardware can result in performance issues. Google provides a solution to this problem in the form of BigQuery. Backed by Google's infrastructure, it enables you to query large chunks of data. We'll be using a dataset from Google BigQuery for our visualization application.

In this tutorial, I'll take you through the process of creating a visualization application using Python, Google BigQuery, and the D3.js JavaScript library. We'll be using Google App Engine (or GAE) to host our application. This tutorial assumes that you have a basic knowledge of Python and JavaScript. For creating graphs, we'll be using D3.js.

Note also that the code from the above tutorial is available on GitHub, and a demo is hosted on AppSpot.

Getting Started With Google App Engine 

Step 1: Downloading and Using The GAE Python SDK 

Let's get started by setting up our development environment. Download the Google App Engine Python SDK and extract it. We'll be using the webapp2 framework for Python. 

From Google's documentation:

A webapp2 application has two parts: 
1) one or more RequestHandler classes that process requests and build responses 
2) a WSGIApplication instance that routes incoming requests to handlers based on the URL.

The aforementioned points will become more clear when we create our request handler class. Create a directory called PythonD3jsMashup which will be our project directory. Navigate to the project directory and create a file called app.py. This will be our application module. Let's create a request handler class to process the request and build a response, and a WSGIApplication instance to route requests. 

 Here is how app.py should look:

When a / request occurs, it's routed to the ShowHome class which renders the home page. Let's write the Python code to display our homepage. First, create a new folder called Templates in the PythonD3jsMashup folder. Inside it, create an HTML page called index.html

It should contain the following code:

In order to render templates, we'll need to import the template library. After doing so, we'll use it to render the index.html page as shown below:

Next, navigate to the project directory PythonD3jsMashup and create a file called app.yaml. This file will act as a configuration file for our application. It should look like this:

As you can see, we have defined a number of things in the app.yaml. Most of the parameters are self-explanatory. The handlers directive dictates that every URL request that matches /.* be handled by the application object in the app module.

To run the application locally, navigate to the GAE SDK and run the following command:

Point your browser to http://localhost:8080 and you should see this page:

Deploying an App to AppSpot 

Next, we'll be deploying our app onto the Google App Engine. In order to deploy the app, first log in to the GAE console:

From the dashboard click on the Create Application button.

Next, you'll see a screen similar to the one shown below. Type an Application Identifier and Application Title for the app and click Create Application.

Once the application is registered successfully you'll be greeted with a success message:

Make a note of the red circled text which will be used as application identifier later.

Click on the dashboard link. From the left menu in the Dashboard, under the Administration tab, click on the Application Settings. You'll see the following screen:

Make a note of the Service Account Name. We'll need this in the Google developer console.

Next, modify the application name to pythond3jsmashup in app.yaml.

Now to deploy the app to GAE, navigate to GAE Python SDK and type the following command:

Once deployed, try browsing the AppSpot URL.

Getting Started With Google BigQuery

To get started with Google BigQuery, log in to the Google API console. You should see a dashboard listing all the available projects in your Google account. Click on the project name PythonD3jsMashup.

 

Make a note of the Project Number, which we'll use while connecting with Google BigQuery.

From the left side menu, click on the APIs & auth tab. From the listed items click on APIs. Now you should see a list of available APIs:

BigQuery API is turned OFF. Click on the OFF status to turn it ON.

Click on the project name pythond3jsmashup on the left side menu and click Permissions. Click Add Member and add the Service Account Name (which we noted down earlier) with "Can edit" permissions as shown below:

Collecting Data From Google BigQuery

Until now, we created our simple app and deployed it on GAE with BigQuery API enabled. Next, we'll be connecting to one of the freely available datasets on BigQuery. 

Let's start by creating a new page called  Templates/chart.html where we'll show the visualization. Create a method to route the Click Here link on our home page to chart.html

Here is what it looks like:

Also modify the request handler to route the request /chart to ShowChartPage as shown below:

Next, modify the href attribute of the Click here link as shown:

Now try running the app and on clicking Click here, and you should be redirected to chart.html.

Next, we'll try to authenticate our app with Google BigQuery and fetch some data. For authentication, we'll require Google API Python client. Download google-api-python-client-gae-1.2.zip and extract it into the project folder.

Include the following libraries in app.py.

First, to query a dataset from BigQuery we need to build a BigQuery service as shown: 

Second, we need to create a query that we'll be executing against the BigQuery dataset. We'll be using the Shakespeare dataset. Here is the query that we'll be executing against the dataset:

To query the data we'll create a job from the bigquery_service:

And then query from the created job:

Replace PROJECT_NUMBER with the project number we had noted down earlier. So, here is the modified app.py:

The response from the query would be printed out on chart.html. Save all your code and upload the modified code to GAE using appcfg.py update PythonD3jsMashup/.

Now if you point your browser to http://pythond3jsmashup.appspot.com/chart you'll see the json response returned from querying the dataset.

Conclusion

In this tutorial, we created a Python application and deployed it on Google App Engine. We connected our application with Google BigQuery and fetched data from the freely available dataset. 

Next, we'll try to parse the data fetched from Google BigQuery and visualize it using JavaScript library D3.js. In the meantime, please don't hesitate to leave any questions or comments in the feed below.

Tags:

Comments

Related Articles