An Introduction to WebDriver Using the JavaScript Bindings

In this tutorial, we'll take a look at WebDriverJs, a tool used for browser automation. Chrome is used throughout, but most modern browsers (including mobile ones) have drivers available for use with WebDriver, so do check them out if you wish to automate other browsers.

While unit tests are certainly valuable for modern web applications, as your application grows you'll find bugs cropping up which weren't caught by a unit test but would, in theory, have been caught by an integration or acceptance test.

Should you wish to follow a testing strategy which involves browser testing, this guide will give you an initial introduction to testing with WebDriverJs so you're equipped with enough knowledge to get started.

This tutorial assumes you're familiar with JavaScript and can run JavaScript code using Node.js.

WebDriverJs

If you'd like to follow along, feel free to check out this sample project, which contains a few WebDriver examples for you to run. You'll also need to install Chromedriver and have it available on your PATH.

Selenium WebDriver typically has a server and a client. Apart from WebDriver contributors, most people will only be interested in the client API which allows them to control a browser through their script. To get started, install the JavaScript bindings for WebDriver:

npm install selenium-webdriver

Once you've installed that module via npm, you can require it in your Node.js code like this:

var webdriver = require('selenium-webdriver');

Alternatively, if you check out the sample project, you can simply run npm install inside the folder, as the WebDriverJs module is listed as a dependency in the package.json file.

While you can browse the official documentation, my personal favourite is the source itself. This webdriver.js file lists many WebDriver methods, e.g. you'll notice a getAttribute and a getText. Here are some methods which may be of interest:

get - Navigate the browser to a URL.
findElements - Similar to document.querySelectorAll in the browser.
executeScript - Execute raw JavaScript in the context of the current page.
getText - Get the visible text of an element, including its children.
isDisplayed - Find out if an element is displayed on the page.
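
To give a feel for how a few of these methods fit together, here is a minimal sketch. visibleTexts is a hypothetical helper (it is not part of WebDriverJs), and it assumes browser is a driver object and locator is something like webdriver.By.css('a'). It relies on promises, which are covered in the next section.

```javascript
// Hypothetical helper: navigates to `url`, finds every element matching
// `locator`, and resolves with the text of the ones that are displayed.
function visibleTexts(browser, url, locator) {
    return browser.get(url).then(function() {
        return browser.findElements(locator);
    }).then(function(elements) {
        // Ask each element whether it is displayed and, if so, for its text.
        return Promise.all(elements.map(function(element) {
            return element.isDisplayed().then(function(displayed) {
                return displayed ? element.getText() : null;
            });
        }));
    }).then(function(texts) {
        // Drop the placeholders left behind by hidden elements.
        return texts.filter(function(text) {
            return text !== null;
        });
    });
}
```

With a real driver, a call might look like visibleTexts(browser, 'http://en.wikipedia.org/wiki/Wiki', webdriver.By.css('h2')).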

Promises

One notable aspect of the JavaScript bindings for WebDriver is that almost every method is asynchronous. This means the following code doesn't actually get the title of the web page:

var title = browser.getTitle();
//logs { then: [Function: then],  cancel: [Function: cancel], isPending: [Function: isPending] }
console.log(title);

Instead, you need to do this:

var promise = browser.getTitle();

promise.then(function(title) {
    console.log(title);
});

This is because WebDriverJs uses promises in order to make dealing with async code a bit more pleasant. Note that the promise implementation that ships with WebDriverJs does not conform exactly to the Promises/A+ standard.

The key thing to take away here is that most WebDriver methods return a promise: an object with a then method which accepts two optional function arguments. The first argument is a callback which may receive a value.

In the above example, we asked for a title, therefore our callback receives that title as its first argument. The second optional function argument we can pass to the then method is called if an error occurs, which allows us to handle it.
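
Here is a minimal sketch of both callbacks in action. So that it runs without a browser, it uses native JavaScript promises and a hypothetical fakeGetTitle function as a stand-in for browser.getTitle(); the callback names are also made up for illustration.

```javascript
// Hypothetical stand-in for browser.getTitle(): returns a promise which
// either resolves with a title or rejects with an error.
function fakeGetTitle(shouldFail) {
    return shouldFail
        ? Promise.reject(new Error('browser went away'))
        : Promise.resolve('Wiki - Wikipedia');
}

// First argument to then: receives the value on success.
function onTitle(title) {
    return 'Title: ' + title;
}

// Second argument to then: receives the error on failure.
function onError(err) {
    return 'Failed: ' + err.message;
}

// logs "Title: Wiki - Wikipedia"
fakeGetTitle(false).then(onTitle, onError).then(console.log);

// logs "Failed: browser went away"
fakeGetTitle(true).then(onTitle, onError).then(console.log);
```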

Examples

Let's recap on where we are so far:

We installed the Chromedriver binary.
We installed WebDriverJs via npm.
With the understanding that almost everything is async, we know how to use promises to retrieve the values we want.

Have a look at this code example:

var webdriver = require('selenium-webdriver');
var browser = new webdriver.Builder().usingServer().withCapabilities({'browserName': 'chrome' }).build();

browser.get('http://en.wikipedia.org/wiki/Wiki');
browser.findElements(webdriver.By.css('[href^="/wiki/"]')).then(function(links){
    console.log('Found', links.length, 'Wiki links.');
    browser.quit();
});

Run the Wiki example like this:

$ node Wiki.js
Found 367 Wiki links.

In the code example, the first few lines are essentially boilerplate. They initialise the browser object and specify some initial configuration, such as which browser to use. Starting with the call to browser.get, we have the code we really care about.

First we navigate to a Wikipedia page.
We construct a CSS selector which matches elements that have an attribute of href and a value starting with /wiki/ (i.e. internal Wiki links).
Still on the same line as step #2, we pass the CSS selector into the findElements method which will go ahead and asynchronously evaluate the selector expression.
To observe updates to the Promise, we pass a callback function to the then method.
The first argument to the callback is an array of matched elements, so we retrieve that and log the length.
Finally, we quit the browser.

Finding elements on the page is one piece of the puzzle. Let's take a look at another example, which carries out a Google search and clicks on the result we expect to be on the page.

/*
* Carry out a Google Search
*/

"use strict";

var webdriver = require('selenium-webdriver');
var browser = new webdriver.Builder().usingServer().withCapabilities({'browserName': 'chrome' }).build();

function logTitle() {
    // Return the promise so the next step in the chain waits for it.
    return browser.getTitle().then(function(title) {
        console.log('Current Page Title: ' + title);
    });
}

function clickLink(link) {
    return link.click();
}

function handleFailure(err) {
    console.error('Something went wrong\n', err.stack, '\n');
    closeBrowser();
}

function findTutsPlusLink() {
    return browser.findElements(webdriver.By.css('[href="http://code.tutsplus.com/"]')).then(function(result) {
        return result[0];
    });
}

function closeBrowser() {
    browser.quit();
}

browser.get('https://www.google.com');
browser.findElement(webdriver.By.name('q')).sendKeys('tuts+ code');
browser.findElement(webdriver.By.name('btnG')).click();
browser.wait(findTutsPlusLink, 2000).then(clickLink).then(logTitle).then(closeBrowser, handleFailure);

Running the above code:

$ node GoogleSearch.js
Current Page Title: Tuts+ Code Tutorials

A few interesting points are shown here. First, we get a feel for what it's like to use function declarations instead of anonymous function callbacks passed to then: the result reads something like a fluent API (see the last line). Also, since we have the ability to create custom promises (deferreds), we can be as fluent as we desire!

Note that we only attach an error callback in the last call to then. Even if an error occurs earlier on, it will still propagate along the chain until it reaches that callback.
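
This propagation can be seen in a minimal sketch that runs without a browser. It uses native JavaScript promises rather than WebDriver's own, and the step names are hypothetical stand-ins for the functions in the example above.

```javascript
var visited = [];

// Hypothetical step factory: each step just records that it ran.
function step(name) {
    return function() {
        visited.push(name);
    };
}

// Hypothetical failing step, standing in for a command that goes wrong.
function failingStep() {
    throw new Error('link not found');
}

var chain = Promise.resolve()
    .then(step('search'))
    .then(failingStep)
    .then(step('click'))      // skipped: the chain is already rejected
    .then(step('logTitle'))   // skipped as well
    .then(step('close'), function(err) {
        // The error from failingStep arrives here, three calls later.
        visited.push('handled: ' + err.message);
    });

chain.then(function() {
    // logs "search, handled: link not found"
    console.log(visited.join(', '));
});
```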

We navigate to the Google home page and search for 'tuts+ code'. Since we're operating on the browser object, WebDriver's internal Control Flow mechanism knows to schedule each command to happen one after the other. This saves us the hassle of having to chain everything together, and also explains why there are two calls to findElement, one after the other, without them having to be chained to each other.
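
To illustrate the idea of scheduling, here is a rough sketch of a queue which runs asynchronous commands strictly one after the other. It is not WebDriver's actual Control Flow implementation; CommandQueue and the fake commands are hypothetical and exist only to show the principle. As far as I'm aware, newer selenium-webdriver releases dropped this mechanism, in which case each command has to be chained or awaited explicitly.

```javascript
// Hypothetical queue: every scheduled command waits for the previous one.
function CommandQueue() {
    this.tail = Promise.resolve();
}

CommandQueue.prototype.schedule = function(command) {
    var result = this.tail.then(command);
    // Keep the queue usable even if this command fails.
    this.tail = result.then(null, function() {});
    return result;
};

var log = [];

// Hypothetical command: finishes after `delayMs` and records its name.
function fakeCommand(name, delayMs) {
    return function() {
        return new Promise(function(resolve) {
            setTimeout(function() {
                log.push(name);
                resolve(name);
            }, delayMs);
        });
    };
}

var queue = new CommandQueue();
queue.schedule(fakeCommand('get', 30));
queue.schedule(fakeCommand('sendKeys', 10));
var done = queue.schedule(fakeCommand('click', 0));

done.then(function() {
    // logs "get -> sendKeys -> click", despite the different delays
    console.log(log.join(' -> '));
});
```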

Waiting

When we carry out the Google search from the homepage, no page reload occurs, so WebDriver will immediately try to find the elements we've asked for on the search results page, possibly before they exist. Knowing when to wait for elements is a critical part of browser automation.

The old and naughty way of doing things was to use a sleep. Since the point at which an element appears can depend heavily on external factors (e.g. network connection speed), developers would sometimes instruct WebDriver to wait for a fixed period of time before continuing. This, of course, is riddled with problems: too short a sleep and the script fails intermittently, too long and every run is needlessly slow.

Fortunately, the wait method makes automating modern web pages a lot nicer. You call wait with two arguments: the first is a function which must return a truthy value within the time period (in milliseconds) given as the second argument. WebDriver repeatedly calls your function until either it returns a truthy value or the time runs out, in which case an error is thrown.
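
The polling behaviour can be sketched in plain JavaScript. This is not how WebDriver implements wait; waitFor, its intervalMs parameter and elementAppears are hypothetical names used purely to illustrate the technique.

```javascript
// Hypothetical polling helper: calls `condition` every `intervalMs` until it
// returns (or resolves to) a truthy value, or until `timeoutMs` has passed.
function waitFor(condition, timeoutMs, intervalMs) {
    var deadline = Date.now() + timeoutMs;

    return new Promise(function(resolve, reject) {
        function poll() {
            Promise.resolve().then(condition).then(function(value) {
                if (value) {
                    resolve(value);
                } else if (Date.now() >= deadline) {
                    reject(new Error('Wait timed out after ' + timeoutMs + 'ms'));
                } else {
                    setTimeout(poll, intervalMs);
                }
            }, reject);
        }
        poll();
    });
}

// Hypothetical condition: pretends the element shows up on the third check.
var attempts = 0;
function elementAppears() {
    attempts += 1;
    return attempts >= 3 ? 'the element' : null;
}

waitFor(elementAppears, 2000, 50).then(function(found) {
    // logs "Found the element after 3 attempts"
    console.log('Found', found, 'after', attempts, 'attempts');
});
```

Note how the resolved value is whatever the condition returned, which mirrors the way findTutsPlusLink hands the matched link on to clickLink in the Google example.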

Modifying the Browser

While there are many methods to call on the context of DOM elements, you can also call methods on the browser itself to give you more control over the browser state. Here are a few simple examples to give you a better idea:

Set the dimensions of the browser window:

browser.manage().window().setSize(1280, 720);

Connect the browser to a proxy:

var proxy = require('selenium-webdriver/proxy');
browser = new webdriver.Builder()
    .usingServer()
    .withCapabilities({'browserName': 'chrome' })
    .setProxy(proxy.manual({
        http: '127.0.0.1:9000'
    }))
    .build();

You can also read, write, and delete cookies, take a screenshot of the window, set some individual browser settings, and more.
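
As a rough sketch of two of these, the helpers below save a screenshot and list cookie names. saveScreenshot and listCookieNames are hypothetical helpers, and the sketch assumes browser.takeScreenshot() resolves with a base64-encoded PNG and browser.manage().getCookies() resolves with an array of cookie objects; do check this against the version of selenium-webdriver you have installed.

```javascript
var fs = require('fs');

// Hypothetical helper: writes a screenshot of the window to `filePath`.
function saveScreenshot(browser, filePath) {
    return browser.takeScreenshot().then(function(base64Png) {
        // Assumption: the screenshot arrives as a base64-encoded string.
        fs.writeFileSync(filePath, base64Png, 'base64');
        return filePath;
    });
}

// Hypothetical helper: resolves with the names of the cookies visible to
// the current page.
function listCookieNames(browser) {
    return browser.manage().getCookies().then(function(cookies) {
        return cookies.map(function(cookie) {
            return cookie.name;
        });
    });
}
```

With a real driver, you might call saveScreenshot(browser, 'wiki.png') after browser.get has finished.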

Alternative Options

There are a number of options available when you want to control a browser programmatically. WebDriverJs, i.e. the version we installed using npm install selenium-webdriver, is just one WebDriver client API written in JavaScript. If you're keen to control browsers programmatically via JavaScript, there are other options too:

WD.js - Fluent API using promises + chaining.
Leadfoot - Now used by the latest version of Intern.
WebDriver.io - Has a bunch of documentation for use with BDD/TDD frameworks.
Testium - Has clear documentation on exactly what is supported.
DalekJS - A fun looking website with pretty feedback when executing tests. A lot of DalekJS has been split out into modules which can be found on GitHub.
Nightwatch - Another tool with pretty looking feedback and a fluent API.
Webdriver-sync - Synchronous version of interacting with WebDriver.

Using something like WD.js or Nightwatch can mean a number of things:

Different API to interact with. If the official JavaScript selenium-webdriver bindings have an API you're not used to, check out the alternative options above.
Alternative feedback - this can be on the reporter level, but also simply what you see in the terminal after a test has failed locally.

Conclusion

If you wish to start using WebDriver for the sake of testing, then that's great. You should also keep in mind that browser automation doesn't have to stop at testing. How about just automating a repetitive task?

For example, check out this article on Getting to Philosophy. It essentially explains how continuously clicking on the first link in Wikipedia articles will eventually land you on the Philosophy article!

This makes for a fun task to automate! Check out this animated GIF or the source to see it in action.
