Building a Simple CSS Selector Engine

Recently, I thought I’d try my hand at creating a simple CSS selector engine. You know what I’m talking about: it’s the kind of thing you use in most JavaScript libraries, where you pass in a CSS selector to find the elements you want to work with. Today, I’ll show you how to create a really simple one!

Disclaimer

What we are going to be building today is in no way something you’d use on a live site. It will not have all the great features you’re used to using in the “real” selector engines, but it will give you a small taste of how complicated something like Sizzle or NWMatcher can be. And who knows? Maybe you’ll pick up a few JavaScript tips and tricks on the way.

Step 0: Starting our Engine

If we just dove into the code right away, you would probably have drowned before we finished half a dozen lines—you wouldn’t know where anything was going. So let’s start by mapping out the details of our selector engine.

Like most selector engines, it will need a cool name. I’ve chosen the name Sylectra, but feel free to flex your own creativity.
The user will (obviously) pass a CSS selector into our function. The first step will be to split that string into an array, where each item in the array is an element in the selector hierarchy. So, if the selector is #main li.selected a, we should end up with [ "#main", "li.selected", "a"].
Then, we’ll loop over that array, finding the correct element(s) for each item in the array. Once we find them, we’ll use the result as the highest-level parent element(s) for the next element in the array. Continuing with our previous example, we’d start by finding the #main in the document; then, we’ll find all the li.selected in #main. This way, we continually refine our results until we have the elements we’re looking for.
The functionality that finds the elements will use the selector of the element(s) we’re looking for, and the parent element(s). It will figure out whether we’re looking for an element by id, class, or element name, and proceed accordingly.
When we reach the end of our array, we’ll return the results to the user.
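
The plan above can be tried out without touching the DOM. This is only a rough sketch: describeSteps is a hypothetical helper name, not part of Sylectra. It splits a selector on spaces and labels each piece the way the engine will later need to treat it.

```javascript
// Hypothetical helper (not part of Sylectra): label each piece of a selector.
function describeSteps(selector) {
    var parts = selector.split(' '), steps = [], i, part;
    for (i = 0; i < parts.length; i++) {
        part = parts[i];
        if (part.indexOf('#') === 0) {
            // leading hash: look this piece up by id
            steps.push({ type: 'id', value: part.slice(1) });
        } else if (part.indexOf('.') > -1) {
            // a dot anywhere: a class, possibly with a tag name in front
            steps.push({ type: 'class', value: part });
        } else {
            // anything else is treated as a plain tag name
            steps.push({ type: 'element', value: part });
        }
    }
    return steps;
}

console.log(describeSteps('#main li.selected a'));
```

Each labelled step corresponds to one pass of the loop described above, with the result of one pass becoming the parent for the next.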

Obviously, this method poses some constraints on Sylectra. First, we can’t chain multiple selectors together with commas. Second, we can’t use the direct child selector (>). Third, we can’t use pseudo-classes, attribute selectors, or any of that jazz. But that’s okay . . . for now.

Ready to start coding? Good! We’re going to break this into easily-swallowable chunks, so you shouldn’t have a problem. Let’s go!

Step 1: Setting Up the Script

We’ll start by setting up our engine, which will be encapsulated in a single global function.

var SYLECTRA = function (selector) {
    // element is declared here too, so it doesn't leak into the global scope later
    var i, len, element, curr_col, par, ret_arr = [], fns;

};

My one global variable is in all caps, just the way I prefer it. Then, right at the top of the function, I’m initializing all the variables I’ll need, so I don’t have to worry about that later.

Step 2: Splitting the Selector

The next step is to split the selector into the array of elements.

if (selector.indexOf("#") > 0) {
    selector = selector.split("#");
    selector = '#' + selector[selector.length - 1];
}
selector = selector.split(' ');

The first thing we do here is check for the presence of a hash (#) other than at the very beginning of the string (where the index is 0). If there is one, we know somewhere in the selector we’re looking for an id. Well, we can only have one element with a given id, so anything before that hash is irrelevant. To get rid of these extra characters, we can split the string at every hash and only keep the last piece. Don’t forget to prepend that hash again, because we need to know it’s an id. Then, we split the string at the spaces, so we’ve got each element in its own array item.
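
To see what this step produces, here is the same logic wrapped in a standalone function so it can run outside a browser. The name splitSelector is my own, purely for illustration; inside Sylectra this code simply lives at the top of the main function.

```javascript
// Hypothetical wrapper around the Step 2 logic (splitSelector is an assumed name).
function splitSelector(selector) {
    if (selector.indexOf('#') > 0) {
        // Keep only the text after the last hash, then put the hash back.
        selector = selector.split('#');
        selector = '#' + selector[selector.length - 1];
    }
    return selector.split(' ');
}

console.log(splitSelector('#main li.selected a')); // [ '#main', 'li.selected', 'a' ]
console.log(splitSelector('body div#main'));       // [ '#main' ]
console.log(splitSelector('ul li#last a'));        // [ '#last', 'a' ]
```

Note one side effect of this shortcut: the tag name in front of the hash is thrown away along with everything else, so div#main and span#main are treated the same.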

Step 3: Coding the Core

The next step is to write the functions that actually find the elements. This is much simpler than it sounds, because we’ll tell these functions what they are looking for and what the context (parent) is.

fns = {
    id : function (sel) {
        return document.getElementById(sel);
    },
    get : function (c_or_e, sel, par) {
        var i = 0, arr = [], get_what = (c_or_e === 'class') ? 'getElementsByClassName' : 'getElementsByTagName';
        if (!par) { // nothing matched in the previous step (e.g. an unknown id)
            return arr;
        }
        if (typeof par.length === 'number') { // a collection, possibly empty
            while (par[i]) {
                Array.prototype.push.apply(arr, Array.prototype.slice.call(par[i][get_what](sel)));
                i++;
            }
        } else { // a single node
            arr = par[get_what](sel);
        }
        return (arr.length === 1) ? arr[0] : arr;
    }
};

You’re probably down with fns.id; couldn’t be easier. So let’s go through get. First, let me explain that when I first wrote this part, fns.get was actually two functions: fns.klass and fns.elements. However, once I had finished them, I realized that they were identical, other than the fact that one called getElementsByTagName and the other called getElementsByClassName. So, I rolled them into one function, and use the first parameter to decide which to use. Now, let’s walk through it.

First, we initialize the variables. Notice that I’m using a ternary expression to decide the value of get_what: if c_or_e is “class”, we’ll use “getElementsByClassName”; otherwise, it’s “getElementsByTagName”. Of course, this means that Sylectra won’t work on any browser that doesn’t support getElementsByClassName, but all the modern browsers do. Remember, our goal isn’t so much to create a foolproof selector engine as it is to learn how the “real” ones work. Of course, there are several implementations of getElementsByClassName you could find online to make this compatible with less-equipped browsers.
This function takes the parent of the elements we’re looking for as the last parameter. That par parameter could hold two things: a node, or a nodelist (or an array, as we’ll see). A nodelist is just an array-like object that the browser returns from methods like getElementsByTagName. While it’s not a real array, it has zero-indexed elements, a length property, and an item method; in practice, you usually reach the items via square bracket notation, just like on a regular array. So, if par has a length property, we’ll loop over each item in par; I’m using a while-loop in this case because it’s simpler than a for-loop.
Inside this while-loop, we need to do two things: get the appropriate child elements for the current parent, and store them in an array (we’ll be returning this array later). If we do this right we can roll this into one line, as you can see. We’ll get the elements with the line par[i][get_what](sel); remember, get_what is getElementsBy-something. Normally, you’d call those methods via the dot notation (par[i].getElementsByTagName), but since we have the value as a string, we can use the square bracket notation. This returns a nodelist, and we want to take all the items from it and move them to arr. We could loop over it and push each item in, but that seems unnecessary, because the array push method can take as many arguments as we want to hand it, and push them all in. What we need to do here is use the apply method on the push method. We can pass apply our array of items and it will convert them to a raw list of parameters for push. However, in older (pre-ES5) JavaScript engines, apply only takes a true array, not a nodelist, so we call array’s slice method on the nodelist to convert it to an array first. That’s the explanation for our single line of code! Oh, and don’t forget to increment i.

If you're not familiar with the apply method on functions, check out my recent quick tip on that topic.
If par doesn’t have a length property, it’s a single element. Therefore, we can get our child elements right off that and assign them to arr. When we return from the function, if there’s only one element in the collection, we’ll just return that element; otherwise, we’ll return the whole collection (an array or a nodelist).
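
The slice-then-apply trick is easier to see in isolation. In this small sketch, fakeNodeList is a plain object standing in for a real nodelist (an assumption made so the snippet runs without a browser); the two Array.prototype calls are the same ones used in fns.get.

```javascript
// A stand-in for a nodelist: indexed items plus a length, but not a real array.
var fakeNodeList = { 0: 'li-1', 1: 'li-2', 2: 'li-3', length: 3 };
var arr = ['li-0'];

// slice.call copies the array-like object into a true array...
var asArray = Array.prototype.slice.call(fakeNodeList);

// ...and push.apply hands each item of that array to push as its own argument.
Array.prototype.push.apply(arr, asArray);

console.log(Array.isArray(fakeNodeList)); // false
console.log(Array.isArray(asArray));      // true
console.log(arr);                         // [ 'li-0', 'li-1', 'li-2', 'li-3' ]
```

The result is the same as calling arr.push('li-1', 'li-2', 'li-3'), just without having to know how many items there are ahead of time.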

Believe it or not, that’s probably the most difficult part. Now we just have to use it!

Step 4: Putting it into Action

So we’ve set up our selector array and prepared the functions for getting the elements; now we just need to pair them up. Let’s begin:

len = selector.length; 
curr_col = document; 
 
for (i = 0; i < len; i++) {
    element = selector[i];
    par = curr_col; 
 
    if ( /* id */ ) { 
        // get element with id 
    } else if ( /* class */ ) { 
        //get elements with class 
    } else { 
        // get elements with tag name 
    } 
}

Here’s the skeleton of this part; we’ll fill it in soon. But make sure you understand what we’re going to do. After we set a few variables, we loop over the selector array. Inside our loop, we’ll set the element and par variables; as you can see, par is set to whatever our current collection of elements is; curr_col is set to document by default, and is adjusted to hold the returned elements for every step in the selector.

Let’s get into the first if statement now:

if (element.indexOf('#') === 0) {
    curr_col = fns.id(element.split('#')[1]);
}

If there’s a hash at the beginning of our current element string, we need to find the id. To do this, set curr_col to the result of our fns.id function, and pass in the element. Don’t forget to split the hash off the front!

else if (element.indexOf('.') > -1) {
    element = element.split('.');
    if (element[0]) { // if there's an element prefixed on the class name
        par = fns.get('elements', element[0], par);
        ret_arr = []; // start fresh so matches from an earlier step don't leak in
        if (typeof par.length === 'number') { // a collection, possibly empty
            // use j here: reusing i would clobber the outer loop's counter
            for (var j = 0; par[j]; j++) {
                if (par[j].className.indexOf(element[1]) > -1) {
                    ret_arr.push(par[j]);
                }
            }
            curr_col = ret_arr;
        } else {
            curr_col = (par.className.indexOf(element[1]) > -1) ? par : [];
        }
    } else {
        curr_col = fns.get('class', element[1], par);
    }
}

If there’s a dot in the element string, then we’re looking for a class. However, the class may have an element on the front, like this: div.tweet_count. If that’s the case, we need to get all the elements, and then return only the ones that have the right class. First, we split it on the period. Then, if the first element in that array isn’t an empty string, we know there’s an element to get. In that case, we get the elements using our fns.get function. If we’re returned more than one element, then we loop over each one, testing each one for the class we’re looking for. If it has it, we push it into ret_arr. Remember, an element’s className property will have all the classes that element has in a single string, so we have to search the string for it.

If our call to fns.get for the elements returns only one item, we check that item for the class. If it’s there, we return it. Otherwise, we return an empty array.

If there isn’t anything in the first element of our elements array (meaning the selector was something like .selected), we only have to find the elements with that class. So, we’ll call fns.get for the class, passing the class name and the parent.
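
One caveat about searching className with indexOf: it matches substrings, so looking for selected would also match an element whose class is unselected. A possible refinement, sketched here with a hypothetical hasClass helper (not part of the code above), is to pad both strings with spaces so only whole class names match. It assumes class names are separated by single spaces.

```javascript
// Hypothetical helper: true only if className contains name as a whole class.
// Assumes classes are separated by plain spaces (not tabs or newlines).
function hasClass(className, name) {
    return (' ' + className + ' ').indexOf(' ' + name + ' ') > -1;
}

console.log('unselected item'.indexOf('selected') > -1); // true (a false positive)
console.log(hasClass('unselected item', 'selected'));    // false
console.log(hasClass('item selected', 'selected'));      // true
```

Swapping this check in for the raw indexOf test would be a small change, at the cost of one more helper to carry around.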

else {
    curr_col = fns.get('elements', element, par);
}

Finally, if we’re dealing with a raw element name, we’ll set curr_col to those elements.

And that’s it. All that’s left now is to …

return curr_col;

Step 5: Testing our Engine

Now that our engine is done, we’re ready to test it. Plug this into an HTML file:

<!DOCTYPE HTML> 
<html lang="en"> 
<head> 
    <meta charset="UTF-8"> 
    <title> Sylectra | An Intensely Basic CSS DOM Selector </title> 
</head> 
<body> 
<div id="main"> 
    <header>#main header</header> 
    <section> 
        <h1>section h1</h1> 
        <ul> 
            <li class="button"><a href="#">ul li.button a 1</a></li> 
            <li><a href="#" class="selected">ul li a.selected 2</a></li> 
            <li><a href="#">ul li a 3</a></li> 
            <li><a href="#" class="selected">ul li a.selected 4</a></li> 
            <li id="last"><a href="#">ul li#last a 5</a></li> 
        </ul> 
        <p> section p</p> 
        <p class="other"> section p.other <strong> strong </strong> </p> 
    </section> 
    <footer> 
        <p> footer p </p> 
    </footer> 
</div> 
    <script src="sylectra.js"></script> 
    <script> 
        var selectors = ['div#main', 'body', 'section', 'body div#main', 'p', 'ul li', 'ul li a', '.selected', 'strong', '#last p'], i = 0;
        while (selectors[i]) {
            var el = SYLECTRA(selectors[i]);
            // log before incrementing, so the selector shown matches its result
            console.log("selector: ", selectors[i], " | result: ", el);
            i++;
        }
    </script> 
</body> 
</html>

Notice the JavaScript at the bottom; I’ve created an array of selectors to test, including one that matches nothing (#last p). These cover pretty much all the use cases that we planned for, and Sylectra handles them too!

And that’s really all there is to building a simple CSS selector engine!

Step 6: Making it Better

There are a few things we could do to make Sylectra better. See if you can implement the following:

support for multiple selectors at once: “#main, #sidebar h3, a.selected”.
support for direct child selectors: “li > a”.
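
If you want a nudge on the first challenge, here is one possible starting point, not a finished solution. The selectGroup function and its engine parameter are names I’ve made up for this sketch: it splits the selector on commas, runs each piece through whatever single-selector function you pass in, and merges the results without duplicates. A fake engine stands in for SYLECTRA so the snippet runs without a page.

```javascript
// Hypothetical wrapper: `engine` is any function that takes one selector and
// returns a single item, a collection with a numeric length, or nothing.
function selectGroup(engine, selector) {
    var groups = selector.split(','), results = [], i, j, found;
    for (i = 0; i < groups.length; i++) {
        found = engine(groups[i].replace(/^\s+|\s+$/g, '')); // trim whitespace
        if (!found) { continue; }                 // e.g. null for an unknown id
        if (typeof found.length !== 'number') {   // a single item: wrap it
            found = [found];
        }
        for (j = 0; j < found.length; j++) {
            if (results.indexOf(found[j]) === -1) { // skip duplicates
                results.push(found[j]);
            }
        }
    }
    return results;
}

// Stand-in engine for demonstration only; it looks selectors up in a table.
var main = { name: 'main' }, h3a = { name: 'h3a' }, h3b = { name: 'h3b' }, a1 = { name: 'a1' };
var fakeResults = { '#main': main, '#sidebar h3': [h3a, h3b], 'a.selected': [a1, h3a] };
function fakeEngine(sel) { return fakeResults[sel]; }

var merged = selectGroup(fakeEngine, '#main, #sidebar h3, a.selected');
console.log(merged.map(function (el) { return el.name; })); // [ 'main', 'h3a', 'h3b', 'a1' ]
```

In a page, you would pass SYLECTRA in place of fakeEngine. The direct child selector is the harder of the two, since it changes how each step searches its parent, so that one is still yours to solve.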

I’ve put this project up on GitHub; if you feel like a challenge, fork the repository, make your changes, and share the link to your repo in the comments. Have fun!
