Recently, I thought I’d try my hand at creating a simple CSS selector engine. You know what I’m talking about: it’s the kind of thing you use in most JavaScript libraries, where you pass in a CSS selector to find the elements you want to work with. Today, I’ll show you how to create a really simple one!
Disclaimer
What we are going to be building today is in no way something you’d use on a live site. It will not have all the great features you’re used to using in the “real” selector engines, but it will give you a small taste of how complicated something like Sizzle or NWMatcher can be. And who knows? Maybe you’ll pick up a few JavaScript tips and tricks on the way.
Step 0: Starting our Engine
If we just dove into the code right away, you would probably have drowned before we finished half a dozen lines—you wouldn’t know where anything was going. So let’s start by mapping out the details of our selector engine.
- Like most selector engines, it will need a cool name. I’ve chosen the name Sylectra, but feel free to flex your own creativity.
- The user will (obviously) pass a CSS selector into our function. The first step will be to split that string into an array, where each item in the array is an element in the selector hierarchy. So, if the selector is #main li.selected a, we should end up with [ "#main", "li.selected", "a" ].
- Then, we’ll loop over that array, finding the correct element(s) for each item in the array. Once we find them, we’ll use the result as the parent element(s) for the next item in the array. Continuing with our previous example, we’d start by finding #main in the document; then, we’d find all the li.selected elements in #main, and finally all the a elements inside those. This way, we continually refine our results until we have the elements we’re looking for.
- The functionality that finds the elements will use the selector of the element(s) we’re looking for, and the parent element(s). It will figure out whether we’re looking for an element by id, class, or element name, and proceed accordingly.
- When we reach the end of our array, we’ll return the results to the user.
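Before touching the DOM, the plan above can be tried on a toy tree. The sketch below is a hypothetical, browser-free model: nodes are plain objects built with a helper n, and descendants, matches, and select are names I made up, not part of Sylectra. It differs from the real engine in two small ways (it looks ids up inside the current parents rather than document-wide, and it skips duplicate results), but the refine-one-piece-at-a-time loop is the same idea.

```javascript
// Hypothetical toy "DOM": plain objects, not real nodes.
function n(tag, id, classes, children) {
  return { tag: tag, id: id || '', classes: classes || [], children: children || [] };
}

// Every descendant of a node, depth-first (roughly what getElementsByTagName('*') gives).
function descendants(node) {
  var out = [];
  node.children.forEach(function (child) {
    out.push(child);
    out = out.concat(descendants(child));
  });
  return out;
}

// Does a node match one piece such as "#main", "li.selected", ".selected" or "a"?
function matches(node, piece) {
  if (piece.charAt(0) === '#') {
    return node.id === piece.slice(1);
  }
  var parts = piece.split('.');
  var tagOk = !parts[0] || node.tag === parts[0];
  var classOk = parts.length < 2 || node.classes.indexOf(parts[1]) > -1;
  return tagOk && classOk;
}

// The plan: split on spaces, then narrow the current collection one piece at a time.
function select(root, selector) {
  var current = [root];
  selector.split(' ').forEach(function (piece) {
    var next = [];
    current.forEach(function (parent) {
      descendants(parent).forEach(function (node) {
        if (matches(node, piece) && next.indexOf(node) === -1) {
          next.push(node);
        }
      });
    });
    current = next;
  });
  return current;
}

var doc = n('body', '', [], [
  n('div', 'main', [], [
    n('ul', '', [], [
      n('li', '', ['selected'], [n('a', 'first')]),
      n('li', '', [], [n('a', 'second')])
    ])
  ])
]);

console.log(select(doc, '#main li.selected a').map(function (a) { return a.id; })); // [ 'first' ]
```

Each pass replaces current with the matches found under the previous pass’s nodes, which is the narrowing described in the list above.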
Obviously, this method places some constraints on Sylectra. First, we can’t combine multiple selectors with commas. Second, we can’t use the direct child selector (>). Third, we can’t use pseudo-classes, attribute selectors, or any of that jazz. But that’s okay . . . for now.
Ready to start coding? Good! We’re going to break this into easily swallowable chunks, so you shouldn’t have a problem. Let’s go!
Step 1: Setting Up the Script
We’ll start by setting up our engine, which will be encapsulated in a single global function.
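var SYLECTRA = function (selector) {
    // declare everything up front; element holds the current piece of the selector
    var i, len, curr_col, par, element, ret_arr = [], fns;

    // the rest of the engine goes here
};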
My one global variable is in all caps, just the way I prefer it. Then, right at the top of the function, I’m declaring all the variables I’ll need, so I don’t have to worry about that later.
Step 2: Splitting the Selector
The next step is to split the selector into the array of elements.
if (selector.indexOf("#") > 0) {
    selector = selector.split("#");
    selector = '#' + selector[selector.length - 1];
}
selector = selector.split(' ');
The first thing we do here is check for a hash (#) anywhere other than at the very beginning of the string (index 0). If there is one, we know that somewhere in the selector we’re looking for an id. Well, we can only have one element with a given id, so anything before that hash is irrelevant. To get rid of these extra characters, we can split the string at every hash and keep only the last piece. Don’t forget to prepend the hash again, because we need to know it’s an id. (This also throws away any tag name attached to the id, so div#main is treated exactly like #main.) Then, we split the string at the spaces, so we’ve got each element in its own array item.
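Because this step is pure string handling, it can be lifted out and checked without a browser. The sketch below wraps the same logic in a hypothetical helper, splitSelector (a name I made up; Sylectra does this inline), with one assumption of my own: it trims the string and splits on runs of whitespace, so stray double spaces don’t produce empty array items.

```javascript
// Hypothetical helper (not part of Sylectra): the Step 2 logic as a pure function.
function splitSelector(selector) {
  // A '#' past index 0 means an id appears later; ids are unique,
  // so everything before the last '#' is dropped.
  if (selector.indexOf('#') > 0) {
    var pieces = selector.split('#');
    selector = '#' + pieces[pieces.length - 1];
  }
  // Assumption: trim and split on any run of whitespace, not a single space.
  return selector.trim().split(/\s+/);
}

console.log(splitSelector('#main li.selected a')); // [ '#main', 'li.selected', 'a' ]
console.log(splitSelector('body div#main'));       // [ '#main' ]
```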
Step 3: Coding the Core
The next step is to write the functions that actually find the elements. This is much simpler than it sounds, because we’ll tell these functions what they are looking for and what the context (parent) is.
fns = {
    id : function (sel) {
        return document.getElementById(sel);
    },
    get : function (c_or_e, sel, par) {
        var i = 0,
            arr = [],
            get_what = (c_or_e === 'class') ? 'getElementsByClassName' : 'getElementsByTagName';
        if (!par) { return arr; } // e.g. fns.id found nothing
        if (typeof par.length === 'number') { // a nodelist or array, even an empty one
            while (par[i]) {
                Array.prototype.push.apply(arr, Array.prototype.slice.call(par[i][get_what](sel)));
                i++;
            }
        } else { // a single node
            arr = par[get_what](sel);
        }
        return (arr.length === 1) ? arr[0] : arr;
    }
};
You’re probably down with fns.id; couldn’t be easier. So let’s go through get. First, let me explain that when I first wrote this part, fns.get was actually two functions: fns.klass and fns.elements. However, once I had finished them, I realized that they were identical, other than the fact that one called getElementsByClassName and the other called getElementsByTagName. So, I rolled them into one function, and use the first parameter to decide which to use. Now, let’s walk through it.
- First, we initialize the variables. Notice that I’m using a ternary expression to decide the value of get_what: if c_or_e is “class”, we’ll use “getElementsByClassName”; otherwise, it’s “getElementsByTagName”. Of course, this means that Sylectra won’t work in any browser that doesn’t support getElementsByClassName, but all the modern browsers do. Remember, our goal isn’t so much to create a foolproof selector engine as it is to learn how the “real” ones work. There are also several implementations of getElementsByClassName you could find online to make this compatible with less-equipped browsers.
- This function takes the parent of the elements we’re looking for as the last parameter. That par parameter could hold two things: a node, or a nodelist (or an array, as we’ll see). A nodelist is just an array-like object that the browser returns from methods like getElementsByTagName. While it’s not a real array, it has zero-indexed elements, a length property, and an item method, although you’ll usually just use square bracket notation instead, as on a regular array. So, if par has a length property, we’ll loop over each item in par; I’m using a while-loop in this case because it’s simpler than a for-loop.
- Inside this while-loop, we need to do two things: get the matching descendant elements of the current parent, and store them in an array (we’ll be returning this array later). If we do this right, we can roll it into one line, as you can see. We get the elements with the expression par[i][get_what](sel); remember, get_what is getElementsBy-something. Normally, you’d call those methods via dot notation (par[i].getElementsByTagName), but since we have the method name as a string, we can use square bracket notation. This returns a nodelist, and we want to take all the items from it and move them to arr. We could loop over it and push each item in, but that seems unnecessary, because the array push method can take as many arguments as we want to hand it, and push them all in. What we need to do here is call the apply method on Array.prototype.push, passing arr as the value of this. We can pass apply our array of items, and it will spread them into a raw list of arguments for push. However, older (ES3) JavaScript engines only let apply take a true array or an arguments object, not a nodelist, so we call the array slice method on the nodelist to convert it to an array first. (ES5 relaxed this rule, and apply now accepts any array-like object, so in modern browsers the slice call is no longer strictly necessary.) That’s the explanation for our single line of code! Oh, and don’t forget to increment i. If you’re not familiar with the apply method on functions, check out my recent quick tip on that topic.
- If par doesn’t have a length property, it’s a single node (the document or an element). Therefore, we can get our descendant elements right off it and assign them to arr. When the function returns, if there’s only one element in the collection, we just return that element; otherwise, we return the whole collection.
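The two tricks packed into that one line (calling a method whose name is stored in a string, and appending an array-like object with push.apply) can be tried without a browser. In the sketch below, fakeNodeList is a hypothetical plain object standing in for a real nodelist.

```javascript
// fakeNodeList is a hypothetical stand-in for a nodelist:
// indexed items and a length, but none of the array methods.
var fakeNodeList = { 0: 'li 1', 1: 'li 2', 2: 'li 3', length: 3 };
var arr = ['earlier match'];

// slice.call copies the array-like object into a real array...
var asArray = Array.prototype.slice.call(fakeNodeList);

// ...and push.apply spreads that array into push's argument list,
// so every item is appended to arr in a single call.
Array.prototype.push.apply(arr, asArray);

// Square bracket notation calls a method whose name is stored in a string,
// just as par[i][get_what](sel) does.
var methodName = 'join';
console.log(arr[methodName](', ')); // earlier match, li 1, li 2, li 3
```

In ES2015 and later, Array.from converts an array-like object in one call, so arr.push(...Array.from(nodeList)) does the same job as the apply trick.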
Believe it or not, that’s probably the most difficult part. Now we just have to use it!
Step 4: Putting it into Action
So we’ve set up our selector array and prepared the functions for getting the elements; now we just need to pair them up. Let’s begin:
len = selector.length;
curr_col = document;
for (i = 0; i < len; i++) {
    element = selector[i];
    par = curr_col;
    if ( /* id */ ) {
        // get element with id
    } else if ( /* class */ ) {
        // get elements with class
    } else {
        // get elements with tag name
    }
}
Here’s the skeleton of this part; we’ll fill it in soon. But make sure you understand what we’re going to do. After we set a few variables, we loop over the selector array. Inside our loop, we set the element and par variables; as you can see, par is set to whatever our current collection of elements is. curr_col is set to document by default, and is adjusted to hold the returned elements for every step in the selector.
Let’s get into the first if statement now:
if (element.indexOf('#') === 0) {
    curr_col = fns.id(element.split('#')[1]);
}
If there’s a hash at the beginning of our current element string, we need to find the element by id. To do this, set curr_col to the result of our fns.id function, and pass in the element. Don’t forget to split the hash off the front!
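else if (element.indexOf('.') > -1) {
    element = element.split('.');
    if (element[0]) { // if there's an element prefixed on the class name
        par = fns.get('elements', element[0], par);
        if (!par.nodeType) {
            // a collection came back (possibly empty): keep the elements with the class.
            // Start a fresh ret_arr and use j, so the outer loop's i isn't clobbered.
            ret_arr = [];
            for (var j = 0; par[j]; j++) {
                // pad with spaces so "selected" doesn't match "unselected"
                if ((' ' + par[j].className + ' ').indexOf(' ' + element[1] + ' ') > -1) {
                    ret_arr.push(par[j]);
                }
            }
            curr_col = ret_arr;
        } else {
            // a single element came back: keep it only if it has the class
            curr_col = ((' ' + par.className + ' ').indexOf(' ' + element[1] + ' ') > -1) ? par : [];
        }
    } else {
        curr_col = fns.get('class', element[1], par);
    }
}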
If there’s a dot in the element string, then we’re looking for a class. However, the class may have a tag name on the front, like this: div.tweet_count. If that’s the case, we need to get all the elements with that tag name, and then keep only the ones that have the right class. First, we split the string on the period. Then, if the first item in the resulting array isn’t an empty string, we know there’s an element to get. In that case, we get the elements using our fns.get function. If we get back more than one element, we loop over each one, testing it for the class we’re looking for. If it has the class, we push it into ret_arr. Remember, an element’s className property holds all of that element’s classes in a single space-separated string, so we have to search that string for the class.
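A plain indexOf search has a catch: it matches substrings, so looking for selected also matches an element whose class is unselected. A common fix is to pad both the className string and the class name with spaces before searching. hasClass below is a hypothetical helper illustrating that idea on plain strings; in browsers that support it, element.classList.contains(name) does the same check for you.

```javascript
// Hypothetical helper: whole-word class test on a className string.
function hasClass(className, cls) {
  // Collapse any whitespace to single spaces, then pad both ends so that
  // " selected " can only match a complete class name.
  var padded = ' ' + className.replace(/\s+/g, ' ') + ' ';
  return padded.indexOf(' ' + cls + ' ') > -1;
}

var classes = 'button unselected';
console.log(classes.indexOf('selected') > -1); // true, a false positive
console.log(hasClass(classes, 'selected'));    // false
console.log(hasClass(classes, 'button'));      // true
```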
If our call to fns.get for the elements returns only one item, we check that item for the class. If it’s there, we make it the current collection; otherwise, we use an empty array.
If there isn’t anything in the first item of our array (meaning the selector was something like .selected), we only have to find the elements with that class. So, we call fns.get for the class, passing the class name and the parent.
else {
    curr_col = fns.get('elements', element, par);
}
Finally, if we’re dealing with a raw element name, we’ll set curr_col to those elements.
And that’s it. All that’s left now, after the loop, is to …
return curr_col;
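To see which branch of the loop each piece of a selector ends up in, the three tests can be pulled into a hypothetical helper, branchFor, and run on plain strings. It uses the same checks as the engine: a hash at index 0 means an id, any dot means a class (with an optional tag name in front), and everything else is a tag name.

```javascript
// Hypothetical helper (not part of Sylectra): classify one selector piece
// using the same tests as the main loop.
function branchFor(piece) {
  if (piece.indexOf('#') === 0) {
    return { type: 'id', name: piece.split('#')[1] };
  }
  if (piece.indexOf('.') > -1) {
    var parts = piece.split('.');
    // parts[0] is the optional tag name; like the engine, only parts[1] is used as the class
    return { type: 'class', tag: parts[0] || null, name: parts[1] };
  }
  return { type: 'tag', name: piece };
}

['#main', 'li.selected', '.selected', 'a'].forEach(function (piece) {
  console.log(piece, '->', JSON.stringify(branchFor(piece)));
});
```

Like the engine, it only reads the first class after a dot, so li.a.b is treated as li.a; supporting compound classes would mean checking every item after the first.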
Step 5: Testing our Engine
Now that our engine is done, we’re ready to test it. Plug this into an HTML file:
<!DOCTYPE HTML>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title> Sylectra | An Intensely Basic CSS DOM Selector </title>
</head>
<body>
    <div id="main">
        <header>#main header</header>
        <section>
            <h1>section h1</h1>
            <ul>
                <li class="button"><a href="#">ul li.button a 1</a></li>
                <li><a href="#" class="selected">ul li a.selected 2</a></li>
                <li><a href="#">ul li a 3</a></li>
                <li><a href="#" class="selected">ul li a.selected 4</a></li>
                <li id="last"><a href="#">ul li#last a 5</a></li>
            </ul>
            <p> section p</p>
            <p class="other"> section p.other <strong> strong </strong> </p>
        </section>
        <footer>
            <p> footer p </p>
        </footer>
    </div>
    <script src="sylectra.js"></script>
    <script>
        var selectors = ['div#main', 'body', 'section', 'body div#main', 'p', 'ul li', 'ul li a', '.selected', 'strong', '#last p'],
            i = 0,
            el;
        while (selectors[i]) {
            el = SYLECTRA(selectors[i]);
            // log before incrementing i, so the label matches the result
            console.log("selector: ", selectors[i], " | result: ", el);
            i++;
        }
    </script>
</body>
</html>
Notice the JavaScript at the bottom; I’ve created an array of selectors to test, including one that matches nothing (#last p). These cover pretty much all the use cases we planned for, and Sylectra passes them all!
And that’s really all there is to building a simple CSS selector engine!
Step 6: Making it Better
There are a few things we could do to make Sylectra better. See if you can implement the following:
- support for multiple selectors at once: “#main, #sidebar h3, a.selected”.
- support for direct child selectors: “li > a”.
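For the first challenge, one possible starting point (a sketch, not the only way) is to split on commas, run the existing engine on each group, and merge the results without duplicates. selectAll and fakeEngine are hypothetical names; the engine is passed in as a parameter so the sketch runs without a browser, and in a page you would pass SYLECTRA instead.

```javascript
// Hypothetical sketch for the comma challenge: run a single-selector engine
// (passed in as selectOne) once per comma-separated group and merge the results.
function selectAll(selector, selectOne) {
  var results = [];
  selector.split(',').forEach(function (group) {
    var found = selectOne(group.trim());
    var list;
    // The engine may return null, one element, or a collection; normalize to an array.
    if (!found) {
      list = [];
    } else if (found.nodeType) {
      list = [found];
    } else {
      list = Array.prototype.slice.call(found);
    }
    list.forEach(function (el) {
      if (results.indexOf(el) === -1) { // skip elements an earlier group already matched
        results.push(el);
      }
    });
  });
  return results;
}

// Stand-in engine and "elements" so the sketch runs without a browser (all hypothetical).
var main = { nodeType: 1, name: 'div#main' };
var h3 = { nodeType: 1, name: 'h3' };
var link = { nodeType: 1, name: 'a.selected' };
function fakeEngine(sel) {
  var table = { '#main': main, '#sidebar h3': h3, 'a.selected': [link], 'h3': [h3] };
  return table[sel] || [];
}

console.log(selectAll('#main, #sidebar h3, a.selected, h3', fakeEngine).map(function (el) { return el.name; }));
```

One difference from the native querySelectorAll: these results come back grouped by selector rather than in document order.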
I’ve put this project up on GitHub; if you feel like a challenge, fork the repository, make your changes, and share the link to your repo in the comments. Have fun!