The Binary Search Tree

So far we’ve looked at data structures that organize data in a linear fashion. Linked lists contain data from a single starting node to a single terminating node. Arrays hold data in contiguous, one-dimensional blocks.

Overview

In this section, we will see how adding one more dimension will allow us to introduce a new data structure: the tree. Specifically, we will be looking at a type of tree known as a binary search tree. Binary search trees take the general tree structure and apply a set of simple rules that define the tree's structure.

Before we learn about those rules, let’s learn what a tree is.

Tree Overview

A tree is a data structure where each node has zero or more children. For example, we might have a tree like this:

A organizational tree structure
A organizational tree structure

In this tree, we can see the organizational structure of a business. The blocks represent people or divisions within the company, and the lines represent reporting relationships. A tree is a very efficient, logical way to present and store this information.

The tree shown in the previous figure is a general tree. It represents parent/child relationships, but there are no rules for the structure. The CEO has one direct report but could just as easily have none or twenty. In the figure, Sales is shown to the left of Marketing, but that ordering has no meaning. In fact, the only observable constraint is that each node has at most one parent (and the top-most node, the Board of Directors, has no parent).

Binary Search Tree Overview

A binary search tree uses the same basic structure as the general tree shown in the last figure but with the addition of a few rules. These rules are:

  1. Each node can have zero, one, or two children.
  2. Any value less than the node’s value goes to the left child (or a child of the left child).
  3. Any value greater than, or equal to, the node’s value goes to the right child (or a child thereof).

Let’s look at a tree that is built using these rules:

Binary Search Tree
Binary Search Tree

Notice how the constraints we specified are enforced in the diagram. Every value to the left of the root node (eight) has a value less than eight, and every value to the right is greater than or equal to the root node. This rule applies recursively at every node along the way.

With this tree in mind, let’s think about the steps that went into building it. When the process started, the tree was empty and then a value, eight, was added. Because it was the first value added, it was put into the root (ultimate parent) position.

We don’t know the exact order that the rest of the nodes were added, but I’ll present one possible path. Values will be added using a method named Add that accepts the value.

Let’s walk through the first few items.

Eight was added first and became the root. Next, four was added. Since four is less than eight, it needs to go to the left of eight as per rule number two. Since eight has no child on its left, four becomes the immediate left child of eight.

Two is added next. two is less than eight, so it goes to the left. There is already a node to the left of eight, so the comparison logic is performed again. two is less than four, and four has no left child, so two becomes the left child of four.

Three is added next and goes to the left of eight and four. When compared to the two node, it is larger, so three is added to the right of two as per rule number three.

This cycle of comparing values at each node and then checking each child over and over until the proper slot is found is repeated for each value until the final tree structure is created.

The Node Class

The BinaryTreeNode represents a single node in the tree. It contains references to the left and right children (null if there are none), the node’s value, and the IComparable.CompareTo method which allows comparing the node values to determine if the value should go to the left or right of the current node. This is the entire BinaryTreeNode class—as you can see, it is very simple.

The Binary Search Tree Class

The BinaryTree class provides the basic methods you need to manipulate the tree: Add, Remove, a Contains method to determine if an item exists in the tree, several traversal and enumeration methods (these are methods that allow us to enumerate the nodes in the tree in various well-defined orders), and the normal Count and Clear methods.

To initialize the tree, there is a BinaryTreeNode reference that represents the head (root) node of the tree, and there is an integer that keeps track of how many items are in the tree.

Add

Behavior Adds the provided value to the correct location within the tree.
Performance O(log n) on average; O(n) in the worst case.

Adding a node to the tree isn’t terribly complex and is made even easier when the problem is simplified into a recursive algorithm. There are two cases that need to be considered:

  • The tree is empty.
  • The tree is not empty.

In the first case, we simply allocate the new node and add it to the tree. In the second case, we compare the value to the node’s value. If the value we are trying to add is less than the node’s value, the algorithm is repeated for the node’s left child. Otherwise, it is repeated for the node’s right child.

Remove

Behavior Removes the first nore found with the indicated value.
Performance O(log n) on average; O(n) in the worst case.

Removing a value from the tree is a conceptually simple operation that becomes surprisingly complex in practice.

At a high level, the operation is simple:

  1. Find the node to remove
  2. Remove it.

The first step is simple, and as we’ll see, is accomplished using the same mechanism that the Contains method uses. Once the node to be removed is identified, however, the operation can take one of three paths dictated by the state of the tree around the node to be removed. The three states are described in the following three cases.

Case One: The node to be removed has no right child.

Case 1 The node to be removed has no right child
Case 1 The node to be removed has no right child

In this case, the removal operation can simply move the left child, if there is one, into the place of the removed node. The resulting tree would look like this:

Case 1 - Tree state after removal
Case 1 - Tree state after removal

Case Two: The node to be removed has a right child which, in turn, has no left child.

Case two The node to be removed has a right child which has no left child
Case two The node to be removed has a right child which has no left child

In this case, we want to move the removed node’s right child (six) into the place of the removed node. The resulting tree will look like this:

Tree state after removal
Tree state after removal

Case Three: The node to be removed has a right child which, in turn, has a left child.

Case 3 - The node to be removed has a right child which has a left child
Case 3 - The node to be removed has a right child which has a left child

In this case, the left-most child of the removed node’s right child must be placed into the removed node’s slot.

Let’s take a minute to think about why this is true. There are two facts that we know about the sub-tree starting with the node being removed (for example, the sub-tree whose root is the node with the value five).

  • Every value to the right of the node is greater than or equal to five.
  • The smallest value in the right sub-tree is the left-most node.

We need to place a value into the removed node’s slot which is smaller than, or equal to, every node to its right. To do that, we need to get the smallest value on the right side. Therefore we need the right child’s left-most node.

After the node removal, the tree will look like this:

Case 3 - Tree after node removal
Case 3 - Tree after node removal

Now that we understand the three remove scenarios, let’s look at the code to make it happen.

One thing to note: The FindWithParent method (see the Contains section) returns the node to remove as well as the parent of the node being removed. This is done because when the node is removed, we need to update the parent’s Left or Right property to point to the new node.

We could avoid doing this if all nodes kept a reference to their parent, but that would introduce per-node memory overhead and bookkeeping costs that are only needed in this one case.

Contains

Behavior Returns true if the tree contains the provided value. Otherwise it returns false.
Performance O(log n) on average; O(n) in the worst case.

Contains defers to FindWithParent, which performs a simple tree-walking algorithm that performs the following steps, starting at the head node:

  1. If the current node is null, return null.
  2. If the current node value equals the sought value, return the current node.
  3. If the sought value is less than the current value, set the current node to left child and go to step number one.
  4. Set current node to right child and go to step number one.

Since Contains returns a Boolean, the returned value is determined by whether FindWithParent returns a non-null BinaryTreeNode (true) or a null one (false).

The FindWithParent method is used by the Remove method as well. The out parameter, parent, is not used by Contains.

Count

Behavior Returns the number of values in the tree (0 if empty).
Performance O(1)

The count field is incremented by the Add method and decremented by the Remove method.

Clear

Behavior Removes all the nodes from the tree.
Performance O(1)

Traversals

Tree traversals are algorithms that allow processing each value in the tree in a well-defined order. For each of the algorithms discussed, the following tree will be used as the sample input.

The examples that follow all accept an Action<T> parameter. This parameter defines the action that will be applied to each node as it is processed by the traversal.

The Order section for each traversal will indicate the order in which the following tree would traverse.

The sample tree for traversal ordering results
The sample tree for traversal ordering results

Preorder

Behavior Performs the provided action on each value in preorder (see the description that follows).
Performance O(n)
Order 4, 2, 1, 3, 5, 7, 6, 8

The preorder traversal processes the current node before moving to the left and then right children. Starting at the root node, four, the action is executed with the value four. Then the left node and all of its children are processed, followed by the right node and all of its children.

A common usage of the preorder traversal would be to create a copy of the tree that contained not just the same node values, but also the same hierarchy.

Postorder

Behavior Performs the provided action on each value in postorder (see the description that follows).
Performance O(n)
Order 1, 3, 2, 6, 8, 7, 5, 4

The postorder traversal visits the left and right child of the node recursively, and then performs the action on the current node after the children are complete.

Postorder traversals are often used to delete an entire tree, such as in programming languages where each node must be freed, or to delete subtrees. This is the case because the root node is processed (deleted) last and its children are processed in a way that will minimize the amount of work the Remove algorithm needs to perform.

Inorder

Behavior Performs the provided action on each value in inorder (see the description that follows).
Performance O(n)
Order 1, 2, 3, 4, 5, 6, 7, 8

Inorder traversal processes the nodes in the sort order—in the previous example, the nodes would be sorted in numerical order from smallest to largest. It does this by finding the smallest (left-most) node and then processing it before processing the larger (right) nodes.

Inorder traversals are used anytime the nodes must be processed in sort-order.

The example that follows shows two different methods of performing an inorder traversal. The first implements a recursive approach that performs a callback for each traversed node. The second removes the recursion through the use of the Stack data structure and returns an IEnumerator to allow direct enumeration.

GetEnumerator

Behavior Returns an enumerator that enumerates using the InOrder traversal algorithm.
Performance O(1) to return the enumerator. Enumerating all the items is O(n).

Next Up

This completes the fifth part about the Binary Search Tree. Next up, we'll learn about the Set collection.
Tags:

Comments

Related Articles