The Linked List

The first data structure we will be looking at is the linked list, and with good reason. Besides being a nearly ubiquitous structure used in everything from operating systems to video games, it is also a building block with which many other data structures can be created.

Overview

In a very general sense, the purpose of a linked list is to provide a consistent mechanism to store and access an arbitrary amount of data. As its name implies, it does this by linking the data together into a list.

Before we dive into what this means, let’s start by reviewing how data is stored in an array.

Integer data stored in an array
Integer data stored in an array

As the figure shows, array data is stored as a single contiguously allocated chunk of memory that is logically segmented. The data stored in the array is placed in one of these segments and referenced via its location, or index, in the array.

This is a good way to store data. Most programming languages make it very easy to allocate arrays and operate on their contents. Contiguous data storage provides performance benefits (namely data locality), iterating over the data is simple, and the data can be accessed directly by index (random access) in constant time.

There are times, however, when an array is not the ideal solution.

Consider a program with the following requirements:

  1. Read an unknown number of integers from an input source (NextValue method) until the number 0xFFFF is encountered.
  2. Pass all of the integers that have been read (in a single call) to the ProcessItems method.

Since the requirements indicate that multiple values need to be passed to the ProcessItems method in a single call, one obvious solution would involve using an array of integers. For example:

This solution has several problems, but the most glaring is seen when more than 20 values are read. As the program is now, the values from 21 to n are simply ignored. This could be mitigated by allocating more than 20 values—perhaps 200 or 2000. Maybe the size could be configured by the user, or perhaps if the array became full a larger array could be allocated and all of the existing data copied into it. Ultimately these solutions create complexity and waste memory.

What we need is a collection that allows us to add an arbitrary number of integer values and then enumerate over those integers in the order that they were added. The collection should not have a fixed maximum size and random access indexing is not necessary. What we need is a linked list.

Before we go on and learn how the linked list data structure is designed and implemented, let’s preview what our ultimate solution might look like.

Notice that all of the problems with the array solution no longer exist. There are no longer any issues with the array not being large enough or allocating more than is necessary.

You should also notice that this solution informs some of the design decisions we will be making later, namely that the LinkedList class accepts a generic type argument and implements the IEnumerable interface.


Implementing a LinkedList Class

The Node

At the core of the linked list data structure is the Node class. A node is a container that provides the ability to both store data and connect to other nodes.

A linked list node contains data and a property pointing to the next node
A linked list node contains data and a property pointing to the next node

In its simplest form, a Node class that contains integers could look like this:

With this we can now create a very primitive linked list. In the following example we will allocate three nodes (first, middle, and last) and then link them together into a list.

We now have a linked list that starts with the node first and ends with the node last. The Next property for the last node points to null which is the end-of-list indicator. Given this list, we can perform some basic operations. For example, the value of each node’s Data property:

The PrintList method works by iterating over each node in the list, printing the value of the current node, and then moving on to the node pointed to by the Next property.

Now that we have an understanding of what a linked list node might look like, let’s look at the actual LinkedListNode class.

The LinkedList Class

Before implementing our LinkedList class, we need to think about what we’d like to be able to do with the list.

Earlier we saw that the collection needs to support strongly typed data so we know we want to create a generic interface.

Since we’re using the .NET framework to implement the list, it makes sense that we would want this class to be able to act like the other built-in collection types. The easiest way to do this is to implement the ICollection<T> interface. Notice I choose ICollection<T> and not IList<T>. This is because the IList<T> interface adds the ability to access values by index. While direct indexing is generally useful, it cannot be efficiently implemented in a linked list.

With these requirements in mind we can create a basic class stub, and then through the rest of the section we can fill in these methods.

Add

Behavior
Adds the provided value to the end of the linked list.
Performance O(1)

Adding an item to a linked list involves three steps:

  1. Allocate the new LinkedListNode instance.
  2. Find the last node of the existing list.
  3. Point the Next property of the last node to the new node.

The key is to know which node is the last node in the list. There are two ways we can know this. The first way is to keep track of the first node (the “head” node) and walk the list until we have found the last node. This approach does not require that we keep track of the last node, which saves one reference worth of memory (whatever your platform pointer size is), but does require that we perform a traversal of the list every time a node is added. This would make Add an O(n) operation.

The second approach requires that we keep track of the last node (the “tail” node) in the list and when we add the new node we simply access our stored reference directly. This is an O(1) algorithm and therefore the preferred approach.

The first thing we need to do is add two private fields to the LinkedList class: references to the first (head) and last (tail) nodes.

Next we need to add the method that performs the three steps.

First, it allocates the new LinkedListNode instance. Next, it checks whether the list is empty. If the list is empty, the new node is added simply by assigning the _head and _tail references to the new node. The new node is now both the first and last node in the list. If the list is not empty, the node is added to the end of the list and the _tail reference is updated to point to the new end of the list.

The Count property is incremented when a node is added to ensure the ICollection<T>. Count property returns the accurate value.

Remove

Behavior
Removes the first node in the list whose value equals the provided value. The method returns true if a value was removed. Otherwise it returns false.
Performance O(n)

Before talking about the Remove algorithm, let’s take a look at what it is trying to accomplish. In the following figure, there are four nodes in a list. We want to remove the node with the value three.

A linked list with four values
A linked list with four values

When the removal is done, the list will be modified such that the Next property on the node with the value two points to the node with the value four.

The linked list with the 3 node removed
The linked list with the 3 node removed

The basic algorithm for node removal is:

  1. Find the node to remove.
  2. Update the Next property of the node that precedes the node being removed to point to the node that follows the node being removed.

As always, the devil is in the details. There are a few cases we need to be thinking about when removing a node:

  • The list might be empty, or the value we are trying to remove might not be in the list. In this case the list would remain unchanged.
  • The node being removed might be the only node in the list. In this case we simply set the _head and _tail fields to null.
  • The node to remove might be the first node. In this case there is no preceding node, so instead we need to update the _head field to point to the new head node.
  • The node might be in the middle of the list.
  • The node might be the last node in the list. In this case we update the _tail field to reference the penultimate node in the list and set its Next property to null.

The Count property is decremented when a node is removed to ensure the ICollection<T>. Count property returns the accurate value.

Contains

Behavior
Returns a Boolean that indicates whether the provided value exists within the linked list.
Performance O(n)

The Contains method is quite simple. It looks at every node in the list, from first to last, and returns true as soon as a node matching the parameter is found. If the end of the list is reached and the node is not found, the method returns false.

GetEnumerator

Behavior
Returns an IEnumerator instance that allows enumerating the linked list values from first to last.
Performance Returning the enumerator instance is an O(1) operation. Enumerating every item is an O(n) operation.

GetEnumerator is implemented by enumerating the list from the first to last node and uses the C# yield keyword to return the current node’s value to the caller.

Notice that the LinkedList implements the iteration behavior in the IEnumerable<T> version of the GetEnumerator method and defers to this behavior in the IEnumerable version.

Clear

Behavior
Removes all the items from the list.
Performance O(1)

The Clear method simply sets the _head and _tail fields to null to clear the list. Because .NET is a garbage collected language, the nodes do not need to be explicitly removed. It is the responsibility of the caller, not the linked list, to ensure that if the nodes contain IDisposable references they are properly disposed of.

CopyTo

Behavior
Copies the contents of the linked list from start to finish into the provided array, starting at the specified array index.
Performance O(n)

The CopyTo method simply iterates over the list items and uses simple assignment to copy the items to the array. It is the caller’s responsibility to ensure that the target array contains the appropriate free space to accommodate all the items in the list.

Count

Behavior
Returns an integer indicating the number of items currently in the list. When the list is empty, the value returned is 0.
Performance O(1)

Count is simply an automatically implemented property with a public getter and private setter. The real behavior happens in the Add, Remove, and Clear methods.

IsReadOnly

Behavior
Returns false if the list is not read-only.
Performance O(1)

Doubly Linked List

The LinkedList class we just created is known as a singly, linked list. This means that there exists only a single, unidirectional link between a node and the next node in the list. There is a common variation of the linked list which allows the caller to access the list from both ends. This variation is known as a doubly linked list.

To create a doubly linked list we will need to first modify our LinkedListNode class to have a new property named Previous. Previous will act like Next, only it will point to the previous node in the list.

A doubly linked list using a Previous node property
A doubly linked list using a Previous node property

The following sections will only describe the changes between the singly linked list and the new doubly linked list.

Node Class

The only change that will be made in the LinkedListNode class is the addition of a new property named Previous which points to the previous LinkedListNode in the linked list, or returns null if it is the first node in the list.

Add

While the singly linked list only added nodes to the end of the list, the doubly linked list will allow adding nodes to the start and end of the list using AddFirst and AddLast, respectively. The ICollection<T>. Add method will defer to the AddLast method to retain compatibility with the singly linked List class.

AddFirst

Behavior
Adds the provided value to the front of the list.
Performance O(1)

When adding a node to the front of the list, the actions are very similar to adding to a singly linked list.

  1. Set the Next property of the new node to the old head node.
  2. Set the Previous property of the old head node to the new node.
  3. Update the _tail field (if necessary) and increment Count.

AddLast

Behavior Adds the provided value to the end of the list.
Performance O(1)

Adding a node to the end of the list is even easier than adding one to the start.

The new node is simply appended to the end of the list, updating the state of _tail and _head as appropriate, and Count is incremented.

And as mentioned earlier, ICollection<T>.Add will now simply call AddLast.

Remove

Like Add, the Remove method will be extended to support removing nodes from the start or end of the list. The ICollection<T>.Remove method will continue to remove items from the start with the only change being to update the appropriate Previous property.

RemoveFirst

Behavior Removes the first value from the list. If the list is empty, no action is taken. Returns true if a value was removed. Otherwise it returns false.
Performance O(1)

RemoveFirst updates the list by setting the linked list’s head property to the second node in the list and updating its Previous property to null. This removes all references to the previous head node, removing it from the list. If the list contained only a singleton, or was empty, the list will be empty (the head and tail properties will be null).

RemoveLast

Behavior Removes the last node from the list. If the list is empty, no action is performed. Returns true if a value was removed. Otherwise, it returns false.
Performance O(1)

RemoveLast works by setting the list's tail property to be the node preceding the current tail node. This removes the last node from the list. If the list was empty or had only one node, when the method returns the head and tail properties, they will both be null.

Remove

Behavior Removes the first node in the list whose value equals the provided value. The method returns true if a value was removed. Otherwise it returns false.
Performance O(n)

The ICollection<T>. Remove method is nearly identical to the singly linked version except that the Previous property is now updated during the remove operation. To avoid repeated code, the method calls RemoveFirst when it is determined that the node being removed is the first node in the list.

But Why?

We can add nodes to the front and end of the list—so what? Why do we care? As it stands right now, the doubly linked List class is no more powerful than the singly linked list. But with just one minor modification, we can open up all kinds of possible behaviors. By exposing the head and tail properties as read-only public properties, the linked list consumer will be able to implement all sorts of new behaviors.

With this simple change we can enumerate the list manually, which allows us to perform reverse (tail-to-head) enumeration and search.

For example, the following code sample shows how to use the list's Tail and Previous properties to enumerate the list in reverse and perform some processing on each node.

Additionally, the doubly linked List class allows us to easily create the Deque class, which is itself a building block for other classes. We will discuss this class later in another section.

Next Up

This completes the second part about linked lists. Next up, we'll move on to the array list.
Tags:

Comments

Related Articles