The Array List

Sometimes you want the flexible sizing and ease of use of a linked list but need to have the direct (constant time) indexing of an array. In these cases, an ArrayList can provide a reasonable middle ground.

Overview

ArrayList is a collection that implements the IList<T> interface but is backed by an array rather than a linked list. Like a linked list, an arbitrary number of items can be added (limited only by available memory), but behave like an array in all other respects.

Class Definition

The ArrayList class implements the IList<T> interface. IList<T> provides all the methods and properties of ICollection<T> while also adding direct indexing and index-based insertion and removal. The following code sample features stubs generated by using Visual Studio 2010’s Implement Interface command.

The following code sample also includes three additions to the generated stubs:

  • An array of T (_items). This array will hold the items in the collection.
  • A default constructor initializing the array to size zero.
  • A constructor accepting an integer length. This length will become the default capacity of the array. Remember that the capacity of the array and the collection Count are not the same thing. There may be scenarios when using the non-default constructor will allow the user to provide a sizing hint to the ArrayList class to minimize the number of times the internal array needs to be reallocated.

Insertion

Adding an item to an ArrayList is where the difference between the array and linked list really shows. There are two reasons for this. The first is that an ArrayList supports inserting values into the middle of the collection, whereas a linked list supports adding items to the start or end of the list. The second is that adding an item to a linked list is always an O(1) operation, but adding items to an ArrayList is either an O(1) or an O(n) operation.

Growing the Array

As items are added to the collection, eventually the internal array may become full. When this happens, the following needs to be done:

  1. Allocate a larger.
  2. Copy the elements from the smaller to the larger array.
  3. Update the internal array to be the larger array.

The only question we need to answer at this point is what size should the new array become ? The answer to this question is defined by the ArrayList growth policy.

We’ll look at two growth policies, and for each we’ll look at how quickly the array grows and how it can impact performance.

Doubling (Mono and Rotor)

There are two implementations of the ArrayList class we can look at online: Mono and Rotor. Both of them use a simple algorithm that doubles the size of the array each time an allocation is needed. If the array has a size of zero, the default capacity is 16. The algorithm is:

This algorithm has fewer allocations and array copies, but wastes more space on average than the Java approach. In other words, it is biased toward having more O(1) inserts, which should reduce the number of times the collection performs the time consuming allocation-and-copy operation. This comes at the cost of a larger average memory footprint, and, on average, more empty array slots.

Slower Growth (Java)

Java uses a similar approach but grows the array a little more slowly. The algorithm it uses to grow the array is:

This algorithm has a slower growth curve, which means it is biased toward less memory overhead at the cost of more allocations. Let’s look at the growth curve for these two algorithms for an ArrayList with more than 200,000 items added.

The growth curve for MonoRotor versus Java for 200000 items
The growth curve for Mono/Rotor versus Java for 200,000+ items

You can see in this graph that it took 19 allocations for the doubling algorithm to cross the 200,000 boundary, whereas it took the slower (Java) algorithm 30 allocations to get to the same point.

So which one is correct? There is no right or wrong answer. Doubling performs fewer O(n) operations, but has more memory overhead on average. The slower growth algorithm performs more O(n) operations but has less memory overhead. For a general purpose collection, either approach is acceptable. Your problem domain may have specific requirements that make one more attractive, or it may require you to create another approach altogether. Regardless of the approach you take, the collection’s fundamental behaviors will remain unchanged.

Our ArrayList class will be using the doubling (Mono/Rotor) approach.

Insert

Behavior Adds the provided value at the specified index in the collection. If the specified index is equal to or larger than Count, an exception is thrown
Performance O(n)

Inserting at a specific index requires shifting all of the items after the insertion point to the right by one. If the backing array is full, it will need to be grown before the shifting can be done.

In the following example, there is an array with a capacity of five items, four of which are in use. The value three will be inserted as the third item in the array (index two).

The array before the insert one open slot at the end
The array before the insert (one open slot at the end)
The array after shifting to the right
The array after shifting to the right
The array with the new item added at the open slot
The array with the new item added at the open slot

Add

Behavior Appends the provided value to the end of the collection.
Performance O(1) when the array capacity is greater than Count; O(n) when growth is necessary.

Deletion

RemoveAt

Behavior Removes the value at the specified index.
Performance O(n)

Removing at an index is essentially the reverse of the Insert operation. The item is removed from the array and the array is shifted to the left.

The array before the value 3 is removed
The array before the value 3 is removed
The array with the value 3 removed
The array with the value 3 removed
The array shifted to the left freeing the last slot
The array shifted to the left, freeing the last slot

Remove

Behavior Removes the first item in the collection whose value matches the provided value. Returns true if a value was removed. Otherwise it returns false.
Performance O(n)

Indexing

IndexOf

Behavior Returns the first index in the collection whose value equals the provided value. Returns -1 if no matching value is found.
Performance O(n)

Item

Behavior Gets or sets the value at the specified index.
Performance O(1)

Contains

Behavior Returns true if the provided value exists in the collection. Otherwise it returns false.
Performance O(n)

Enumeration

GetEnumerator

Behavior Returns an IEnumerator<T> instance that allows enumerating the array list values in order from first to last.
Performance Returning the enumerator instance is an O(1) operation. Enumerating every item is an O(n) operation.

Note that we cannot simply defer to the _items array’s GetEnumerator because that would also return the items that are not currently filled with data.

Remaining IList<T> Methods

Clear

Behavior Removes all the items from the array list.
Performance O(1)

There are two options when implementing Clear. The array can be left alone or it can be reallocated as a 0-length array. This implementation reallocates a new array with a length of zero. A larger array will be allocated when an item is added to the array using the Add or Insert methods.

CopyTo

Behavior Copies the contents of the internal array from start to finish into the provided array starting at the specified array index.
Performance O(n)

Note that the method does not simply defer to the _items array’s CopyTo method. This is because we only want to copy the range from index 0 to Count, not the entire array capacity. Using Array.Copy allows us to specify the number of items to copy.

Count

Behavior Returns an integer that indicates the number of items currently in the collection. When the list is empty, the value is 0.
Performance O(1)

Count is simply an automatically implemented property with a public getter and private setter. The real behavior happens in the functions that manipulate the collection contents.

IsReadOnly

Behavior Returns false because the collection is not read-only.
Performance O(1)

Next Up

This completes the third part about array lists. Next up, we'll move on to stack and queues.
Tags:

Comments

Related Articles