Sometimes you want the flexible sizing and ease of use of a linked list but need to have the direct (constant time) indexing of an array. In these cases, an ArrayList
can provide a reasonable middle ground.
Overview
ArrayList
is a collection that implements the IList<T>
interface but is backed by an array rather than a linked list. Like a linked list, an arbitrary number of items can be added (limited only by available memory), but behave like an array in all other respects.
Class Definition
The ArrayList class implements the IList<T>
interface. IList<T>
provides all the methods and properties of ICollection<T>
while also adding direct indexing and index-based insertion and removal. The following code sample features stubs generated by using Visual Studio 2010’s Implement Interface command.
The following code sample also includes three additions to the generated stubs:
- An array of
T (_items)
. This array will hold the items in the collection. - A default constructor initializing the array to size zero.
- A constructor accepting an integer length. This length will become the default capacity of the array. Remember that the capacity of the array and the collection Count are not the same thing. There may be scenarios when using the non-default constructor will allow the user to provide a sizing hint to the
ArrayList
class to minimize the number of times the internal array needs to be reallocated.
public class ArrayList : System.Collections.Generic.IList { T[] _items; public ArrayList() : this(0) { } public ArrayList(int length) { if (length < 0) { throw new ArgumentException("length"); } _items = new T[length]; } public int IndexOf(T item) { throw new NotImplementedException(); } public void Insert(int index, T item) { throw new NotImplementedException(); } public void RemoveAt(int index) { throw new NotImplementedException(); } public T this[int index] { get { throw new NotImplementedException(); } set { throw new NotImplementedException(); } } public void Add(T item) { throw new NotImplementedException(); } public void Clear() { throw new NotImplementedException(); } public bool Contains(T item) { throw new NotImplementedException(); } public void CopyTo(T[] array, int arrayIndex) { throw new NotImplementedException(); } public int Count { get { throw new NotImplementedException(); } } public bool IsReadOnly { get { throw new NotImplementedException(); } } public bool Remove(T item) { throw new NotImplementedException(); } public System.Collections.Generic.IEnumerator GetEnumerator() { throw new NotImplementedException(); } System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { throw new NotImplementedException(); } }
Insertion
Adding an item to an ArrayList
is where the difference between the array and linked list really shows. There are two reasons for this. The first is that an ArrayList
supports inserting values into the middle of the collection, whereas a linked list supports adding items to the start or end of the list. The second is that adding an item to a linked list is always an O(1) operation, but adding items to an ArrayList
is either an O(1) or an O(n) operation.
Growing the Array
As items are added to the collection, eventually the internal array may become full. When this happens, the following needs to be done:
- Allocate a larger.
- Copy the elements from the smaller to the larger array.
- Update the internal array to be the larger array.
The only question we need to answer at this point is what size should the new array become ? The answer to this question is defined by the ArrayList
growth policy.
We’ll look at two growth policies, and for each we’ll look at how quickly the array grows and how it can impact performance.
Doubling (Mono and Rotor)
There are two implementations of the ArrayList
class we can look at online: Mono and Rotor. Both of them use a simple algorithm that doubles the size of the array each time an allocation is needed. If the array has a size of zero, the default capacity is 16. The algorithm is:
size = size == 0 ? 1 : size * 2;
This algorithm has fewer allocations and array copies, but wastes more space on average than the Java approach. In other words, it is biased toward having more O(1) inserts, which should reduce the number of times the collection performs the time consuming allocation-and-copy operation. This comes at the cost of a larger average memory footprint, and, on average, more empty array slots.
Slower Growth (Java)
Java uses a similar approach but grows the array a little more slowly. The algorithm it uses to grow the array is:
size = (size * 3) / 2 + 1;
This algorithm has a slower growth curve, which means it is biased toward less memory overhead at the cost of more allocations. Let’s look at the growth curve for these two algorithms for an ArrayList
with more than 200,000 items added.
You can see in this graph that it took 19 allocations for the doubling algorithm to cross the 200,000 boundary, whereas it took the slower (Java) algorithm 30 allocations to get to the same point.
So which one is correct? There is no right or wrong answer. Doubling performs fewer O(n) operations, but has more memory overhead on average. The slower growth algorithm performs more O(n) operations but has less memory overhead. For a general purpose collection, either approach is acceptable. Your problem domain may have specific requirements that make one more attractive, or it may require you to create another approach altogether. Regardless of the approach you take, the collection’s fundamental behaviors will remain unchanged.
Our ArrayList
class will be using the doubling (Mono/Rotor) approach.
private void GrowArray() { int newLength = _items.Length == 0 ? 16 : _items.Length << 1; T[] newArray = new T[newLength]; _items.CopyTo(newArray, 0); _items = newArray; }
Insert
Behavior | Adds the provided value at the specified index in the collection. If the specified index is equal to or larger than Count , an exception is thrown |
Performance | O(n) |
Inserting at a specific index requires shifting all of the items after the insertion point to the right by one. If the backing array is full, it will need to be grown before the shifting can be done.
In the following example, there is an array with a capacity of five items, four of which are in use. The value three will be inserted as the third item in the array (index two).
public void Insert(int index, T item) { if (index >= Count) { throw new IndexOutOfRangeException(); } if (_items.Length == this.Count) { this.GrowArray(); } // Shift all the items following index one slot to the right. Array.Copy(_items, index, _items, index + 1, Count - index); _items[index] = item; Count++; }
Add
Behavior | Appends the provided value to the end of the collection. |
Performance | O(1) when the array capacity is greater than Count; O(n) when growth is necessary. |
public void Add(T item) { if (_items.Length == Count) { GrowArray(); } _items[Count++] = item; }
Deletion
RemoveAt
Behavior | Removes the value at the specified index. |
Performance | O(n) |
Removing at an index is essentially the reverse of the Insert operation. The item is removed from the array and the array is shifted to the left.
public void RemoveAt(int index) { if (index >= Count) { throw new IndexOutOfRangeException(); } int shiftStart = index + 1; if (shiftStart < Count) { // Shift all the items following index one slot to the left. Array.Copy(_items, shiftStart, _items, index, Count - shiftStart); } Count--; }
Remove
Behavior | Removes the first item in the collection whose value matches the provided value. Returns true if a value was removed. Otherwise it returns false . |
Performance | O(n) |
public bool Remove(T item) { for (int i = 0; i < Count; i++) { if (_items[i].Equals(item)) { RemoveAt(i); return true; } } return false; }
Indexing
IndexOf
Behavior | Returns the first index in the collection whose value equals the provided value. Returns -1 if no matching value is found. |
Performance | O(n) |
public int IndexOf(T item) { for (int i = 0; i < Count; i++) { if (_items[i].Equals(item)) { return i; } } return -1; }
Item
Behavior | Gets or sets the value at the specified index. |
Performance | O(1) |
public T this[int index] { get { if(index < Count) { return _items[index]; } throw new IndexOutOfRangeException(); } set { if (index < Count) { _items[index] = value; } else { throw new IndexOutOfRangeException(); } } }
Contains
Behavior | Returns true if the provided value exists in the collection. Otherwise it returns false. |
Performance | O(n) |
public bool Contains(T item) { return IndexOf(item) != -1; }
Enumeration
GetEnumerator
Behavior | Returns an IEnumerator<T> instance that allows enumerating the array list values in order from first to last. |
Performance | Returning the enumerator instance is an O(1) operation. Enumerating every item is an O(n) operation. |
Note that we cannot simply defer to the _items
array’s GetEnumerator
because that would also return the items that are not currently filled with data.
public System.Collections.Generic.IEnumerator GetEnumerator() { for (int i = 0; i < Count; i++) { yield return _items[i]; } } System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
Remaining IList<T> Methods
Clear
Behavior | Removes all the items from the array list. |
Performance | O(1) |
There are two options when implementing Clear
. The array can be left alone or it can be reallocated as a 0-length array. This implementation reallocates a new array with a length of zero. A larger array will be allocated when an item is added to the array using the Add
or Insert
methods.
Public void Clear() { _items = new T[0]; Count = 0; }
CopyTo
Behavior | Copies the contents of the internal array from start to finish into the provided array starting at the specified array index. |
Performance | O(n) |
Note that the method does not simply defer to the _items
array’s CopyTo
method. This is because we only want to copy the range from index 0
to Count
, not the entire array capacity. Using Array.Copy
allows us to specify the number of items to copy.
public void CopyTo(T[] array, int arrayIndex) { Array.Copy(_items, 0, array, arrayIndex, Count); }
Count
Behavior | Returns an integer that indicates the number of items currently in the collection. When the list is empty, the value is 0. |
Performance | O(1) |
Count
is simply an automatically implemented property with a public getter and private setter. The real behavior happens in the functions that manipulate the collection contents.
public int Count { get; private set; }
IsReadOnly
Behavior | Returns false because the collection is not read-only. |
Performance | O(1) |
public bool IsReadOnly { get { return false; } }
Comments