Effective Kotlin Item 60: Use appropriate collection types
This is a chapter from the book Effective Kotlin. You can find it on LeanPub or Amazon.
Lists, sets, and maps are represented by interfaces. Each of these collection types has a specific contract:
- List represents an ordered collection of elements. The same elements can occur multiple times. A list’s elements can be accessed by indices (zero-based integer numbers that reflect elements’ positions).
- Set represents a collection of unique elements.
- Map represents a set of key-value pairs; each key must be unique and points to exactly one value.
These contracts still leave plenty of freedom for different implementations of these collections. The biggest differences between different implementations are:
- The data structure that is used under the hood, which determines the efficiency of different operations.
- Whether the collection is mutable or immutable; if it is mutable, whether it is thread-safe or not.
In Kotlin, most collections we use are mutable under the hood, therefore they’re not thread-safe. This is the kind of collection we’ll concentrate on in this item. We discussed thread-safe collections in Item 2: Eliminate critical sections.
Regarding the data structure used under the hood, this is something we might want to consider when we optimize the performance of our application. So, let's consider the most important types of collections and their performance characteristics.
Array-based list
The default implementation of List
in Kotlin is the array-based list. It is just a wrapper over an array, which is like a certain number of reserved places next to each other in memory. An array-based list keeps track of the number of elements and their inner array size. When the number of elements reaches the size of the inner array, the array is replaced with a new one which is bigger. The size of the new array is usually calculated as the size of the old array multiplied by a certain factor, which in JVM ArrayList
is 1.5. This is how ArrayList grows. When the number of elements decreases, the size of the inner array is typically not decreased, but element references are replaced with null
values. This is how ArrayList shrinks.
By knowing how ArrayList works, we can predict the time, we can predict the time complexity of different operations on it:
- Accessing an element by its index is very fast (
O(1)
) because it’s just a matter of accessing the corresponding element in the inner array. - If there is still a place left in the internal array, adding an element to the end of the list is also very fast (
O(1)
) because it is just a matter of adding an element to the inner array. If there is not enough space, the internal array needs to be copied; this is more demanding for the CPU, but it is still a linear operation (O(n)
). This is why it’s a good idea to initialize ArrayList with a size that is the expected number of elements. - Adding an element to the beginning or in the middle of the list is a linear operation (
O(n)
) because all elements need to be shifted to the right. - Changing an element by its index is a constant time operation (
O(1)
) because it’s just a matter of changing the corresponding element in the inner array. - Removing an element from the end of the list is very fast (
O(1)
) because it’s just a matter of removing the last element from the inner array. - Removing an element from the beginning or the middle of the list is a linear operation (
O(n)
) because all elements need to be shifted to the left. - Searching for an element by value is a linear operation (
O(n)
) because all elements need to be checked one by one.
Deque
Deque (usually pronounced "deck") stands for "double ended queue" as it represents a queue in which we can effectively add or remove elements at both ends. Kotlin offers an ArrayDeque
implementation of this data structure which implements the MutableList<E>
interface. Its behavior is similar to ArrayList
, but its internal array is circular. This means that the beginning of the array is not necessarily the first element of this collection. There is a separate property (head
) that marks the beginning of the collection. This way, if there is enough space in the internal array, adding or removing elements at the beginning or end is a constant time operation (O(1)
) because it is just a matter of inserting an element into the array and changing the head
property; otherwise, if the array is full, we need to additionally shift elements, which has linear complexity.
When you need to implement a stack or a first-in-last-out (FILO), ArrayList
is a good choice. When you need to implement a queue, i.e., a first-in-first-out (FIFO), ArrayDeque
is a good choice.
Deque is a good choice when we often need to add or remove elements at the beginning or end of a list, like when we implement a queue or stack.
To implement a first-in-first-out queue, we most often use
Channel
from the Kotlin Coroutines library, which is synchronized and thread-safe.
Linked lists
The most well-known alternative to an array-based list is a linked list. This is a data structure that consists of nodes, where each node contains a value and a reference to the next node. The last node in the list has a reference to null
. The list itself has a reference to the first node. This is a simplified implementation of a linked list:
The biggest advantage of this data structure is that adding or removing elements from the beginning or the middle of the list is a constant time operation (O(1)
) because it is just a matter of changing references. Adding elements is always fast and does not require the initial capacity to be set. The biggest disadvantage is that accessing an element by its index is a linear operation (O(n)
) because we need to traverse the list from the beginning to the element with the given index. Adding or removing an element from the middle does not necessitate shifting the elements, but it does necessitate iterating over elements anyway to find the position by index, so it’s a linear operation (O(n)
). Linked lists also take twice as much space because they need to store not only the elements but also all the references to the next elements. Finding an element, changing an element, or iterating over a list has similar time complexity as in an array-based list (O(n)
).
Kotlin does not provide a LinkedList implementation. On JVM, you can use LinkedList
from Java stdlib, which we might choose to use when we often need to add or remove elements from the middle. Generally, linked lists are rarely used because, in most cases, an array-based list or deque is a better choice. However, the linking algorithm is used in many other data structures, like the default set or map, both of which use a hash table and links between elements.
Hash tables
We’ve already discussed hash tables in Item 43: Respect the contract of hashCode. This is a very popular algorithm that is used when we need an efficient way to find a value in a collection. Since sets and maps need to keep their elements unique, the default implementations of these collections use hash tables.
The basic form of a hash table loses the order of elements. To keep the order, the default implementations of Set
and Map
in Kotlin use a hash table in combination with links between elements. The actual classes are named LinkedHashMap
and LinkedHashSet
. So, each element in a hash table is a node of a linked list. This way, even though the Set
and Map
contracts do not guarantee the order of elements, Kotlins’ default Set
and Map
guarantee to keep the order in which elements were added.
If you don’t need to keep elements in order in a set or map, you can use an implementation that does not keep links between elements. The advantages of this are that it takes less memory and is faster, as maintaining these connections takes time. You can create such collections using the HashSet
or HashMap
classes, or the hashSetOf
or hashMapOf
functions.
These implementations of set and map keep the internal hash table size greater than the number of elements. Therefore, finding an element by key is a constant time operation (O(1)
). Adding an element is also a constant time operation (O(1)
) unless the hash table needs to be resized, which is a linear operation (O(n)
). Removing an element is also a constant time operation (O(1)
). Iterating over a set or map is a linear operation (O(n)
) because we need to traverse all the elements in the hash table.
Sorted binary trees
In computer science, there are many different kinds of tree-based data structures, but one of these is especially important from Kotlin's perspective: the sorted binary tree. This is a tree where each node has a maximum of two children, and each node has a value that is greater than all the values in its left subtree and smaller than all the values in its right subtree. This is a simplified implementation of a sorted binary tree:
In Kotlin, we can create a sorted set or a sorted map by using the sortedSetOf
and sortedMapOf
functions. These are useful when we need a collection that keeps elements in a specific order.
Finding an element in such a collection is a logarithmic operation (O(log n)
), which is better than for a regular array-based list but worse than for a hash table. Adding and removing elements is a logarithmic operation (O(log n)
) because we must traverse the tree to find the right place for the new element. Iterating over a tree is a linear operation (O(n)
) because we need to traverse the tree in order.
Summary
In this item, we’ve discussed the most popular collection types and data structures used in Kotlin, and we’ve learned that each has its own advantages and disadvantages. We’ve also learned that the time complexity of different operations on a collection is very important, especially when these collections contain a lot of elements. In such cases, we should always consider what the best collection for this use case will be.