Complexity of List Operations

Last time, we learned that a "step" is any unit of work with bounded execution time (it doesn't keep getting slower with growing input size).

This definition provides significant flexibility. For example, consider the following:

print(1)
print(2)
print(3)

One could count each print as a single step, but given the definition, it would be equally legitimate to consider the entire 3-line snippet to be a single step.

Just as a step might correspond to multiple lines, sometimes a single line corresponds to multiple steps. For example, the following line is not a single step, or an O(1) snippet of code:

total = add_them_up(nums)

It turns out the above is an O(N) operation, where N is the length of the nums list. Of course, you would only know that if we also showed you the definition of add_them_up:

def add_them_up(values):
    total = 0
    for v in values:
        total += v
    return total

A common misconception is that functions and operations built into Python count as single steps, but this is not so. Python's sum function works much like the above add_them_up, so the following line is an O(N) snippet of code, not a single step:

total = sum(nums)
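
If you want to see this for yourself, a rough timing experiment like the following (exact numbers will vary by machine) should show the time for sum growing roughly in proportion to the list size:

import time

for n in [10**5, 10**6, 10**7]:
    nums = list(range(n))
    start = time.time()
    sum(nums)
    print(n, "items took about", round(time.time() - start, 4), "seconds")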

In this reading, we'll consider 8 common list operations, each of which is either O(N) or O(1) -- calls to the latter can be counted as a single step. This has great practical significance: your code will generally be faster if you avoid the O(N) operations except when necessary.

  1. len
  2. index
  3. append
  4. insert
  5. pop
  6. extend
  7. in
  8. max

Remember that every process keeps all its data in a big "list" called an address space. It's a slight simplification, but useful to imagine each Python list as occupying a range of spots within the address space. For example, we could imagine a list of the letters A through D occupying addresses 3-6, like this:

   ABCD     [VALUES]
0123456789  [ADDRESSES]

Accessing a single address counts as a single step (by "accessing", we mean looking up the value there or changing it) -- we'll use this to reason about which of our other operations can or cannot count as a single step.

len(L)

We already saw that sum is an O(N) operation. You might guess len is too, and it would be if it were implemented something like this:

def get_length(L):
    length = 0
    for item in L:
        length += 1
    return length

Fortunately, Python lists keep some extra statistics to make this faster, something like this:

  4ABCD     [VALUES]
0123456789  [ADDRESSES]

When len(L) is called, Python looks up the pre-computed length instead of re-counting. Of course, this will need to be updated each time a new value is added:

  5ABCDE    [VALUES]
0123456789  [ADDRESSES]
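
As a quick sanity check (exact numbers depend on your machine), you can time len against the O(N) re-counting approach from above on a large list; len stays fast because it just reads the stored count:

import time

def get_length(L):    # the O(N) re-counting approach from above
    length = 0
    for item in L:
        length += 1
    return length

nums = list(range(10**7))

start = time.time()
len(nums)
print("len:", round(time.time() - start, 6), "seconds")

start = time.time()
get_length(nums)
print("get_length:", round(time.time() - start, 6), "seconds")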

You could imagine a better version of the Python list that also keeps track of the sum, as items are added/modified. For example, a list of [5,1,2] might look like this:

  38512     [VALUES]
0123456789  [ADDRESSES]

In this case, the values would be at addresses 4-6. The length of the list would be stored at position 2 and the sum at position 3. Hypothetically, if Python lists worked like this, both len and sum would be O(1) operations; in reality, only len is O(1) and sum is O(N). One takeaway is that when it comes to performance, you need to learn about the specific data structures you are using in your specific programming language. Your intuition about how things work may apply to varying degrees across different languages and environments.
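
To make the idea concrete, here is a minimal sketch of such a hypothetical list (this is NOT how Python's real list works): it updates a stored total on every append, so asking for the sum never requires a loop.

class SumTrackingList:
    # a hypothetical list-like structure that keeps a running total
    def __init__(self):
        self.items = []
        self.total = 0        # extra statistic, updated as values are added

    def append(self, value):
        self.items.append(value)
        self.total += value   # O(1) bookkeeping per append

    def sum(self):
        return self.total     # O(1): just read the stored total

nums = SumTrackingList()
for v in [5, 1, 2]:
    nums.append(v)
print(nums.sum())   # 8, without looping over the items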

Indexing

L[0] is fast (an O(1) operation) -- Python knows the address where the list's values start (in this case, address 3), and it can quickly access that location in the address space.

  4ABCD     [VALUES]
0123456789  [ADDRESSES]

What about positions further into the list? For example, L[2]? This is fast too. If the list's values start at location 3 in the address space, then the value referred to by L[2] lives at location 3+2 in the address space. If we know the address, we can quickly get the value.

What about negative indexes? Do we need to loop over every item to find the end when we use L[-1]? No. L[-k] is the same as L[len(L)-k]. Computing the length is O(1) and subtraction is O(1). This converts the negative index to a positive one, and we've already discussed how accessing a positive index is O(1). Thus, the whole thing is O(1), or a single step.
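
For example, with a 4-item list like the one pictured above, L[-1] and L[len(L)-1] refer to the same element, and both lookups are single steps:

L = ["A", "B", "C", "D"]
print(L[-1])           # D
print(L[len(L) - 1])   # D -- same element, also a single step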

L.append(...)

This is fast. Assuming our list looks something like this:

  4ABCD     [VALUES]
0123456789  [ADDRESSES]

Appending a value accesses a fixed number of locations (locations 2 and 7, to be precise):

  5ABCDE    [VALUES]
0123456789  [ADDRESSES]

L.insert(...)

Inserting is like appending, except we can add the value at any index we like.

In [1]:
L = [1,2,3,4]
L.insert(2, 100) # in the middle
L
Out[1]:
[1, 2, 100, 3, 4]
In [2]:
L = [1,2,3,4]
L.insert(0, 100) # at the beginning
L
Out[2]:
[100, 1, 2, 3, 4]

Inserting in the middle (or worse, at the beginning) causes many values to shift over.

  4ABCD     [VALUES]
0123456789  [ADDRESSES]

After L.insert(0, "Z"), it looks like this:

  5ZABCD    [VALUES]
0123456789  [ADDRESSES]

We had to move D from spot 6 to 7, C from spot 5 to 6, and so on. The more values the list has, the longer this shifting takes. Insert (in the worst case) is an O(N) operation.
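
A rough timing comparison (exact numbers will vary by machine) makes the difference visible: appending 50,000 values one at a time is quick, but inserting 50,000 values at the beginning forces more and more shifting as the list grows.

import time

start = time.time()
L = []
for i in range(50000):
    L.append(i)            # O(1) each
print("append:", round(time.time() - start, 3), "seconds")

start = time.time()
L = []
for i in range(50000):
    L.insert(0, i)         # O(N) each: every existing value shifts over
print("insert at 0:", round(time.time() - start, 3), "seconds")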

L.pop(...)

  4ABCD     [VALUES]
0123456789  [ADDRESSES]

Popping from the end with L.pop(-1) is O(1):

  3ABC      [VALUES]
0123456789  [ADDRESSES]

In contrast, popping from the beginning with L.pop(0) is slow (O(N)), as we need to shift all the other values too (just like when we insert at the beginning):

  2BC       [VALUES]
0123456789  [ADDRESSES]
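
For example, both forms return the removed value, but only pop(0) forces the remaining values to shift:

L = ["A", "B", "C", "D"]
print(L.pop(-1))   # 'D' -- O(1), nothing needs to shift
print(L.pop(0))    # 'A' -- O(N), 'B' and 'C' shift left
print(L)           # ['B', 'C']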

L.extend(...)

extend is a little different than append. Let's review the difference.

In [3]:
L = [1,2,3]
L2 = [4,5,6]
L.append(L2)
L
Out[3]:
[1, 2, 3, [4, 5, 6]]
In [4]:
L = [1,2,3]
L2 = [4,5,6]
L.extend(L2)
L
Out[4]:
[1, 2, 3, 4, 5, 6]

If L is ["A", "B", "C"] and L2 is ["X", "Y"], the address space might look something like this:

  ABC   XY  [VALUES]
0123456789  [ADDRESSES]

After extend, it would look like this:

  ABCXY XY  [VALUES]
0123456789  [ADDRESSES]

We've been categorizing the input size with one variable, N. Here, we have two lists, so we should use two variables. We'll use N for len(L) and M for len(L2). In this case, we have to copy every item from L2 (XY in the example above). So the complexity is in terms of L2's size: O(M). The time it takes to execute the extend does not depend on the size of L (that is, N).
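
One way to see why the cost depends on M rather than N is that extend behaves roughly like appending L2's items one at a time, as in this sketch (Python's real extend is implemented differently under the hood, but it still has to copy each of L2's M items):

def extend_sketch(L, L2):
    # roughly what L.extend(L2) does: M appends, each a single step
    for item in L2:
        L.append(item)

L = ["A", "B", "C"]
L2 = ["X", "Y"]
extend_sketch(L, L2)
print(L)   # ['A', 'B', 'C', 'X', 'Y']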

in

You should imagine the in operator like this:

In [5]:
nums = [2,7,3,9,8,4]
4 in nums
Out[5]:
True
In [6]:
def in_function(values, target):
    for v in values:
        if v == target:
            return True
    return False

in_function(nums, 4)
Out[6]:
True

Thus, the in operator is O(N). If you want fast membership checks with the in operator, use sets:

In [7]:
nums = set([2,7,3,9,8,4])
4 in nums
Out[7]:
True

The worst-case complexity is still O(N), but it's actually difficult to construct a situation that triggers this worst case. In the average (and common) case, you should think of the in operator for sets as an O(1) operation.
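
A rough comparison (numbers will vary by machine) of repeatedly checking membership in a large list versus a set containing the same values illustrates the difference:

import time

nums_list = list(range(10**6))
nums_set = set(nums_list)

start = time.time()
for i in range(100):
    found = -1 in nums_list   # O(N): scans the whole list each time (not found)
print("list:", round(time.time() - start, 3), "seconds")

start = time.time()
for i in range(100):
    found = -1 in nums_set    # O(1) on average: a hash lookup
print("set:", round(time.time() - start, 5), "seconds")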

max

The max(...) function is O(N), as it is similar to the following function:

In [8]:
def largest(values):
    best = values[0]
    for v in values:
        if v > best:
            best = v
    return best

nums = [2,7,3,9,8,4]
largest(nums)
Out[8]:
9

Similarly, min is O(N).

Optimization Example

Consider the following code, which prints the percentage of each entry relative to the total:

In [9]:
nums = [2,7,3,9,8,4]

for num in nums:
    print(num, "is", round(100 * num / sum(nums)), "percent")
2 is 6 percent
7 is 21 percent
3 is 9 percent
9 is 27 percent
8 is 24 percent
4 is 12 percent

The above is O(N^2). If N is the length of the list, then the loop runs N times, and each iteration calls sum(nums), an O(N) operation -- so the total work is roughly N*N steps.

We can optimize it by moving the sum call above the loop, calling it once, and saving the result. The following is O(N) -- yay!

In [10]:
nums = [2,7,3,9,8,4]

total = sum(nums)
for num in nums:
    print(num, "is", round(100 * num / total), "percent")
2 is 6 percent
7 is 21 percent
3 is 9 percent
9 is 27 percent
8 is 24 percent
4 is 12 percent