From charlesreid1

Hash Maps: Dynamic Resizing

Once we have our hash table implemented, we'll want to start adding lots of items. But then, we have to strike a balance.

The smaller the output space of the hash function, the smaller our map is in memory. But if we are using chaining, that leads to longer and longer linked lists, linked lists of size n/m, which gets further from O(1) as the linked lists get longer.

On the other hand, if we make m too big, and make the output space of our hash function too large, we end up wasting space in our hash table.

The concept behind expanding the hash table size dynamically is to maintain the balance on-the-fly. The concept is similar to the array resizing algorithm used by the Arrays/Java/PythonList PythonList class in Java, which implements a dynamically expanding/shrinking array.

Growing the table size, from m to m', requires the following steps:

  • make table of size m'
  • build new hash h'
  • rehash
  • for item in T: T'.insert(item)

How much time?

  • O(n) at least
  • In general, O(n + m + m')

Why new hash?

  • Hash function is designed to have the specified output space of m
  • Each key needs to be re-hashed

If n > m, we need to make the table bigger.

  • If we make the table a constant amount larger, the cost of n inserts is a triangular number, theta(1+2+3+4+...+n) = theta(n^2)
  • If we make the table twice as large, cost of n insertions is amortized
  • m' = 2m: cost of n inserts
  • theta(1 + 2 + 4 + 8 + ... + n)
  • theta(n)
  • Only a few - log N - operations are expensive

Similarly, deletions should happen when we get to n < m/4 quarter-full table

  • Should shrink table size by half
  • As we saw before, shrinking table size by quarter, we can have single table size or add/remove operation that costs O(N)
  • Cost of n inserts
  • Slow operation if we shrink by half when table is empty by half:
  • slow operation is 2^k <--insert/delete--> (2^k) + 1
  • This is a Theta(n) operation
  • When you are a quarter full, you shrink by half
  • Amortized time is Theta(1)
  • Little bit tricky to prove...

Amortization

An operation takes "T(n) amortized" if k operations take  \leq k \cdot T(n) time

k inserts take theta(k) time, so this is O(1) amortized insert

See also Amortization

Flags