Hash Maps/Dynamic Resizing
From charlesreid1
Hash Maps: Dynamic Resizing
Once we have our hash table implemented, we'll want to start adding lots of items. But then, we have to strike a balance.
The smaller the output space of the hash function, the smaller our map is in memory. But if we are using chaining, a small table means longer linked lists: with n items in m slots the average chain has length n/m, and lookups drift further from O(1) as the chains get longer.
On the other hand, if we make m too big (that is, make the output space of our hash function too large), we end up wasting space on empty slots in our hash table.
The idea behind resizing the hash table dynamically is to maintain this balance on the fly. It is similar to the array resizing algorithm used by the PythonList class in Java (see Arrays/Java/PythonList), which implements a dynamically expanding/shrinking array.
Growing the table from size m to size m' requires the following steps:
- make a table of size m'
- build a new hash function h' with output space m'
- rehash every item into the new table:
  - for item in T: T'.insert(item)
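The steps above can be sketched with a toy chained table. This is a minimal illustration, not the implementation used elsewhere in these notes: the class name ChainedTable, its methods, and the choice of Python's built-in hash() reduced modulo the table size are all assumptions made for the example.

```python
class ChainedTable:
    """Toy chained hash table (hypothetical names, for illustration only)."""

    def __init__(self, m=4):
        self.m = m                                  # number of buckets
        self.n = 0                                  # number of stored items
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        # Assumption: compress the built-in hash() to the range 0..m-1.
        return hash(key) % self.m

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)            # overwrite an existing key
                return
        bucket.append((key, value))
        self.n += 1

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def resize(self, new_m):
        # 1. make a table of size m'
        new_buckets = [[] for _ in range(new_m)]
        # 2. the new hash h' is the same base hash, compressed to m' slots
        # 3. rehash: walk all m old buckets and re-insert all n items
        for bucket in self.buckets:
            for key, value in bucket:
                new_buckets[hash(key) % new_m].append((key, value))
        self.buckets = new_buckets
        self.m = new_m


table = ChainedTable(m=4)
for i in range(10):
    table.insert(f"key{i}", i)
table.resize(8)                                     # grow from m = 4 to m' = 8
```

The resize loop touches every old bucket and every stored item, and allocating the new bucket list touches every new slot, so the work grows with the old table size, the item count, and the new table size together.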
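How much time?
- O(n) at least, since every item must be re-inserted
- In general, O(n + m + m'): visit the m old slots, move the n items, and set up the m' new slots
Why a new hash?
- The hash function is designed to have an output space of exactly m, so it has to change when m changes
- Each key needs to be re-hashed, because its slot under h' is generally different from its slot under h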
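A small example shows why keys cannot simply be copied slot for slot. It assumes a division-method hash, h(key) = key mod m, which is only one possible choice; the keys are arbitrary integers picked for illustration.

```python
keys = [3, 7, 11, 12, 20]
old_m, new_m = 4, 8

# h(key) = key % m is an assumed hash function for this example.
old_slots = [key % old_m for key in keys]   # h:  output space 0..3
new_slots = [key % new_m for key in keys]   # h': output space 0..7

for key, old, new in zip(keys, old_slots, new_slots):
    print(f"key {key:2d}: slot {old} in table of size {old_m}, "
          f"slot {new} in table of size {new_m}")
```

Keys 3, 7, and 11 all share slot 3 in the table of size 4, but key 7 moves to slot 7 once the table has 8 slots, so each key has to be hashed again with h'.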
If n > m, we need to make the table bigger.
- If we make the table a constant amount larger each time (for example, m' = m + 1), the cost of n inserts is a triangular number: theta(1 + 2 + 3 + 4 + ... + n) = theta(n^2)
- If we make the table twice as large each time (m' = 2m), the cost of n inserts is:
  - theta(1 + 2 + 4 + 8 + ... + n)
  - = theta(n)
- Only a few of the inserts (about log n of them) are expensive
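The two growth strategies can be compared by counting how many items get copied during rehashes. The sketch below is a simplified cost model, not a real hash table: the function name total_copy_cost and the rule "grow when the table is full, just before the next insert" are assumptions made for the example.

```python
def total_copy_cost(n, grow):
    """Count items copied by rehashes while inserting n items one at a time.

    grow(m) returns the new table size m' given the old size m.
    """
    m, copied, resizes = 1, 0, 0
    for count in range(n):        # count = number of items already stored
        if count == m:            # table is full: grow before inserting
            copied += count       # a rehash moves every stored item
            resizes += 1
            m = grow(m)
    return copied, resizes


n = 1024
additive_cost, additive_resizes = total_copy_cost(n, lambda m: m + 1)
doubling_cost, doubling_resizes = total_copy_cost(n, lambda m: 2 * m)

print("m' = m + 1:", additive_cost, "copies in", additive_resizes, "resizes")
print("m' = 2m:   ", doubling_cost, "copies in", doubling_resizes, "resizes")
```

For n = 1024, growing by one slot copies 523776 items (the triangular number 1 + 2 + ... + 1023), while doubling copies 1023 items across only 10 resizes.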
Similarly, the table should shrink when deletions leave it a quarter full, n < m/4.
- At that point, we should shrink the table size by half
- As we saw before, a single resize costs O(n), so where we put the shrink threshold matters
- Slow case: we shrink by half as soon as the table is half empty
  - the slow sequence alternates 2^k <--insert/delete--> (2^k) + 1
  - every one of these operations triggers a resize, so each is a Theta(n) operation
- If instead we wait until the table is a quarter full before shrinking by half, the amortized time per operation is Theta(1)
- This is a little bit tricky to prove...
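The difference between the two shrink thresholds can be seen by counting the items moved by resizes during an alternating insert/delete sequence. This is a simplified cost model under stated assumptions: the function name resize_work, the starting state (a full table with m = n = 8), and the exact comparison used for each threshold are illustrative choices, not part of the original notes.

```python
def resize_work(ops, should_shrink, m=8, n=8):
    """Total items moved by resizes over a sequence of operations.

    The table doubles when an insert finds it full, and halves when
    should_shrink(n, m) is true after a delete.
    """
    moved = 0
    for op in ops:
        if op == "insert":
            if n == m:                      # full: double before inserting
                moved += n
                m *= 2
            n += 1
        else:                               # "delete"
            n -= 1
            if m > 1 and should_shrink(n, m):
                moved += n                  # halve the table, rehash the rest
                m //= 2
    return moved


# Alternate between 2^k and 2^k + 1 items, with 2^k = 8.
ops = ["insert", "delete"] * 100

half_policy = resize_work(ops, lambda n, m: n <= m // 2)     # shrink when half empty
quarter_policy = resize_work(ops, lambda n, m: n < m // 4)   # shrink when quarter full

print("shrink at half:   ", half_policy, "items moved")
print("shrink at quarter:", quarter_policy, "items moved")
```

With the half-empty threshold every one of the 200 operations triggers a resize that moves 8 items, for 1600 moves in total. With the quarter-full threshold only the first insert causes a resize, for 8 moves in total.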
Amortization
An operation takes "T(n) amortized" if k operations take at most k * T(n) time in total.
With table doubling, k inserts take theta(k) time, so insert is O(1) amortized.
See also Amortization