Hash Tables Study Guide
From charlesreid1
Definitions and Variations
Definitions
hash table - data structure that uses key or value based lookup system
dictionary problem - searching for an item in a structure; if item is not found, you learn nothing new
hash function - a mapping from input objects to a hash table entry
bucket array - array of data used to store hash table entries, indexed by hash codes (key space)
collision - when two or more elements that are distinct/different have the same hash code
compression function - maps the output of the hash code to an integer in the range 0 to N-1
hash code - an integer that is uniquely determined by the input key
polynomial hash code - accounts for positions (not just presence) of various bits, each bit is the coefficient of a polynomial (use Horner's Rule to evaluate):
$ x_0 a^{n-1} + x_1 a^{n-2} + \dots + x_{n-2} a + x_{n-1} $
cyclic shift hash code - shifts the bits of the 32-bit key representation by X bits (left or right)
immutable - unable to be changed after construction/instantiation; keys must be immutable, or their hash value will change!
division method (compression) - turns the hash code into an integer between 0 and N-1 via the rather unimaginative
$ i \mod N $
MAD (multiply and divide) method (a.k.a. universal hashing) (compression - use the following compression function:
$ (ai + b \mod p ) \mod N $
where p > N and p is prime
collision resolution - rule used to determine what happens when a collision is found
separate chaining - using linked lists as buckets, instead of storing elements in an array directly
load factor - alpha - ratio of number of items N to number of buckets M, cost is proportional to load factor
open addressing - requires load factor is 1, stores elements directly instead of in buckets
linear probing - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed nodes)
quadratic probing - alternative that provides more desirable properties than linera probing
double hashing - if we hash a key h(k) and the position is occupied, we can use a second hash function to find a new key,
$ h(k) + i \cdot h^{\prime}(k) $
ADTs and Interfaces
Hash Table ADT
Hash Tables provide the following methods:
- get
- set
- remove
- size
- empty
- clear
- clone
- iterators, etc.
Implementations
Hash Table base:
- table for data
- number of entries
- has function coefficients
- prime p
- scale a
- shift b
- hash function (universal/MAD function)
- get/set/delete method
- resize method
Algorithms for Operations
Hash Map base class
get(k) method:
- j = find bucket(k)
- get bucket(j,k)
- return item
set(k,v) method:
- j = find bucket
- set bucket(j,k,v)
- update item
- update size
delete(k) method:
- j = find bucket(k)
- delete bucket(j,k)
- update size
find bucket(k):
- return hash code(k)
get, set, and delete methods are all public
find bucket method, get bucket method, set bucket method are private and depend on the concrete implementation
Chained Hash Map
get bucket(j,k) method:
- get bucket j
- return bucket j[k]
set bucket(j,k,v)
- get bucket j
- update bucketj[k] = v
remove bucket(j,k) method:
- get bucket j
- remove bucketj[k]
Linear Probing Hash Table
find bucket slot(j,k) method:
- Find key k in bucket j
- if true, found match, returning location
- if false, not match, returning 1st available slot
get bucket(j,k) method:
- find slot(j,k)
- if found:
- return table value
set bucket(j,k,v) method:
- find slot(j,k)
- if found:
- bucketj[k] = v
- else:
- new item(k,v)
- bucketj insert new item
- update size
remove bucket(j,k) method:
- find slot(j,k)
- if found:
- mark as vacated using AVAIL constant