# Hash Tables Study Guide

### From charlesreid1

## Contents

# Definitions and Variations

## Definitions

**hash table** - data structure that uses key or value based lookup system

**dictionary problem** - searching for an item in a structure; if item is not found, you learn nothing new

**hash function** - a mapping from input objects to a hash table entry

**bucket array** - array of data used to store hash table entries, indexed by hash codes (key space)

**collision** - when two or more elements that are distinct/different have the same hash code

**compression function** - maps the output of the hash code to an integer in the range 0 to N-1

**hash code** - an integer that is uniquely determined by the input key

**polynomial hash code** - accounts for positions (not just presence) of various bits, each bit is the coefficient of a polynomial (use Horner's Rule to evaluate):

**cyclic shift hash code** - shifts the bits of the 32-bit key representation by X bits (left or right)

**immutable** - unable to be changed after construction/instantiation; keys must be immutable, or their hash value will change!

**division method (compression)** - turns the hash code into an integer between 0 and N-1 via the rather unimaginative

**MAD (multiply and divide) method (a.k.a. universal hashing) (compression)** - use the following compression function:

where p > N and p is prime

**collision resolution** - rule used to determine what happens when a collision is found

**separate chaining** - using linked lists as buckets, instead of storing elements in an array directly

**load factor** - alpha - ratio of number of items N to number of buckets M, cost is proportional to load factor

**open addressing** - requires load factor is 1, stores elements directly instead of in buckets

**linear probing** - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed nodes)

**quadratic probing** - alternative that provides more desirable properties than linear probing

**double hashing** - if we hash a key h(k) and the position is occupied, we can use a second hash function to find a new key/position

# ADTs and Interfaces

## Hash Table ADT

Hash Tables provide the following methods:

- hashFunction (turns key into index 0 to N-1)
- get
- set
- del
- getBucket (deletes an item from a particular bucket)
- setBucket (sets the value of an item in a particular bucket)
- delBucket (removes an item from a particular bucket)
- size
- empty
- clear
- clone
- iterators, etc.

# Implementations

Hash Table fields/members:

- table for data
- number of entries
- hash function coefficients
- prime p
- scale a
- shift b

Hash Table methods:

- hash function (universal/MAD function)
- get/set/del methods
- getBucket/setBucket/delBucket methods
- resize method

Chaining Hash Table methods:

- overrides getBucket/setBucket/delBucket to use "chained" linked lists for buckets

Probing Hash Table methods:

- overrides getBucket/setBucket/delBucket to use linear/quadratic probing to find open slots
- implements non-empty AVAILABLE sentinel

# Algorithms for Operations

## Hash Map base class

get(k) method:

- j = hashFunction(k)
- return getBucket(j,k)

set(k,v) method:

- j = hashFunction(k)
- setBucket(j,k,v)
- update size

delete(k) method:

- j = hashFunction(k)
- delBucket(j,k)
- update size

get, set, and delete methods are all public

hashFunction method varies depending on hash function chosen

getBucket, setBucket, delBucket are private and depend on the concrete implementation

## Chained Hash Map

get bucket(j,k) method:

- bucket_j = buckets[j]
- return bucket_j[k]

set bucket(j,k,v)

- bucket_j = buckets[j]
- bucket_j[k] = v

remove bucket(j,k) method:

- bucket_j = buckets[j]
- del bucket_j[k]

## Linear Probing Hash Table

findBucketSlot(j,k) method:

- Find key k in bucket j
- if true, found match, returning location
- if false, not match, returning 1st available slot

getBucket(j,k) method:

- findBucketSlot(j,k)
- if found:
- return table value

setBucket(j,k,v) method:

- findBucketSlot(j,k)
- if found:
- bucket_j[k] = v

- else:
- new item(k,v)
- bucket_j insert new item
- update size

removeBucket(j,k) method:

- findBucketSlot(j,k)
- if found:
- mark as vacated using AVAIL constant

# Complexity and Cost

Note: alpha is the loading factor, N/M, where N is the nubmer of elements in the hash table, and M is the number of buckets.

Big-O Complexity of Lists | |||
---|---|---|---|

Unsorted List Hash Map (expected) |
Unsorted List Hash Map (worst) |
Chained Hash Map | |

get | O(1) | O(n) | O(alpha) |

set | O(1) | O(n) | O(alpha) |

remove | O(1) | O(n) | O(alpha) |

size | O(1) | O(n) | O(1) |

empty | O(1) | O(n) | O(1) |

iteration over structure | O(n) | O(n) | O(n) |

# OOP Principles

Object hash codes

- Java and Python both provide methods for modifying/directly implementing a custom object hash code method

Private utility methods:

- Pack your functionality into general, protected utility methods, as with chained probing hash table implementations

# Flags

Maps and DictionariesSeries on Data Structures
Maps Map implementations: Maps/AbstractMap Dictionary implementations: Dictionaries/LinkedDict
Hash Maps/OOP Hash Maps/Dynamic Resizing Hash functions: Hash Functions Hash map implementations: Hash Maps/AbstractHashMap
Skip Lists Java implementations: SkipList
Sets
· Template:MapsFlagBase · e |