# Definitions and Variations

## Definitions

dictionary - data structure in which values are mapped to keys, or act as their own keys.

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys

collisions - when two unique/distinct objects have the same output of the has function $h(k_i) = h(k_j), i \neq j$

chaining - technique to deal with collisions

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

• motivated by inefficiency of sorted lists
• efficient search/update compromise
• randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality

## Variations

Maps:

• Sorted Map
• Tree-based
• Array-based
• Skip list
• Unsorted Map
• Hash table based
• Array based

Sets:

• Sorted sets
• Tree implementation
• Skip list implementation
• Array implementation
• Unsorted sets
• Hash table implementation
• Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)

Skip lists

Hash tables

• get
• set
• delete
• contains
• size
• empty
• iterator

Utilities:

• key iterator/key set
• value iterator/value set
• clear
• equality checks

• find min
• find max
• find less than/lte
• find greater than/gte
• find range
• iterator
• reversed

• get
• set
• delete
• resize
• hash function: ak+b mod p mod N

## Skip Lists

Skip list interface:

• Same as map
• get
• set
• delete
• contains

Set ADT has 5 essential methods:

• remove
• contains
• size
• empty

From these 5, others can be extended:

• pop
• equals/not equals
• lt/lte
• gt/gte
• union of sets
• intersection of sets
• symmetric difference
• not operator

# Implementations

Map:

• Item class:
• Key/value
• equal/not equal/lt/gt

Sorted Map:

• private find index (k, lo, hi)
• public get/set/delete
• iterator/reverse iterator
• find max/min
• find gt/gte/lt/lte

Table Map:

• array of data: table

Hash Map:

• Hash function
• private get/set/remove from buckets
• table data
• prime integer
• n entries integer
• hash parameters a, b integer
• resize

Set:

• follow map implementation, but values are their own keys

Skip List:

• Skip Node class:
• level of skip node integer
• data
• Array or list of next Skip Node, indexed by level
• Comparison operators
• random number generator
• max height integer
• size integer

# Algorithms for Operations

Table Maps:

• Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

Tree Maps:

• All operations here are covered in the search tree study guide

## Skip Lists

• deal with empty case (new head node)
• get random insertion level (flipping coin)
• create new node
• handle case where node goes before head
• for each level,
• walker = walker.next
• while next node smaller than us:
• walk fwd 1 step
• if current level <= insertion level:
• insert node in this level
• walker = walker.next
• increment size

remove method:

• deal with empty case
• step 1
• step 2
• step 3
• update size

remove step 1: get predecessor

• use protected getPredecessor method
• find predecessor of node to be removed
• predecessor.getNext(using level from prior step), gives result node

remove step 2:

• to find result next node,
• deal with removing head case:
• prior is null
• swap out result with result prior
• result next = result.next

remove step 3:

• starting from top occurring level, remove each node in the tower
• result = result.getNext
• result prev next = result next
• if we aren't at last level of tower,
• move down
• set results previous at new level

getPredecessor method:

• look for the node (and level) where next nod has target value
• if we reach end of row, move down
• if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
• if we are bigger than the next node, keep moving right
• if we are equal, return this node.

# Complexity and Cost

## Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

## Analysis of Skip Lists

### Big O Complexity

Big-O Complexity of Skip Lists

Skip List

get O(log N)
set O(log N)
contains O(log N)
remove O(log N)
find min O(1)
find max O(1)
find lt/find lte/find gt/find gte O(log N)
find range O(log N + S), S items returned
iterate over all items O(N)

### Probabilistic Analysis of Performance

Performance analysis:

• Base it on expectation
• High likelihood of good performance, given by Chernoff bounds
• Bounded height with high probability $P_i \leq \left( \dfrac{1}{2} \right)^i n$ $P_i \leq \dfrac{n}{2^i}$

at a given level i, what is the probability of having an entry? 1/2, from coin flip. These combine to give the expression above.

Given a constant c > 1, the height of a tree is larger than c log n with probability of at most $\dfrac{1}{n^{c-1}}$

The probability that the height is smaller than c log n is $\left( 1 - \dfrac{1}{n^{c-1}} \right)$

### Space Usage

We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i: $\sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right)$

The last term can be simplified using the fact that $\sum_{i=1}^{n} a^i = \dfrac{a^{i+1} - 1}{a-1}$

to say that $\sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right)$

This number is bounded by 2, and therefore $\mathbb{E}(N) = n \mathbb{P} = \sum_{i=0}^{h} n \mathbb{P}_i \leq 2n$

and therefore the expected number of nodes in the skip list is O(n): $\mathbb{E}(N) \sim O(n)$