From charlesreid1

Definitions and Variations

Definitions

dictionary - data structure in which values are mapped to keys, or act as their own keys.

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys

collisions - when two unique/distinct objects have the same output of the has function h(k_i) = h(k_j), i \neq j

chaining - technique to deal with collisions

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

  • motivated by inefficiency of sorted lists
  • efficient search/update compromise
  • randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality

Variations

Maps:

  • Sorted Map
    • Tree-based
    • Array-based
    • Skip list
  • Unsorted Map
    • Hash table based
    • Array based

Sets:

  • Sorted sets
    • Tree implementation
    • Skip list implementation
    • Array implementation
  • Unsorted sets
    • Hash table implementation
    • Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)


Skip lists

Hash tables

ADTs and Interfaces

Map ADT

Map ADT:

  • get
  • set
  • delete
  • contains
  • size
  • empty
  • iterator

Utilities:

  • key iterator/key set
  • value iterator/value set
  • clear
  • equality checks

Sorted Map ADT:

  • find min
  • find max
  • find less than/lte
  • find greater than/gte
  • find range
  • iterator
  • reversed

Hash Table ADT

Hash table ADT:

  • get
  • set
  • delete
  • resize
  • hash function: ak+b mod p mod N

Skip Lists

Skip list interface:

  • Same as map
  • get
  • set
  • delete
  • contains

Set ADT

Set ADT has 5 essential methods:

  • add
  • remove
  • contains
  • size
  • empty

From these 5, others can be extended:

  • pop
  • equals/not equals
  • lt/lte
  • gt/gte
  • union of sets
  • intersection of sets
  • symmetric difference
  • not operator

Implementations

Map:

  • Item class:
    • Key/value
    • equal/not equal/lt/gt

Sorted Map:

  • private find index (k, lo, hi)
  • public get/set/delete
  • iterator/reverse iterator
  • find max/min
  • find gt/gte/lt/lte

Table Map:

  • array of data: table

Hash Map:

  • Hash function
  • private get/set/remove from buckets
  • table data
  • prime integer
  • n entries integer
  • hash parameters a, b integer
  • resize

Set:

  • follow map implementation, but values are their own keys

Skip List:

  • Skip Node class:
    • level of skip node integer
    • data
    • Array or list of next Skip Node, indexed by level
    • Comparison operators
  • random number generator
  • max height integer
  • size integer
  • head node pointer

Algorithms for Operations

Table Maps:

  • Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

Tree Maps:

  • All operations here are covered in the search tree study guide

Skip Lists

add(e) method:

  • deal with empty case (new head node)
  • get random insertion level (flipping coin)
  • create new node
  • walker = head
  • handle case where node goes before head
  • for each level,
    • walker = walker.next
    • while next node smaller than us:
      • walk fwd 1 step
    • if current level <= insertion level:
      • insert node in this level
      • walker = walker.next
  • increment size

remove method:

  • deal with empty case
  • check for remove head (head value = remove value)
  • step 1
  • step 2
  • step 3
  • update size

remove step 1: get predecessor

  • use protected getPredecessor method
  • find predecessor of node to be removed
  • predecessor.getNext(using level from prior step), gives result node

remove step 2:

  • to find result next node,
  • deal with removing head case:
    • prior is null
    • swap out result with result prior
    • result next = result.next

remove step 3:

  • starting from top occurring level, remove each node in the tower
    • result = result.getNext
    • result prev next = result next
    • if we aren't at last level of tower,
      • move down
      • set results previous at new level


getPredecessor method:

  • handle head case
  • look for the node (and level) where next nod has target value
  • if we reach end of row, move down
  • if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
  • if we are bigger than the next node, keep moving right
  • if we are equal, return this node.

Complexity and Cost

Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

Analysis of Skip Lists

Big O Complexity

Big-O Complexity of Skip Lists


Skip List

get O(log N)
set O(log N)
contains O(log N)
remove O(log N)
find min O(1)
find max O(1)
find lt/find lte/find gt/find gte O(log N)
find range O(log N + S), S items returned
iterate over all items O(N)

Probabilistic Analysis of Performance

Performance analysis:

  • Base it on expectation
  • High likelihood of good performance, given by Chernoff bounds
  • Bounded height with high probability


P_i \leq \left( \dfrac{1}{2} \right)^i n


P_i \leq \dfrac{n}{2^i}

at a given level i, what is the probability of having an entry? 1/2, from coin flip. These combine to give the expression above.

Given a constant c > 1, the height of a tree is larger than c log n with probability of at most


\dfrac{1}{n^{c-1}}

The probability that the height is smaller than c log n is


\left( 1 - \dfrac{1}{n^{c-1}} \right)

Space Usage

We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:


\sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right)

The last term can be simplified using the fact that


\sum_{i=1}^{n} a^i = \dfrac{a^{i+1} - 1}{a-1}

to say that


\sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right)

This number is bounded by 2, and therefore


\mathbb{E}(N) = n \mathbb{P} = \sum_{i=0}^{h} n \mathbb{P}_i \leq 2n

and therefore the expected number of nodes in the skip list is O(n):


\mathbb{E}(N) \sim O(n)

OOP Principles

Flags