From charlesreid1

Definitions and Variations

Definitions

dictionary - data structure in which values are mapped to keys, or act as their own keys.

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys

collisions - when two unique/distinct objects have the same output of the has function

chaining - technique to deal with collisions

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

  • motivated by inefficiency of sorted lists
  • efficient search/update compromise
  • randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality

Variations

Maps:

  • Sorted Map
    • Tree-based
    • Array-based
    • Skip list
  • Unsorted Map
    • Hash table based
    • Array based

Sets:

  • Sorted sets
    • Tree implementation
    • Skip list implementation
    • Array implementation
  • Unsorted sets
    • Hash table implementation
    • Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)


Skip lists

Hash tables

ADTs and Interfaces

Map ADT

Map ADT:

  • get
  • set
  • delete
  • contains
  • size
  • empty
  • iterator

Utilities:

  • key iterator/key set
  • value iterator/value set
  • clear
  • equality checks

Sorted Map ADT:

  • find min
  • find max
  • find less than/lte
  • find greater than/gte
  • find range
  • iterator
  • reversed

Hash Table ADT

Hash table ADT:

  • get
  • set
  • delete
  • resize
  • hash function: ak+b mod p mod N

Skip Lists

Skip list interface:

  • Same as map
  • get
  • set
  • delete
  • contains

Set ADT

Set ADT has 5 essential methods:

  • add
  • remove
  • contains
  • size
  • empty

From these 5, others can be extended:

  • pop
  • equals/not equals
  • lt/lte
  • gt/gte
  • union of sets
  • intersection of sets
  • symmetric difference
  • not operator

Implementations

Map:

  • Item class:
    • Key/value
    • equal/not equal/lt/gt

Sorted Map:

  • private find index (k, lo, hi)
  • public get/set/delete
  • iterator/reverse iterator
  • find max/min
  • find gt/gte/lt/lte

Table Map:

  • array of data: table

Hash Map:

  • Hash function
  • private get/set/remove from buckets
  • table data
  • prime integer
  • n entries integer
  • hash parameters a, b integer
  • resize

Set:

  • follow map implementation, but values are their own keys

Skip List:

  • Skip Node class:
    • level of skip node integer
    • data
    • Array or list of next Skip Node, indexed by level
    • Comparison operators
  • random number generator
  • max height integer
  • size integer
  • head node pointer

Algorithms for Operations

Table Maps:

  • Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

Tree Maps:

  • All operations here are covered in the search tree study guide

Skip Lists

add(e) method:

  • deal with empty case (new head node)
  • get random insertion level (flipping coin)
  • create new node
  • walker = head
  • handle case where node goes before head
  • for each level,
    • walker = walker.next
    • while next node smaller than us:
      • walk fwd 1 step
    • if current level <= insertion level:
      • insert node in this level
      • walker = walker.next
  • increment size

remove method:

  • deal with empty case
  • check for remove head (head value = remove value)
  • step 1
  • step 2
  • step 3
  • update size

remove step 1: get predecessor

  • use protected getPredecessor method
  • find predecessor of node to be removed
  • predecessor.getNext(using level from prior step), gives result node

remove step 2:

  • to find result next node,
  • deal with removing head case:
    • prior is null
    • swap out result with result prior
    • result next = result.next

remove step 3:

  • starting from top occurring level, remove each node in the tower
    • result = result.getNext
    • result prev next = result next
    • if we aren't at last level of tower,
      • move down
      • set results previous at new level


getPredecessor method:

  • handle head case
  • look for the node (and level) where next nod has target value
  • if we reach end of row, move down
  • if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
  • if we are bigger than the next node, keep moving right
  • if we are equal, return this node.

Complexity and Cost

Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

Analysis of Skip Lists

Big O Complexity

Big-O Complexity of Skip Lists


Skip List

get O(log N)
set O(log N)
contains O(log N)
remove O(log N)
find min O(1)
find max O(1)
find lt/find lte/find gt/find gte O(log N)
find range O(log N + S), S items returned
iterate over all items O(N)

Probabilistic Analysis of Performance

Performance analysis:

  • Base it on expectation
  • High likelihood of good performance, given by Chernoff bounds
  • Bounded height with high probability

at a given level i, what is the probability of having an entry? 1/2, from coin flip. These combine to give the expression above.

Given a constant c > 1, the height of a tree is larger than c log n with probability of at most

The probability that the height is smaller than c log n is

Space Usage

We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:

The last term can be simplified using the fact that

to say that

This number is bounded by 2, and therefore

and therefore the expected number of nodes in the skip list is O(n):

OOP Principles

Flags