From charlesreid1

(Fix typos, improve definition accuracy, fill empty OOP Principles section, fix geometric series formula (via update-page on MediaWiki MCP Server))
 
(7 intermediate revisions by the same user not shown)
Line 3: Line 3:
==Definitions==
==Definitions==


'''dictionary''' - data structure in which values are mapped to keys, or act as their own keys.
'''dictionary''' - data structure that stores key-value pairs, mapping each unique key to an associated value


'''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails
'''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails
Line 11: Line 11:
'''sorted map''' - data structure storing keys in a sorted order to provide data in order
'''sorted map''' - data structure storing keys in a sorted order to provide data in order


'''hash table''' - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function
'''hash table''' - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function


'''hash code''' - maps keys to integers
'''hash code''' - maps keys to integers
Line 21: Line 21:
'''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)
'''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)


'''immutability''' - a property required of keys
'''immutability''' - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant


'''collisions''' - when two unique/distinct objects have the same output of the has function <math>h(k_i) = h(k_j), i \neq j</math>
'''collisions''' - when two unique/distinct objects have the same output of the hash function <math>h(k_i) = h(k_j), i \neq j</math>


'''chaining''' - technique to deal with collisions
'''chaining''' - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket


'''exact search''' - lookups tell you nothing about nearby data
'''exact search''' - lookups tell you nothing about nearby data
Line 31: Line 31:
'''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.
'''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.


'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.
'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a &lt;= c and b &gt;= d. (a,b) is maximum pair if not dominated by any other pairs.


'''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
'''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
Line 52: Line 52:
'''multimap''' - associates keys to values, but allows one key mapping to multiple values
'''multimap''' - associates keys to values, but allows one key mapping to multiple values


'''template method pattern''' - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality
'''template method pattern''' - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods


==Variations==
==Variations==
Line 193: Line 193:


Hash Maps:
Hash Maps:
* All operations are covered by [[Hash Table Study Guide]] operations section
* All operations are covered by [[Hash Tables Study Guide]] operations section


Tree Maps:
Tree Maps:
Line 210: Line 210:
** while next node smaller than us:
** while next node smaller than us:
*** walk fwd 1 step
*** walk fwd 1 step
** if current level <= insertion level:
** if current level &lt;= insertion level:
*** insert node in this level
*** insert node in this level
*** walker = walker.next
*** walker = walker.next
Line 218: Line 218:
* deal with empty case
* deal with empty case
* check for remove head (head value = remove value)
* check for remove head (head value = remove value)
* step 1
* step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
* step 2
* step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
* step 3
* step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
* update size
* update size
remove step 1: get predecessor
* use protected getPredecessor method
* find predecessor of node to be removed
* predecessor.getNext(using leel from prior step), gives result node
remove step 2:
* to find result next node,
* deal with removing head case:
** prior is null
** swap out result with result prior
** result next = result.next
remove step 3:
* starting from top occurring level, remove each node in the tower
** result = result.getNext
** result prev next = result next
** if we aren't at last level of tower,
*** move down
*** set results previous at new level


getPredecessor method:
getPredecessor method:
* handle head case
* handle head case
* look for the node (and level) where next nod has target value
* look for the node (and level) where next node has target value
* if we reach end of row, move down
* if we reach end of row, move down
* if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
* if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
Line 266: Line 245:
|-
|-
|
|
|<br /><br />
|&lt;br /&gt;&lt;br /&gt;
Skip List
Skip List


Line 274: Line 253:


|-
|-
|sett
|set
|O(log N)
|O(log N)


Line 284: Line 263:
|remove
|remove
|O(log N)
|O(log N)
|-
|find min
|O(1)
|-
|find max
|O(1)
|-
|find lt/find lte/find gt/find gte
|O(log N)
|-
|find range
|O(log N + S), S items returned
|-
|iterate over all items
|O(N)


|}
|}
===Probabilistic Analysis of Performance===
Performance analysis:
* Base it on '''expectation'''
* High likelihood of good performance, given by Chernoff bounds
* Bounded height with high probability
The expected number of nodes at level i is given by:
<math>
\mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i}
</math>
At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.
Given a constant c &gt; 1, the height of a skip list is larger than c log n with probability of '''at most'''
<math>
\dfrac{1}{n^{c-1}}
</math>
The probability that the height is smaller than c log n is
<math>
\left( 1 - \dfrac{1}{n^{c-1}} \right)
</math>


===Space Usage===
===Space Usage===


We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:
We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:


<math>
<math>
Line 295: Line 321:
</math>
</math>


The last term can be simplified using the fact that
The last term can be simplified using the geometric series formula:


<math>
<math>
\sum_{i=1}^{n} a^i = \dfrac{a^{i+1} - 1}{a-1}
\sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1}
</math>
</math>


Line 310: Line 336:


<math>
<math>
\mathbb{E}(N) = n \mathbb{P} = \sum_{i=0}^{h} n \mathbb{P}_i \leq 2n
\mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n
</math>
</math>


Line 317: Line 343:
<math>
<math>
\mathbb{E}(N) \sim O(n)
\mathbb{E}(N) \sim O(n)
<math>
</math>


=OOP Principles=
=OOP Principles=
Key object-oriented design principles employed in maps and sets:
* '''Template Method Pattern''' - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
* '''Iterator Pattern''' - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
* '''Composition''' - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
* '''Encapsulation''' - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
* '''Abstract Base Classes''' - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.


=Flags=
=Flags=

Latest revision as of 03:59, 8 June 2026

Definitions and Variations

Definitions

dictionary - data structure that stores key-value pairs, mapping each unique key to an associated value

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant

collisions - when two unique/distinct objects have the same output of the hash function $ h(k_i) = h(k_j), i \neq j $

chaining - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

  • motivated by inefficiency of sorted lists
  • efficient search/update compromise
  • randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods

Variations

Maps:

  • Sorted Map
    • Tree-based
    • Array-based
    • Skip list
  • Unsorted Map
    • Hash table based
    • Array based

Sets:

  • Sorted sets
    • Tree implementation
    • Skip list implementation
    • Array implementation
  • Unsorted sets
    • Hash table implementation
    • Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)


Skip lists

Hash tables

ADTs and Interfaces

Map ADT

Map ADT:

  • get
  • set
  • delete
  • contains
  • size
  • empty
  • iterator

Utilities:

  • key iterator/key set
  • value iterator/value set
  • clear
  • equality checks

Sorted Map ADT:

  • find min
  • find max
  • find less than/lte
  • find greater than/gte
  • find range
  • iterator
  • reversed

Hash Table ADT

Hash table ADT:

  • get
  • set
  • delete
  • resize
  • hash function: ak+b mod p mod N

Skip Lists

Skip list interface:

  • Same as map
  • get
  • set
  • delete
  • contains

Set ADT

Set ADT has 5 essential methods:

  • add
  • remove
  • contains
  • size
  • empty

From these 5, others can be extended:

  • pop
  • equals/not equals
  • lt/lte
  • gt/gte
  • union of sets
  • intersection of sets
  • symmetric difference
  • not operator

Implementations

Map:

  • Item class:
    • Key/value
    • equal/not equal/lt/gt

Sorted Map:

  • private find index (k, lo, hi)
  • public get/set/delete
  • iterator/reverse iterator
  • find max/min
  • find gt/gte/lt/lte

Table Map:

  • array of data: table

Hash Map:

  • Hash function
  • private get/set/remove from buckets
  • table data
  • prime integer
  • n entries integer
  • hash parameters a, b integer
  • resize

Set:

  • follow map implementation, but values are their own keys

Skip List:

  • Skip Node class:
    • level of skip node integer
    • data
    • Array or list of next Skip Node, indexed by level
    • Comparison operators
  • random number generator
  • max height integer
  • size integer
  • head node pointer

Algorithms for Operations

Table Maps:

  • Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

Tree Maps:

  • All operations here are covered in the search tree study guide

Skip Lists

add(e) method:

  • deal with empty case (new head node)
  • get random insertion level (flipping coin)
  • create new node
  • walker = head
  • handle case where node goes before head
  • for each level,
    • walker = walker.next
    • while next node smaller than us:
      • walk fwd 1 step
    • if current level <= insertion level:
      • insert node in this level
      • walker = walker.next
  • increment size

remove method:

  • deal with empty case
  • check for remove head (head value = remove value)
  • step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
  • step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
  • step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
  • update size

getPredecessor method:

  • handle head case
  • look for the node (and level) where next node has target value
  • if we reach end of row, move down
  • if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
  • if we are bigger than the next node, keep moving right
  • if we are equal, return this node.

Complexity and Cost

Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

Analysis of Skip Lists

Big O Complexity

Big-O Complexity of Skip Lists
<br /><br />

Skip List

get O(log N)
set O(log N)
contains O(log N)
remove O(log N)
find min O(1)
find max O(1)
find lt/find lte/find gt/find gte O(log N)
find range O(log N + S), S items returned
iterate over all items O(N)

Probabilistic Analysis of Performance

Performance analysis:

  • Base it on expectation
  • High likelihood of good performance, given by Chernoff bounds
  • Bounded height with high probability

The expected number of nodes at level i is given by:

$ \mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i} $

At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.

Given a constant c > 1, the height of a skip list is larger than c log n with probability of at most

$ \dfrac{1}{n^{c-1}} $

The probability that the height is smaller than c log n is

$ \left( 1 - \dfrac{1}{n^{c-1}} \right) $

Space Usage

We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:

$ \sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right) $

The last term can be simplified using the geometric series formula:

$ \sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1} $

to say that

$ \sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right) $

This number is bounded by 2, and therefore

$ \mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n $

and therefore the expected number of nodes in the skip list is O(n):

$ \mathbb{E}(N) \sim O(n) $

OOP Principles

Key object-oriented design principles employed in maps and sets:

  • Template Method Pattern - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
  • Iterator Pattern - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
  • Composition - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
  • Encapsulation - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
  • Abstract Base Classes - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.

Flags