From charlesreid1

(Fix typos, improve definition accuracy, fill empty OOP Principles section, fix geometric series formula (via update-page on MediaWiki MCP Server))
 
(14 intermediate revisions by the same user not shown)
Line 3: Line 3:
==Definitions==
==Definitions==


'''dictionary''' - data structure in which values are mapped to keys, or act as their own keys.
'''dictionary''' - data structure that stores key-value pairs, mapping each unique key to an associated value


'''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails
'''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails
Line 11: Line 11:
'''sorted map''' - data structure storing keys in a sorted order to provide data in order
'''sorted map''' - data structure storing keys in a sorted order to provide data in order


'''hash table''' - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function
'''hash table''' - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function


'''hash code''' - maps keys to integers
'''hash code''' - maps keys to integers
Line 21: Line 21:
'''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)
'''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)


'''immutability''' - a property required of keys
'''immutability''' - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant


'''collisions''' - when two unique/distinct objects have the same output of the has function <math>h(k_i) = h(k_j), i \neq j</math>
'''collisions''' - when two unique/distinct objects have the same output of the hash function <math>h(k_i) = h(k_j), i \neq j</math>


'''chaining''' - technique to deal with collisions
'''chaining''' - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket


'''exact search''' - lookups tell you nothing about nearby data
'''exact search''' - lookups tell you nothing about nearby data
Line 31: Line 31:
'''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.
'''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.


'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.
'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a &lt;= c and b &gt;= d. (a,b) is maximum pair if not dominated by any other pairs.


'''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
'''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
* motivated by inefficiency of sorted lists
* efficient search/update compromise
* randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized
'''height of skip list''' - number of linked lists to maintain
'''pseudo-random number generators''' - generate random-like numbers
'''start position of skip list''' - left-most node of top-most sparsest list
'''frozen set''' - an immutable set (Python type)
'''set''' - collection of related entities that does not allow duplicates, allows membership tests
'''multiset''' - set-like container allowing duplicates (counting map)
'''multimap''' - associates keys to values, but allows one key mapping to multiple values
'''template method pattern''' - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods
==Variations==
Maps:
* Sorted Map
** Tree-based
** Array-based
** Skip list
* Unsorted Map
** Hash table based
** Array based
Sets:
* Sorted sets
** Tree implementation
** Skip list implementation
** Array implementation
* Unsorted sets
** Hash table implementation
** Array implementation
(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)
Skip lists
Hash tables
* See [[Hash Tables Study Guide]]


=ADTs and Interfaces=
=ADTs and Interfaces=
==Map ADT==
Map ADT:
* get
* set
* delete
* contains
* size
* empty
* iterator
Utilities:
* key iterator/key set
* value iterator/value set
* clear
* equality checks
Sorted Map ADT:
* find min
* find max
* find less than/lte
* find greater than/gte
* find range
* iterator
* reversed
==Hash Table ADT==
Hash table ADT:
* get
* set
* delete
* resize
* hash function: ak+b mod p mod N
==Skip Lists==
Skip list interface:
* Same as map
* get
* set
* delete
* contains
==Set ADT==
Set ADT has 5 essential methods:
* add
* remove
* contains
* size
* empty
From these 5, others can be extended:
* pop
* equals/not equals
* lt/lte
* gt/gte
* union of sets
* intersection of sets
* symmetric difference
* not operator


=Implementations=
=Implementations=
Map:
* Item class:
** Key/value
** equal/not equal/lt/gt
Sorted Map:
* private find index (k, lo, hi)
* public get/set/delete
* iterator/reverse iterator
* find max/min
* find gt/gte/lt/lte
Table Map:
* array of data: table
Hash Map:
* Hash function
* private get/set/remove from buckets
* table data
* prime integer
* n entries integer
* hash parameters a, b integer
* resize
Set:
* follow map implementation, but values are their own keys
Skip List:
* Skip Node class:
** level of skip node integer
** data
** Array or list of next Skip Node, indexed by level
** Comparison operators
* random number generator
* max height integer
* size integer
* head node pointer


=Algorithms for Operations=
=Algorithms for Operations=
Table Maps:
* Obvious, but impractical for large maps. Not detailed here.
Hash Maps:
* All operations are covered by [[Hash Tables Study Guide]] operations section
Tree Maps:
* All operations here are covered in the search tree study guide
==Skip Lists==
add(e) method:
* deal with empty case (new head node)
* get random insertion level (flipping coin)
* create new node
* walker = head
* handle case where node goes before head
* for each level,
** walker = walker.next
** while next node smaller than us:
*** walk fwd 1 step
** if current level &lt;= insertion level:
*** insert node in this level
*** walker = walker.next
* increment size
remove method:
* deal with empty case
* check for remove head (head value = remove value)
* step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
* step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
* step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
* update size
getPredecessor method:
* handle head case
* look for the node (and level) where next node has target value
* if we reach end of row, move down
* if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
* if we are bigger than the next node, keep moving right
* if we are equal, return this node.


=Complexity and Cost=
=Complexity and Cost=
==Big O Complexity Table==
Complexity depends on implementation. See [[Hash Tables Study Guide]] and [[Search Trees Study Guide]] for big O tables.
==Analysis of Skip Lists==
===Big O Complexity===
{| class="wikitable" cellpadding="100" width="66%"
!colspan="4"|Big-O Complexity of Skip Lists
|-
|
|&lt;br /&gt;&lt;br /&gt;
Skip List
|-
|get
|O(log N)
|-
|set
|O(log N)
|-
|contains
|O(log N)
|-
|remove
|O(log N)
|-
|find min
|O(1)
|-
|find max
|O(1)
|-
|find lt/find lte/find gt/find gte
|O(log N)
|-
|find range
|O(log N + S), S items returned
|-
|iterate over all items
|O(N)
|}
===Probabilistic Analysis of Performance===
Performance analysis:
* Base it on '''expectation'''
* High likelihood of good performance, given by Chernoff bounds
* Bounded height with high probability
The expected number of nodes at level i is given by:
<math>
\mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i}
</math>
At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.
Given a constant c &gt; 1, the height of a skip list is larger than c log n with probability of '''at most'''
<math>
\dfrac{1}{n^{c-1}}
</math>
The probability that the height is smaller than c log n is
<math>
\left( 1 - \dfrac{1}{n^{c-1}} \right)
</math>
===Space Usage===
We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:
<math>
\sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right)
</math>
The last term can be simplified using the geometric series formula:
<math>
\sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1}
</math>
to say that
<math>
\sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right)
</math>
This number is bounded by 2, and therefore
<math>
\mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n
</math>
and therefore the expected number of nodes in the skip list is O(n):
<math>
\mathbb{E}(N) \sim O(n)
</math>


=OOP Principles=
=OOP Principles=
Key object-oriented design principles employed in maps and sets:
* '''Template Method Pattern''' - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
* '''Iterator Pattern''' - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
* '''Composition''' - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
* '''Encapsulation''' - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
* '''Abstract Base Classes''' - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.


=Flags=
=Flags=

Latest revision as of 03:59, 8 June 2026

Definitions and Variations

Definitions

dictionary - data structure that stores key-value pairs, mapping each unique key to an associated value

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant

collisions - when two unique/distinct objects have the same output of the hash function $ h(k_i) = h(k_j), i \neq j $

chaining - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

  • motivated by inefficiency of sorted lists
  • efficient search/update compromise
  • randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods

Variations

Maps:

  • Sorted Map
    • Tree-based
    • Array-based
    • Skip list
  • Unsorted Map
    • Hash table based
    • Array based

Sets:

  • Sorted sets
    • Tree implementation
    • Skip list implementation
    • Array implementation
  • Unsorted sets
    • Hash table implementation
    • Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)


Skip lists

Hash tables

ADTs and Interfaces

Map ADT

Map ADT:

  • get
  • set
  • delete
  • contains
  • size
  • empty
  • iterator

Utilities:

  • key iterator/key set
  • value iterator/value set
  • clear
  • equality checks

Sorted Map ADT:

  • find min
  • find max
  • find less than/lte
  • find greater than/gte
  • find range
  • iterator
  • reversed

Hash Table ADT

Hash table ADT:

  • get
  • set
  • delete
  • resize
  • hash function: ak+b mod p mod N

Skip Lists

Skip list interface:

  • Same as map
  • get
  • set
  • delete
  • contains

Set ADT

Set ADT has 5 essential methods:

  • add
  • remove
  • contains
  • size
  • empty

From these 5, others can be extended:

  • pop
  • equals/not equals
  • lt/lte
  • gt/gte
  • union of sets
  • intersection of sets
  • symmetric difference
  • not operator

Implementations

Map:

  • Item class:
    • Key/value
    • equal/not equal/lt/gt

Sorted Map:

  • private find index (k, lo, hi)
  • public get/set/delete
  • iterator/reverse iterator
  • find max/min
  • find gt/gte/lt/lte

Table Map:

  • array of data: table

Hash Map:

  • Hash function
  • private get/set/remove from buckets
  • table data
  • prime integer
  • n entries integer
  • hash parameters a, b integer
  • resize

Set:

  • follow map implementation, but values are their own keys

Skip List:

  • Skip Node class:
    • level of skip node integer
    • data
    • Array or list of next Skip Node, indexed by level
    • Comparison operators
  • random number generator
  • max height integer
  • size integer
  • head node pointer

Algorithms for Operations

Table Maps:

  • Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

Tree Maps:

  • All operations here are covered in the search tree study guide

Skip Lists

add(e) method:

  • deal with empty case (new head node)
  • get random insertion level (flipping coin)
  • create new node
  • walker = head
  • handle case where node goes before head
  • for each level,
    • walker = walker.next
    • while next node smaller than us:
      • walk fwd 1 step
    • if current level <= insertion level:
      • insert node in this level
      • walker = walker.next
  • increment size

remove method:

  • deal with empty case
  • check for remove head (head value = remove value)
  • step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
  • step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
  • step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
  • update size

getPredecessor method:

  • handle head case
  • look for the node (and level) where next node has target value
  • if we reach end of row, move down
  • if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
  • if we are bigger than the next node, keep moving right
  • if we are equal, return this node.

Complexity and Cost

Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

Analysis of Skip Lists

Big O Complexity

Big-O Complexity of Skip Lists
<br /><br />

Skip List

get O(log N)
set O(log N)
contains O(log N)
remove O(log N)
find min O(1)
find max O(1)
find lt/find lte/find gt/find gte O(log N)
find range O(log N + S), S items returned
iterate over all items O(N)

Probabilistic Analysis of Performance

Performance analysis:

  • Base it on expectation
  • High likelihood of good performance, given by Chernoff bounds
  • Bounded height with high probability

The expected number of nodes at level i is given by:

$ \mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i} $

At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.

Given a constant c > 1, the height of a skip list is larger than c log n with probability of at most

$ \dfrac{1}{n^{c-1}} $

The probability that the height is smaller than c log n is

$ \left( 1 - \dfrac{1}{n^{c-1}} \right) $

Space Usage

We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:

$ \sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right) $

The last term can be simplified using the geometric series formula:

$ \sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1} $

to say that

$ \sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right) $

This number is bounded by 2, and therefore

$ \mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n $

and therefore the expected number of nodes in the skip list is O(n):

$ \mathbb{E}(N) \sim O(n) $

OOP Principles

Key object-oriented design principles employed in maps and sets:

  • Template Method Pattern - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
  • Iterator Pattern - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
  • Composition - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
  • Encapsulation - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
  • Abstract Base Classes - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.

Flags