Latest revision as of 03:59, 8 June 2026

Definitions and Variations

Definitions

dictionary - data structure that stores key-value pairs, mapping each unique key to an associated value

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant

collisions - when two unique/distinct objects have the same output of the hash function $h(k_i) = h(k_j), i \neq j$

chaining - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

motivated by inefficiency of sorted lists
efficient search/update compromise
randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods

Variations

Maps:

Sorted Map
- Tree-based
- Array-based
- Skip list
Unsorted Map
- Hash table based
- Array based

Sets:

Sorted sets
- Tree implementation
- Skip list implementation
- Array implementation
Unsorted sets
- Hash table implementation
- Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)

Skip lists

Hash tables

See Hash Tables Study Guide

ADTs and Interfaces

Map ADT

Map ADT:

get
set
delete
contains
size
empty
iterator

Utilities:

key iterator/key set
value iterator/value set
clear
equality checks

Sorted Map ADT:

find min
find max
find less than/lte
find greater than/gte
find range
iterator
reversed

Hash Table ADT

Hash table ADT:

get
set
delete
resize
hash function: ak+b mod p mod N

Skip Lists

Skip list interface:

Same as map
get
set
delete
contains

Set ADT

Set ADT has 5 essential methods:

add
remove
contains
size
empty

From these 5, others can be extended:

pop
equals/not equals
lt/lte
gt/gte
union of sets
intersection of sets
symmetric difference
not operator

Implementations

Map:

Item class:
- Key/value
- equal/not equal/lt/gt

Sorted Map:

private find index (k, lo, hi)
public get/set/delete
iterator/reverse iterator
find max/min
find gt/gte/lt/lte

Table Map:

array of data: table

Hash Map:

Hash function
private get/set/remove from buckets
table data
prime integer
n entries integer
hash parameters a, b integer
resize

Set:

follow map implementation, but values are their own keys

Skip List:

Skip Node class:
- level of skip node integer
- data
- Array or list of next Skip Node, indexed by level
- Comparison operators
random number generator
max height integer
size integer
head node pointer

Algorithms for Operations

Table Maps:

Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

All operations are covered by Hash Tables Study Guide operations section

Tree Maps:

All operations here are covered in the search tree study guide

Skip Lists

add(e) method:

deal with empty case (new head node)
get random insertion level (flipping coin)
create new node
walker = head
handle case where node goes before head
for each level,
- walker = walker.next
- while next node smaller than us:
  - walk fwd 1 step
- if current level <= insertion level:
  - insert node in this level
  - walker = walker.next
increment size

remove method:

deal with empty case
check for remove head (head value = remove value)
step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
update size

getPredecessor method:

handle head case
look for the node (and level) where next node has target value
if we reach end of row, move down
if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
if we are bigger than the next node, keep moving right
if we are equal, return this node.

Complexity and Cost

Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

Analysis of Skip Lists

Big O Complexity

Big-O Complexity of Skip Lists
	<br /><br /> Skip List
get	O(log N)
set	O(log N)
contains	O(log N)
remove	O(log N)
find min	O(1)
find max	O(1)
find lt/find lte/find gt/find gte	O(log N)
find range	O(log N + S), S items returned
iterate over all items	O(N)

Probabilistic Analysis of Performance

Performance analysis:

Base it on expectation
High likelihood of good performance, given by Chernoff bounds
Bounded height with high probability

The expected number of nodes at level i is given by:

$\mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i}$

At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.

Given a constant c > 1, the height of a skip list is larger than c log n with probability of at most

$\dfrac{1}{n^{c-1}}$

The probability that the height is smaller than c log n is

$\left( 1 - \dfrac{1}{n^{c-1}} \right)$

Space Usage

We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:

$\sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right)$

The last term can be simplified using the geometric series formula:

$\sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1}$

to say that

$\sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right)$

This number is bounded by 2, and therefore

$\mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n$

and therefore the expected number of nodes in the skip list is O(n):

$\mathbb{E}(N) \sim O(n)$

OOP Principles

Key object-oriented design principles employed in maps and sets:

Template Method Pattern - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
Iterator Pattern - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
Composition - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
Encapsulation - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
Abstract Base Classes - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.

Flags

@@ Line 3: / Line 3: @@
 ==Definitions==
-'''dictionary''' - data structure in which values are mapped to keys, or act as their own keys.
+'''dictionary''' - data structure that stores key-value pairs, mapping each unique key to an associated value
 '''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails
@@ Line 11: / Line 11: @@
 '''sorted map''' - data structure storing keys in a sorted order to provide data in order
-'''hash table''' - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function
+'''hash table''' - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function
 '''hash code''' - maps keys to integers
@@ Line 21: / Line 21: @@
 '''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)
-'''immutability''' - a property required of keys
+'''immutability''' - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant
-'''collisions''' - when two unique/distinct objects have the same output of the has function <math>h(k_i) = h(k_j), i \neq j</math>
+'''collisions''' - when two unique/distinct objects have the same output of the hash function <math>h(k_i) = h(k_j), i \neq j</math>
-'''chaining''' - technique to deal with collisions
+'''chaining''' - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket
 '''exact search''' - lookups tell you nothing about nearby data
@@ Line 31: / Line 31: @@
 '''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.
-'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.
+'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a &lt;= c and b &gt;= d. (a,b) is maximum pair if not dominated by any other pairs.
 '''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
@@ Line 52: / Line 52: @@
 '''multimap''' - associates keys to values, but allows one key mapping to multiple values
-'''template method pattern''' - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality
+'''template method pattern''' - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods
 ==Variations==
@@ Line 193: / Line 193: @@
 Hash Maps:
-* All operations are covered by [[Hash Table Study Guide]] operations section
+* All operations are covered by [[Hash Tables Study Guide]] operations section
 Tree Maps:
@@ Line 210: / Line 210: @@
 ** while next node smaller than us:
 *** walk fwd 1 step
-** if current level <= insertion level:
+** if current level &lt;= insertion level:
 *** insert node in this level
 *** walker = walker.next
@@ Line 218: / Line 218: @@
 * deal with empty case
 * check for remove head (head value = remove value)
-* step 1
+* step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
-* step 2
+* step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
-* step 3
+* step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
 * update size
-remove step 1: get predecessor
-* use protected getPredecessor method
-* find predecessor of node to be removed
-* predecessor.getNext(using leel from prior step), gives result node
-remove step 2:
-* to find result next node,
-* deal with removing head case:
-** prior is null
-** swap out result with result prior
-** result next = result.next
-remove step 3:
-* starting from top occurring level, remove each node in the tower
-** result = result.getNext
-** result prev next = result next
-** if we aren't at last level of tower,
-*** move down
-*** set results previous at new level
 getPredecessor method:
 * handle head case
-* look for the node (and level) where next nod has target value
+* look for the node (and level) where next node has target value
 * if we reach end of row, move down
 * if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
@@ Line 266: / Line 245: @@
 |-
 |
-|<br /><br />
+|&lt;br /&gt;&lt;br /&gt;
 Skip List
@@ Line 274: / Line 253: @@
 |-
-|sett
+|set
 |O(log N)
@@ Line 284: / Line 263: @@
 |remove
 |O(log N)
+|-
+|find min
+|O(1)
+|-
+|find max
+|O(1)
+|-
+|find lt/find lte/find gt/find gte
+|O(log N)
+|-
+|find range
+|O(log N + S), S items returned
+|-
+|iterate over all items
+|O(N)
 |}
+===Probabilistic Analysis of Performance===
+Performance analysis:
+* Base it on '''expectation'''
+* High likelihood of good performance, given by Chernoff bounds
+* Bounded height with high probability
+The expected number of nodes at level i is given by:
+<math>
+\mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i}
+</math>
+At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.
+Given a constant c &gt; 1, the height of a skip list is larger than c log n with probability of '''at most'''
+<math>
+\dfrac{1}{n^{c-1}}
+</math>
+The probability that the height is smaller than c log n is
+<math>
+\left( 1 - \dfrac{1}{n^{c-1}} \right)
+</math>
 ===Space Usage===
-We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:
+We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:
 <math>
@@ Line 295: / Line 321: @@
 </math>
-The last term can be simplified using the fact that
+The last term can be simplified using the geometric series formula:
 <math>
-\sum_{i=1}^{n} a^i = \dfrac{a^{i+1} - 1}{a-1}
+\sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1}
 </math>
@@ Line 310: / Line 336: @@
 <math>
-\mathbb{E}(N) = n \mathbb{P} = \sum_{i=0}^{h} n \mathbb{P}_i \leq 2n
+\mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n
 </math>
@@ Line 317: / Line 343: @@
 <math>
 \mathbb{E}(N) \sim O(n)
-<math>
+</math>
 =OOP Principles=
+Key object-oriented design principles employed in maps and sets:
+* '''Template Method Pattern''' - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
+* '''Iterator Pattern''' - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
+* '''Composition''' - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
+* '''Encapsulation''' - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
+* '''Abstract Base Classes''' - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.
 =Flags=

Maps and Sets Study Guide: Difference between revisions

From charlesreid1

Latest revision as of 03:59, 8 June 2026

Contents

Definitions and Variations

Definitions

Variations

ADTs and Interfaces

Map ADT

Hash Table ADT

Skip Lists

Set ADT

Implementations

Algorithms for Operations

Skip Lists

Complexity and Cost

Big O Complexity Table

Analysis of Skip Lists

Big O Complexity

Probabilistic Analysis of Performance

Space Usage

OOP Principles

Flags