Latest revision as of 03:59, 8 June 2026

Definitions and Variations

Definitions

dictionary - data structure that stores key-value pairs, mapping each unique key to an associated value

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant

collisions - when two unique/distinct objects have the same output of the hash function $h(k_i) = h(k_j), i \neq j$

chaining - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

motivated by inefficiency of sorted lists
efficient search/update compromise
randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods

Variations

Maps:

Sorted Map
- Tree-based
- Array-based
- Skip list
Unsorted Map
- Hash table based
- Array based

Sets:

Sorted sets
- Tree implementation
- Skip list implementation
- Array implementation
Unsorted sets
- Hash table implementation
- Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)

Skip lists

Hash tables

See Hash Tables Study Guide

ADTs and Interfaces

Map ADT

Map ADT:

get
set
delete
contains
size
empty
iterator

Utilities:

key iterator/key set
value iterator/value set
clear
equality checks

Sorted Map ADT:

find min
find max
find less than/lte
find greater than/gte
find range
iterator
reversed

Hash Table ADT

Hash table ADT:

get
set
delete
resize
hash function: ak+b mod p mod N

Skip Lists

Skip list interface:

Same as map
get
set
delete
contains

Set ADT

Set ADT has 5 essential methods:

add
remove
contains
size
empty

From these 5, others can be extended:

pop
equals/not equals
lt/lte
gt/gte
union of sets
intersection of sets
symmetric difference
not operator

Implementations

Map:

Item class:
- Key/value
- equal/not equal/lt/gt

Sorted Map:

private find index (k, lo, hi)
public get/set/delete
iterator/reverse iterator
find max/min
find gt/gte/lt/lte

Table Map:

array of data: table

Hash Map:

Hash function
private get/set/remove from buckets
table data
prime integer
n entries integer
hash parameters a, b integer
resize

Set:

follow map implementation, but values are their own keys

Skip List:

Skip Node class:
- level of skip node integer
- data
- Array or list of next Skip Node, indexed by level
- Comparison operators
random number generator
max height integer
size integer
head node pointer

Algorithms for Operations

Table Maps:

Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

All operations are covered by Hash Tables Study Guide operations section

Tree Maps:

All operations here are covered in the search tree study guide

Skip Lists

add(e) method:

deal with empty case (new head node)
get random insertion level (flipping coin)
create new node
walker = head
handle case where node goes before head
for each level,
- walker = walker.next
- while next node smaller than us:
  - walk fwd 1 step
- if current level <= insertion level:
  - insert node in this level
  - walker = walker.next
increment size

remove method:

deal with empty case
check for remove head (head value = remove value)
step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
update size

getPredecessor method:

handle head case
look for the node (and level) where next node has target value
if we reach end of row, move down
if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
if we are bigger than the next node, keep moving right
if we are equal, return this node.

Complexity and Cost

Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

Analysis of Skip Lists

Big O Complexity

Big-O Complexity of Skip Lists
	<br /><br /> Skip List
get	O(log N)
set	O(log N)
contains	O(log N)
remove	O(log N)
find min	O(1)
find max	O(1)
find lt/find lte/find gt/find gte	O(log N)
find range	O(log N + S), S items returned
iterate over all items	O(N)

Probabilistic Analysis of Performance

Performance analysis:

Base it on expectation
High likelihood of good performance, given by Chernoff bounds
Bounded height with high probability

The expected number of nodes at level i is given by:

$\mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i}$

At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.

Given a constant c > 1, the height of a skip list is larger than c log n with probability of at most

$\dfrac{1}{n^{c-1}}$

The probability that the height is smaller than c log n is

$\left( 1 - \dfrac{1}{n^{c-1}} \right)$

Space Usage

We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:

$\sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right)$

The last term can be simplified using the geometric series formula:

$\sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1}$

to say that

$\sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right)$

This number is bounded by 2, and therefore

$\mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n$

and therefore the expected number of nodes in the skip list is O(n):

$\mathbb{E}(N) \sim O(n)$

OOP Principles

Key object-oriented design principles employed in maps and sets:

Template Method Pattern - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
Iterator Pattern - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
Composition - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
Encapsulation - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
Abstract Base Classes - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.

Flags

Maps and Sets Study Guide: Difference between revisions

From charlesreid1

Latest revision as of 03:59, 8 June 2026

Contents

Definitions and Variations

Definitions

Variations

ADTs and Interfaces

Map ADT

Hash Table ADT

Skip Lists

Set ADT

Implementations

Algorithms for Operations

Skip Lists

Complexity and Cost

Big O Complexity Table

Analysis of Skip Lists

Big O Complexity

Probabilistic Analysis of Performance

Space Usage

OOP Principles

Flags

@@ Line 3: / Line 3: @@
 ==Definitions==
-'''dictionary''' - data structure in which values are mapped to keys, or act as their own keys.
+'''dictionary''' - data structure that stores key-value pairs, mapping each unique key to an associated value
 '''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails
@@ Line 11: / Line 11: @@
 '''sorted map''' - data structure storing keys in a sorted order to provide data in order
-'''hash table''' - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function
+'''hash table''' - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function
 '''hash code''' - maps keys to integers
@@ Line 21: / Line 21: @@
 '''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)
-'''immutability''' - a property required of keys
+'''immutability''' - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant
-'''collisions''' - when two unique/distinct objects have the same output of the has function <math>h(k_i) = h(k_j), i \neq j</math>
+'''collisions''' - when two unique/distinct objects have the same output of the hash function <math>h(k_i) = h(k_j), i \neq j</math>
-'''chaining''' - technique to deal with collisions
+'''chaining''' - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket
 '''exact search''' - lookups tell you nothing about nearby data
@@ Line 31: / Line 31: @@
 '''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.
-'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.
+'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a &lt;= c and b &gt;= d. (a,b) is maximum pair if not dominated by any other pairs.
 '''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
+* motivated by inefficiency of sorted lists
+* efficient search/update compromise
+* randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized
+'''height of skip list''' - number of linked lists to maintain
+'''pseudo-random number generators''' - generate random-like numbers
+'''start position of skip list''' - left-most node of top-most sparsest list
+'''frozen set''' - an immutable set (Python type)
+'''set''' - collection of related entities that does not allow duplicates, allows membership tests
+'''multiset''' - set-like container allowing duplicates (counting map)
+'''multimap''' - associates keys to values, but allows one key mapping to multiple values
+'''template method pattern''' - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods
+==Variations==
+Maps:
+* Sorted Map
+** Tree-based
+** Array-based
+** Skip list
+* Unsorted Map
+** Hash table based
+** Array based
+Sets:
+* Sorted sets
+** Tree implementation
+** Skip list implementation
+** Array implementation
+* Unsorted sets
+** Hash table implementation
+** Array implementation
+(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)
+Skip lists
+Hash tables
+* See [[Hash Tables Study Guide]]
 =ADTs and Interfaces=
+==Map ADT==
+Map ADT:
+* get
+* set
+* delete
+* contains
+* size
+* empty
+* iterator
+Utilities:
+* key iterator/key set
+* value iterator/value set
+* clear
+* equality checks
+Sorted Map ADT:
+* find min
+* find max
+* find less than/lte
+* find greater than/gte
+* find range
+* iterator
+* reversed
+==Hash Table ADT==
+Hash table ADT:
+* get
+* set
+* delete
+* resize
+* hash function: ak+b mod p mod N
+==Skip Lists==
+Skip list interface:
+* Same as map
+* get
+* set
+* delete
+* contains
+==Set ADT==
+Set ADT has 5 essential methods:
+* add
+* remove
+* contains
+* size
+* empty
+From these 5, others can be extended:
+* pop
+* equals/not equals
+* lt/lte
+* gt/gte
+* union of sets
+* intersection of sets
+* symmetric difference
+* not operator
 =Implementations=
+Map:
+* Item class:
+** Key/value
+** equal/not equal/lt/gt
+Sorted Map:
+* private find index (k, lo, hi)
+* public get/set/delete
+* iterator/reverse iterator
+* find max/min
+* find gt/gte/lt/lte
+Table Map:
+* array of data: table
+Hash Map:
+* Hash function
+* private get/set/remove from buckets
+* table data
+* prime integer
+* n entries integer
+* hash parameters a, b integer
+* resize
+Set:
+* follow map implementation, but values are their own keys
+Skip List:
+* Skip Node class:
+** level of skip node integer
+** data
+** Array or list of next Skip Node, indexed by level
+** Comparison operators
+* random number generator
+* max height integer
+* size integer
+* head node pointer
 =Algorithms for Operations=
+Table Maps:
+* Obvious, but impractical for large maps. Not detailed here.
+Hash Maps:
+* All operations are covered by [[Hash Tables Study Guide]] operations section
+Tree Maps:
+* All operations here are covered in the search tree study guide
+==Skip Lists==
+add(e) method:
+* deal with empty case (new head node)
+* get random insertion level (flipping coin)
+* create new node
+* walker = head
+* handle case where node goes before head
+* for each level,
+** walker = walker.next
+** while next node smaller than us:
+*** walk fwd 1 step
+** if current level &lt;= insertion level:
+*** insert node in this level
+*** walker = walker.next
+* increment size
+remove method:
+* deal with empty case
+* check for remove head (head value = remove value)
+* step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
+* step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
+* step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
+* update size
+getPredecessor method:
+* handle head case
+* look for the node (and level) where next node has target value
+* if we reach end of row, move down
+* if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
+* if we are bigger than the next node, keep moving right
+* if we are equal, return this node.
 =Complexity and Cost=
+==Big O Complexity Table==
+Complexity depends on implementation. See [[Hash Tables Study Guide]] and [[Search Trees Study Guide]] for big O tables.
+==Analysis of Skip Lists==
+===Big O Complexity===
+{| class="wikitable" cellpadding="100" width="66%"
+!colspan="4"|Big-O Complexity of Skip Lists
+|-
+|
+|&lt;br /&gt;&lt;br /&gt;
+Skip List
+|-
+|get
+|O(log N)
+|-
+|set
+|O(log N)
+|-
+|contains
+|O(log N)
+|-
+|remove
+|O(log N)
+|-
+|find min
+|O(1)
+|-
+|find max
+|O(1)
+|-
+|find lt/find lte/find gt/find gte
+|O(log N)
+|-
+|find range
+|O(log N + S), S items returned
+|-
+|iterate over all items
+|O(N)
+|}
+===Probabilistic Analysis of Performance===
+Performance analysis:
+* Base it on '''expectation'''
+* High likelihood of good performance, given by Chernoff bounds
+* Bounded height with high probability
+The expected number of nodes at level i is given by:
+<math>
+\mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i}
+</math>
+At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.
+Given a constant c &gt; 1, the height of a skip list is larger than c log n with probability of '''at most'''
+<math>
+\dfrac{1}{n^{c-1}}
+</math>
+The probability that the height is smaller than c log n is
+<math>
+\left( 1 - \dfrac{1}{n^{c-1}} \right)
+</math>
+===Space Usage===
+We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:
+<math>
+\sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right)
+</math>
+The last term can be simplified using the geometric series formula:
+<math>
+\sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1}
+</math>
+to say that
+<math>
+\sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right)
+</math>
+This number is bounded by 2, and therefore
+<math>
+\mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n
+</math>
+and therefore the expected number of nodes in the skip list is O(n):
+<math>
+\mathbb{E}(N) \sim O(n)
+</math>
 =OOP Principles=
+Key object-oriented design principles employed in maps and sets:
+* '''Template Method Pattern''' - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
+* '''Iterator Pattern''' - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
+* '''Composition''' - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
+* '''Encapsulation''' - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
+* '''Abstract Base Classes''' - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.
 =Flags=