Maps and Sets Study Guide
From charlesreid1
Contents
Definitions and Variations
Definitions
dictionary - data structure in which values are mapped to keys, or act as their own keys.
dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails
map - mathematical analog term for dictionary, like a function (one input gives one output)
sorted map - data structure storing keys in a sorted order to provide data in order
hash table - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function
hash code - maps keys to integers
compression function - maps integers to a set of slots 0..N-1
bucket array - array storing the data of the hash table
hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)
immutability - a property required of keys
collisions - when two unique/distinct objects have the same output of the has function
chaining - technique to deal with collisions
exact search - lookups tell you nothing about nearby data
inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.
maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.
skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
- motivated by inefficiency of sorted lists
- efficient search/update compromise
- randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized
height of skip list - number of linked lists to maintain
pseudo-random number generators - generate random-like numbers
start position of skip list - left-most node of top-most sparsest list
frozen set - an immutable set (Python type)
set - collection of related entities that does not allow duplicates, allows membership tests
multiset - set-like container allowing duplicates (counting map)
multimap - associates keys to values, but allows one key mapping to multiple values
template method pattern - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality
Variations
Maps:
- Sorted Map
- Tree-based
- Array-based
- Skip list
- Unsorted Map
- Hash table based
- Array based
Sets:
- Sorted sets
- Tree implementation
- Skip list implementation
- Array implementation
- Unsorted sets
- Hash table implementation
- Array implementation
(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)
Skip lists
Hash tables
ADTs and Interfaces
Map ADT
Map ADT:
- get
- set
- delete
- contains
- size
- empty
- iterator
Utilities:
- key iterator/key set
- value iterator/value set
- clear
- equality checks
Sorted Map ADT:
- find min
- find max
- find less than/lte
- find greater than/gte
- find range
- iterator
- reversed
Hash Table ADT
Hash table ADT:
- get
- set
- delete
- resize
- hash function: ak+b mod p mod N
Skip Lists
Skip list interface:
- Same as map
- get
- set
- delete
- contains
Set ADT
Set ADT has 5 essential methods:
- add
- remove
- contains
- size
- empty
From these 5, others can be extended:
- pop
- equals/not equals
- lt/lte
- gt/gte
- union of sets
- intersection of sets
- symmetric difference
- not operator
Implementations
Map:
- Item class:
- Key/value
- equal/not equal/lt/gt
Sorted Map:
- private find index (k, lo, hi)
- public get/set/delete
- iterator/reverse iterator
- find max/min
- find gt/gte/lt/lte
Table Map:
- array of data: table
Hash Map:
- Hash function
- private get/set/remove from buckets
- table data
- prime integer
- n entries integer
- hash parameters a, b integer
- resize
Set:
- follow map implementation, but values are their own keys
Skip List:
- Skip Node class:
- level of skip node integer
- data
- Array or list of next Skip Node, indexed by level
- Comparison operators
- random number generator
- max height integer
- size integer
- head node pointer
Algorithms for Operations
Table Maps:
- Obvious, but impractical for large maps. Not detailed here.
Hash Maps:
- All operations are covered by Hash Tables Study Guide operations section
Tree Maps:
- All operations here are covered in the search tree study guide
Skip Lists
add(e) method:
- deal with empty case (new head node)
- get random insertion level (flipping coin)
- create new node
- walker = head
- handle case where node goes before head
- for each level,
- walker = walker.next
- while next node smaller than us:
- walk fwd 1 step
- if current level <= insertion level:
- insert node in this level
- walker = walker.next
- increment size
remove method:
- deal with empty case
- check for remove head (head value = remove value)
- step 1
- step 2
- step 3
- update size
remove step 1: get predecessor
- use protected getPredecessor method
- find predecessor of node to be removed
- predecessor.getNext(using level from prior step), gives result node
remove step 2:
- to find result next node,
- deal with removing head case:
- prior is null
- swap out result with result prior
- result next = result.next
remove step 3:
- starting from top occurring level, remove each node in the tower
- result = result.getNext
- result prev next = result next
- if we aren't at last level of tower,
- move down
- set results previous at new level
getPredecessor method:
- handle head case
- look for the node (and level) where next nod has target value
- if we reach end of row, move down
- if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
- if we are bigger than the next node, keep moving right
- if we are equal, return this node.
Complexity and Cost
Big O Complexity Table
Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.
Analysis of Skip Lists
Big O Complexity
Big-O Complexity of Skip Lists | |||
---|---|---|---|
Skip List | |||
get | O(log N) | ||
set | O(log N) | ||
contains | O(log N) | ||
remove | O(log N) | ||
find min | O(1) | ||
find max | O(1) | ||
find lt/find lte/find gt/find gte | O(log N) | ||
find range | O(log N + S), S items returned | ||
iterate over all items | O(N) |
Probabilistic Analysis of Performance
Performance analysis:
- Base it on expectation
- High likelihood of good performance, given by Chernoff bounds
- Bounded height with high probability
at a given level i, what is the probability of having an entry? 1/2, from coin flip. These combine to give the expression above.
Given a constant c > 1, the height of a tree is larger than c log n with probability of at most
The probability that the height is smaller than c log n is
Space Usage
We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:
The last term can be simplified using the fact that
to say that
This number is bounded by 2, and therefore
and therefore the expected number of nodes in the skip list is O(n):
OOP Principles
Flags
Maps and Dictionaries Part of Computer Science Notes
Series on Data Structures
Maps/Dictionaries Maps · Maps/ADT · Maps in Java · Maps/OOP · Maps/Operations and Performance Map implementations: Maps/AbstractMap · Maps/UnsortedArrayMap · Maps/SortedArrayMap Dictionary implementations: Dictionaries/LinkedDict · Dictionaries/ArrayDict
Hashes Hash Maps/OOP · Hash Maps/Operations and Performance Hash Maps/Dynamic Resizing · Hash Maps/Collision Handling with Chaining Hash functions: Hash Functions · Hash Functions/Cyclic Permutation Hash map implementations: Hash Maps/AbstractHashMap · Hash Maps/ChainedHashMap
Skip Lists · Java/ConcurrentSkipList · Java implementations: SkipList
Sets Sets · Sets/ADT · Sets in Java · Sets/OOP · Multisets
|