# Maps and Sets Study Guide

### From charlesreid1

## Contents

# Definitions and Variations

## Definitions

**dictionary** - data structure in which values are mapped to keys, or act as their own keys.

**dictionary problem** - a find/insert/delete lookup problem in which you learn nothing if your search fails

**map** - mathematical analog term for dictionary, like a function (one input gives one output)

**sorted map** - data structure storing keys in a sorted order to provide data in order

**hash table** - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function

**hash code** - maps keys to integers

**compression function** - maps integers to a set of slots 0..N-1

**bucket array** - array storing the data of the hash table

**hash function** - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

**immutability** - a property required of keys

**collisions** - when two unique/distinct objects have the same output of the has function

**chaining** - technique to deal with collisions

**exact search** - lookups tell you nothing about nearby data

**inexact search** - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

**maxima sets** - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

**skip lists** - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

- motivated by inefficiency of sorted lists
- efficient search/update compromise
- randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

**height of skip list** - number of linked lists to maintain

**pseudo-random number generators** - generate random-like numbers

**start position of skip list** - left-most node of top-most sparsest list

**frozen set** - an immutable set (Python type)

**set** - collection of related entities that does not allow duplicates, allows membership tests

**multiset** - set-like container allowing duplicates (counting map)

**multimap** - associates keys to values, but allows one key mapping to multiple values

**template method pattern** - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality

## Variations

Maps:

- Sorted Map
- Tree-based
- Array-based
- Skip list

- Unsorted Map
- Hash table based
- Array based

Sets:

- Sorted sets
- Tree implementation
- Skip list implementation
- Array implementation

- Unsorted sets
- Hash table implementation
- Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)

Skip lists

Hash tables

# ADTs and Interfaces

## Map ADT

Map ADT:

- get
- set
- delete
- contains
- size
- empty
- iterator

Utilities:

- key iterator/key set
- value iterator/value set
- clear
- equality checks

Sorted Map ADT:

- find min
- find max
- find less than/lte
- find greater than/gte
- find range
- iterator
- reversed

## Hash Table ADT

Hash table ADT:

- get
- set
- delete
- resize
- hash function: ak+b mod p mod N

## Skip Lists

Skip list interface:

- Same as map
- get
- set
- delete
- contains

## Set ADT

Set ADT has 5 essential methods:

- add
- remove
- contains
- size
- empty

From these 5, others can be extended:

- pop
- equals/not equals
- lt/lte
- gt/gte
- union of sets
- intersection of sets
- symmetric difference
- not operator

# Implementations

Map:

- Item class:
- Key/value
- equal/not equal/lt/gt

Sorted Map:

- private find index (k, lo, hi)
- public get/set/delete
- iterator/reverse iterator
- find max/min
- find gt/gte/lt/lte

Table Map:

- array of data: table

Hash Map:

- Hash function
- private get/set/remove from buckets
- table data
- prime integer
- n entries integer
- hash parameters a, b integer
- resize

Set:

- follow map implementation, but values are their own keys

Skip List:

- Skip Node class:
- level of skip node integer
- data
- Array or list of next Skip Node, indexed by level
- Comparison operators

- random number generator
- max height integer
- size integer
- head node pointer

# Algorithms for Operations

Table Maps:

- Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

- All operations are covered by Hash Tables Study Guide operations section

Tree Maps:

- All operations here are covered in the search tree study guide

## Skip Lists

add(e) method:

- deal with empty case (new head node)
- get random insertion level (flipping coin)
- create new node
- walker = head
- handle case where node goes before head
- for each level,
- walker = walker.next
- while next node smaller than us:
- walk fwd 1 step

- if current level <= insertion level:
- insert node in this level
- walker = walker.next

- increment size

remove method:

- deal with empty case
- check for remove head (head value = remove value)
- step 1
- step 2
- step 3
- update size

remove step 1: get predecessor

- use protected getPredecessor method
- find predecessor of node to be removed
- predecessor.getNext(using level from prior step), gives result node

remove step 2:

- to find result next node,
- deal with removing head case:
- prior is null
- swap out result with result prior
- result next = result.next

remove step 3:

- starting from top occurring level, remove each node in the tower
- result = result.getNext
- result prev next = result next
- if we aren't at last level of tower,
- move down
- set results previous at new level

getPredecessor method:

- handle head case
- look for the node (and level) where next nod has target value
- if we reach end of row, move down
- if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
- if we are bigger than the next node, keep moving right
- if we are equal, return this node.

# Complexity and Cost

## Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

## Analysis of Skip Lists

### Big O Complexity

Big-O Complexity of Skip Lists | |||
---|---|---|---|

Skip List | |||

get | O(log N) | ||

set | O(log N) | ||

contains | O(log N) | ||

remove | O(log N) | ||

find min | O(1) | ||

find max | O(1) | ||

find lt/find lte/find gt/find gte | O(log N) | ||

find range | O(log N + S), S items returned | ||

iterate over all items | O(N) |

### Probabilistic Analysis of Performance

Performance analysis:

- Base it on
**expectation** - High likelihood of good performance, given by Chernoff bounds
- Bounded height with high probability

at a given level i, what is the probability of having an entry? 1/2, from coin flip. These combine to give the expression above.

Given a constant c > 1, the height of a tree is larger than c log n with probability of **at most**

The probability that the height is smaller than c log n is

### Space Usage

We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:

The last term can be simplified using the fact that

to say that

This number is bounded by 2, and therefore

and therefore the expected number of nodes in the skip list is O(n):

# OOP Principles

# Flags

Maps and DictionariesSeries on Data Structures
Maps Map implementations: Maps/AbstractMap Dictionary implementations: Dictionaries/LinkedDict
Hash Maps/OOP Hash Maps/Dynamic Resizing Hash functions: Hash Functions Hash map implementations: Hash Maps/AbstractHashMap
Skip Lists Java implementations: SkipList
Sets
· Template:MapsFlagBase · e |