Notes

Using the term "map" in place of the term "dictionary" is a hat tip to the mathematical nature of a map. The mathematical definition of a function is, a mapping from one set (the domain) onto another (the range) such that one input corresponds to exactly one output. This is a bijective mapping between two sets. This is all that is required of a function - one input gives one output. Likewise, a programming map is define as a key-value data structure, in which an input key can be used to find the resulting value in O(1) time.

Maps can be implemented multiple ways. They can be stored as a simple unsorted array of key-value item objects. They can be stored in a sorted array of key-value items. They can be stored with a tree. They can be stored with a hash table. etc.

While these data structures are nominally a little bit different from Dictionaries - dictionaries return positions in a data structure given an input object x) - they are ultimately a generalization of them (the position can be thought of as the value stored by the map, while the input object x is the key) and therefore identical.

Good MIT lecture on hashing with chaining: https://www.youtube.com/watch?v=0M_kIqhwbFo

The Motivation

With a map or dictionary, our goal is to create a data structure that can be searched in O(1) time.

The simplest possible map, if we could design the perfect map, would simply have keys that were integers 0 to N-1, and the values stored in the array at the index of its key. This would give O(1) lookup: the user says "give me item corresponding to key 1" and programming language says "here is array[1]". This is great, because by using an array and using the key as the index of the array, we have made everything O(1) fast. This is not so great, because there will be huge memory problems, not to mention, we didn't decide how to assign keys.

Problems to resolve:

1. Real keys may not be small non-negative integers

2. Potentially gigantic memory hog (if we want to store two items, one with key 2 and one with key 65,110,245,336,002

Prehash - assigning integers to objects

Assigning unique keys to items that are not necessarily nonnegative integers can be solved by solving the reverse problem: turn arbitrary objects (keys) into nonnegative integers.

We can argue that this is "easy" from a theoretical standpoint:

Any item in memory can be represented (written down) as a sequence of 0's and 1's (existential statement about objects/variables/data)
0s and 1s are numbers, in binary, so we can convert them to numbers, in decimal.

(Note the similarities between this approach and Gödel's approach with his Incompleteness Theorem of assigning numbers to mathematical proofs/theories.)

This gives us a way of turning arbitrary objects/data into arbitrary integers.

hash('\0B') == hash'\0\0C') == 64

Ideally: hash(x) = hash(y) <==> x = y

Python details:

hash(x) is the prehash of x
Can use __hash__ builtin to adjust the way the hash is computed
Challenges with adding mutable items to dictionaries - your hash function must not change over time (Python cannot protect you from that)
You cannot use a MUTABLE LIST as a key value in a Python dictionary, because it can potentially change over time

Hashes and lists and Python

Here's an interview question that requires packing up a lot of knowledge into a single answer: "Why doesn't the following code work?"

a = [1,2,3]
b = [4,5,6]
d = {}
d[a] = 'The first list'
d[b] = 'The second list'

Why doesn't this work?

Doesn't work because lists are mutable.
Basics of dictionaries in Python: they are key-value data structures where the key can be any type we'd like.
If we pass a mutable type as a key, we are going to change the input to the hash table/map, and therefore change the output location, and therefore lose our data.
To prevent this, need to define an immutable list, or use an immutable object, or define a __hash__ function for any key types that we use that will return the same hash regardless of content.
Example: we might pass a customer object to a database application. The customer may change (e.g., phone number, address, etc.), but this should not affect hash of object, otherwise looks like entirely new "Customer". Could luse customer ID only when computing hash function.

Solving memory hog issue

To solve the issue of hash tables being memory hogs, we have a second step that we call hashing. (Prehashing is turning keys into numbers; hashing is turning numbers into tinier sets of numbers.) Like turning a potato into hash browns. Like turning tofu into tofu crumble. Like turning large boulders into sand.

Arabic hash - grass

German hash - hashae - hatchet

Want to map a giant key space to a hash table with slots 0 to n-1. Need a hash function that will turn key 0 into position 0, key 1 into position 1, etc.

Idea: $m=\Theta (n)$

where n is number of keys being stored in dictionary. This is the big challenge. We solve it using hash functions.

Hash function:

$h=\mathbb {U} \rightarrow \mathbb {K} ,{\mbox{ where }}\mathbb {K} =set(0,1,\dots ,n-1)$

Collision Handling with Chaining

Main article: Hash Maps/Collision Handling with Chaining

Hash Table Expansion

Main article: Hash Maps/Dynamic Resizing

Maps ADT

See Maps/ADT for abstract data type/interface specification for Maps.

See Java API documentation for interface class Map: http://docs.oracle.com/javase/8/docs/api/java/util/Map.html

Example: word counts

A canonical example of maps is performing word counts for text files. Each word is added to the dictionary as a key, and the word count is the value. Keys are added with a default key value of 1, and if a word already exists as a key its value is incremented.

Pseudocode:

create empty map
for each word in input file:
    if(word in map keys):
        increment value at key word
    else:
        add new pair (word, 1) to map

Map classification

Map types are organized as follows:

Unsorted maps
Hash maps
Sorted maps

See implementations below.

Map Implementations

For implementations of maps, see the following:

Map abstract data type:

Maps/ADT - Map interface class
Maps/AbstractMap - AbstractMap abstract class

Map implementations using arrays:

Maps/UnsortedArrayMap - concrete implementation of Map using unsorted ArrayList
Maps/SortedArrayMap - concrete implementation of Map using sorted ArrayList

Map implementations using hash tables:

Hash Maps/AbstractHashMap - abstract class defining further details of implementing a map using a hash table
Hash Maps/ChainHashMap - concrete implementation of hash map using a chained hash map
Hash Maps/ProbeHashMap - concrete implementation of hash map using linear probing.

Flags

Maps

From charlesreid1

Contents