Maps
From charlesreid1
See also: Dictionaries
Maps are value-based data structures that create a bijection from one set onto another, such that one input key yields one corresponding output value.
Contents
Notes
Using the term "map" in place of the term "dictionary" is a hat tip to the mathematical nature of a map. The mathematical definition of a function is, a mapping from one set (the domain) onto another (the range) such that one input corresponds to exactly one output. This is a bijective mapping between two sets. This is all that is required of a function - one input gives one output. Likewise, a programming map is define as a key-value data structure, in which an input key can be used to find the resulting value in O(1) time.
Maps can be implemented multiple ways. They can be stored as a simple unsorted array of key-value item objects. They can be stored in a sorted array of key-value items. They can be stored with a tree. They can be stored with a hash table. etc.
While these data structures are nominally a little bit different from Dictionaries - dictionaries return positions in a data structure given an input object x) - they are ultimately a generalization of them (the position can be thought of as the value stored by the map, while the input object x is the key) and therefore identical.
Good MIT lecture on hashing with chaining: https://www.youtube.com/watch?v=0M_kIqhwbFo
The Motivation
With a map or dictionary, our goal is to create a data structure that can be searched in O(1) time.
The simplest possible map, if we could design the perfect map, would simply have keys that were integers 0 to N-1, and the values stored in the array at the index of its key. This would give O(1) lookup: the user says "give me item corresponding to key 1" and programming language says "here is array[1]". This is great, because by using an array and using the key as the index of the array, we have made everything O(1) fast. This is not so great, because there will be huge memory problems, not to mention, we didn't decide how to assign keys.
Problems to resolve:
1. Real keys may not be small non-negative integers
2. Potentially gigantic memory hog (if we want to store two items, one with key 2 and one with key 65,110,245,336,002
Prehash - assigning integers to objects
Assigning unique keys to items that are not necessarily nonnegative integers can be solved by solving the reverse problem: turn arbitrary objects (keys) into nonnegative integers.
We can argue that this is "easy" from a theoretical standpoint:
- Any item in memory can be represented (written down) as a sequence of 0's and 1's (existential statement about objects/variables/data)
- 0s and 1s are numbers, in binary, so we can convert them to numbers, in decimal.
(Note the similarities between this approach and Gödel's approach with his Incompleteness Theorem of assigning numbers to mathematical proofs/theories.)
This gives us a way of turning arbitrary objects/data into arbitrary integers.
hash('\0B') == hash'\0\0C') == 64
Ideally: hash(x) = hash(y) <==> x = y
Python details:
- hash(x) is the prehash of x
- Can use
__hash__
builtin to adjust the way the hash is computed - Challenges with adding mutable items to dictionaries - your hash function must not change over time (Python cannot protect you from that)
- You cannot use a MUTABLE LIST as a key value in a Python dictionary, because it can potentially change over time
Hashes and lists and Python
Here's an interview question that requires packing up a lot of knowledge into a single answer: "Why doesn't the following code work?"
a = [1,2,3] b = [4,5,6] d = {} d[a] = 'The first list' d[b] = 'The second list'
Why doesn't this work?
- Doesn't work because lists are mutable.
- Basics of dictionaries in Python: they are key-value data structures where the key can be any type we'd like.
- If we pass a mutable type as a key, we are going to change the input to the hash table/map, and therefore change the output location, and therefore lose our data.
- To prevent this, need to define an immutable list, or use an immutable object, or define a __hash__ function for any key types that we use that will return the same hash regardless of content.
- Example: we might pass a customer object to a database application. The customer may change (e.g., phone number, address, etc.), but this should not affect hash of object, otherwise looks like entirely new "Customer". Could luse customer ID only when computing hash function.
Solving memory hog issue
To solve the issue of hash tables being memory hogs, we have a second step that we call hashing. (Prehashing is turning keys into numbers; hashing is turning numbers into tinier sets of numbers.) Like turning a potato into hash browns. Like turning tofu into tofu crumble. Like turning large boulders into sand.
Arabic hash - grass
German hash - hashae - hatchet
Want to map a giant key space to a hash table with slots 0 to n-1. Need a hash function that will turn key 0 into position 0, key 1 into position 1, etc.
Idea:
where n is number of keys being stored in dictionary. This is the big challenge. We solve it using hash functions.
Hash function:
Collision Handling with Chaining
Hash Table Expansion
Maps ADT
See Maps/ADT for abstract data type/interface specification for Maps.
See Java API documentation for interface class Map: http://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Example: word counts
A canonical example of maps is performing word counts for text files. Each word is added to the dictionary as a key, and the word count is the value. Keys are added with a default key value of 1, and if a word already exists as a key its value is incremented.
Pseudocode:
create empty map for each word in input file: if(word in map keys): increment value at key word else: add new pair (word, 1) to map
Map classification
Map types are organized as follows:
- Unsorted maps
- Hash maps
- Sorted maps
See implementations below.
Map Implementations
For implementations of maps, see the following:
Map abstract data type:
- Maps/ADT - Map interface class
- Maps/AbstractMap - AbstractMap abstract class
Map implementations using arrays:
- Maps/UnsortedArrayMap - concrete implementation of Map using unsorted ArrayList
- Maps/SortedArrayMap - concrete implementation of Map using sorted ArrayList
Map implementations using hash tables:
- Hash Maps/AbstractHashMap - abstract class defining further details of implementing a map using a hash table
- Hash Maps/ChainHashMap - concrete implementation of hash map using a chained hash map
- Hash Maps/ProbeHashMap - concrete implementation of hash map using linear probing.
Flags
Maps and Dictionaries Part of Computer Science Notes
Series on Data Structures
Maps/Dictionaries Maps · Maps/ADT · Maps in Java · Maps/OOP · Maps/Operations and Performance Map implementations: Maps/AbstractMap · Maps/UnsortedArrayMap · Maps/SortedArrayMap Dictionary implementations: Dictionaries/LinkedDict · Dictionaries/ArrayDict
Hashes Hash Maps/OOP · Hash Maps/Operations and Performance Hash Maps/Dynamic Resizing · Hash Maps/Collision Handling with Chaining Hash functions: Hash Functions · Hash Functions/Cyclic Permutation Hash map implementations: Hash Maps/AbstractHashMap · Hash Maps/ChainedHashMap
Skip Lists · Java/ConcurrentSkipList · Java implementations: SkipList
Sets Sets · Sets/ADT · Sets in Java · Sets/OOP · Multisets
|