Revision as of 06:17, 5 June 2017

Notes

Dictionary data type - definition

Dictionaries are data structures that store data in such a way that it can be looked up by value, rather than by index/position/reference.

These types of data sets naturally lend themselves to a structure that partitions data based on its sort value, as in a binary tree. This allows operations to happen faster, because the lookup key itself guides the process of finding the corresponding data in the data structure.

Skiena Chapter 3 notes

This entire chapter is abound with practical advice about data structures. Principally - use black boxes; focus on big-O analysis; the data structure can always be made more efficient. And use binary trees and hash tables to store your variables by value! These are relentlessly practical and useful data structures because lookups are O(1).

Dictionaries definitions

Unlike stacks and queues, which allow access to items in the data structure independent of their value, a dictionary provides access to an object using its value as a way of obtaining a reference to it in the data structure.

Dictionaries interface

Dictionaries implement the following operations:

search(d,k) - given a search key k, preeturn a pointer to the element in d corresponding to key value k (if one exists)
insert(d,x) - given a data item, add it to the set of data
delete(d,x) - given a pointer to a given data item x in the dictionary d, remove it from d

additional useful operations:

max(d) / min(d) - retrieve item with largest or smallest key - enabling dictionary to serve as priority queue
predecessor(d,k) / successor(d,k) - retrieve the item from d whose key is before or after k in sorted order (enables iteration)

Example: to remove all duplicate names from a mailing list, and print the results in sorted order, initialize an empty dictionary, iterate the through the list, and add each item as a key, unless it already exists in the dictionary. (Note that there are many operations happening here, so the more of them we can make O(1), the better.)

Once we are finished, extract the remaining names out of the dictionary to print them in sorted order. Get the first item via min(d) and repeatedly call successor until obtain max(d).

Dictionary implementations based on arrays

See Dictionaries/ArrayDict

Dictionary implementations based on linked lists

See Dictionaries/LinkedDict

Questions

Skiena Chapter 3 questions

Question 3-4

Question 3-4) Design a dictionary structure in which search, insertion, and deletion can be handled in O(1) time.

Assume set elements are from 1..n)

(Initialization can take O(n) time.)

Well, if this search has to take O(1) time, we need to know EXACTLY where things are, given their value.
The problem seems to indicate either a hash table, or an off-by-one array.
Off-by-one array: the numbers passed in from 1 to n are represented in an array of length n, with the kth number represented by the k-1th cell.
hash table: key value is given, turns into unique integer, looked up in O(1) time

Question 3-6

Question 3-6) Describe how to modify any balanced tree structure such that search, insert, delete, min, max each take the expected amount of time, O(log n), but sucessor and predecessor methods now take O(1) time each. What modifications are required, and to which methods?

The methods:

Predecessor(D,k) - returns the predecessor key (in order) to the given key k
Successor(D,k) - returns the successor key to the given key k

Question is, how to we implement neighbor lookup in O(1) time in a balanced tree?

Two options:

Option one - add a previous and next node pointers. This increases the complexity of the bookkeeping, and of the add/insert/remove methods, which now have to traverse the tree to fix their links.
Option two - make a data tree that is twice as big, and only holds data in the bottom-most leaf nodes. This obviates the need for a next and previous pointer, because the tree is easy to navigate - up and over... at most log(n) operations to traverse tree.

Option one:

We would need to modify the add() and insert() method, the delete method, no need to modify search or min or max methods.

We modify them to keep track of the previous node. Finding the previous node would be somewhat tricky logic. When we add a new node, find its next and previous nodes, and link them correctly.

Option two:

It would be algorithmically easier (but more expensive and more complicated implementation wise) to use a tree structure with only data in the leaf nodes. Finding/keeping track of next or previous node then doesn't require the extra two pointers and the extra logic of traversing a binary tree, it just requires twice the number of nodes (tree of size N has N/2 leaf nodes).

This also requires implementation of the entire data structure again, and it is not so obvious how you keep a tree structure like that balanced and dynamically resized.

Flags

Dictionaries: Difference between revisions

From charlesreid1