From charlesreid1

Notes

Notes from Goodrich et al Data Structures in Python Chapter 5

Basic usage of lists, strings, and tuples seems straightforward, but there are several important subtleties regarding behaviors associated with these classes

Basics

encapsulation - noting when the user of the class does not need to know the internal details of the implementation

byte - a typical unit of storage, equivalent to 8 bits

memory address - how the computer keeps track of what information is stored in what byte

Individual bytes of memory can be stored or retrieved in O(1) time

Python characters are represented using the Unicode character set. Python internally represents each Unicode character with 16 bits (2 bytes). Thus, a six character string would be stored in 12 consecutive bytes in memory.

Can use the sys.getsizeof() function from the sys module to get the size in bytes of a particular variable.

See Arrays/Python/Sizeof for details and script examples.

Referential lists

Python lists are referential - they contain pointers to addresses in memory

The number of bits used to store the memory address of each element of a list is fixed - 64 bits per address. The None object can be used as an element of the list to represent an empty/missing item, it still takes up 64 bits of address space.

Referential nature is important to understanding semantics:

Integer lists are going to refer to an immutable object in memory, representing that particular integer value. So, if we have two or three lists of integers, and they share numbers, those are actually addresses in memory, that point to immutable values of integers.

When we run a command like counters[2] += 1 we technically do not change the value of the existing integer, we end up pointing to space in memory for the new, incremented integer value.

If we want to create a new list as a copy, we can do a shallow copy or a deep copy.

  • Shallow copy: backup = list(primes) - this produces a new list that references the same addresses in memory as the original list
  • Deep copy: from copy import deep_copy; backup = deep_copy(primes) - this produces a new version that is duplicated element-by-element

if we want to add all the elements from one list onto the end of another list, we use the extend() command, as in a.extend(b)

Compact arrays

By default, most lists are referential. However, string arrays are not referential - they store the 16-bit Unicode character (2 bytes apiece) instead of a 64-bit address (8 bytes), making it cheaper. It also uses contiguous memory.

A referential integer array, on the other hand, can potentially use up a whole lot of space. For example, each time we add a new item to an integer list, it has to set aside space in memory for the value of that new integer (if it has not already), and then it has to set aside space for the 64-bit (or 8-byte) address where that integer lives.

To illustrate - if we want to store a sequence of 1M 64-bit integers, it won't take 64 x 1M = 64 M bits. It will actually take 4-5 times that amount of space.

Each new element creates a new 64-bit address. But that address has to point to an integer value, which is stored somewhere else in memory.

We can use the getsizeof method to get the actual size of things in memory (see Arrays/Python/Sizeof).

Compact structure forces the integers to be stored directly in memory, and not as references.

The advantage to compact structures is high performance. They live closer in memory.

To use, import the module named array. The array constructor requires a type code to be specified for the first argument.

primes = array('i', [2,3,5,7,11,13,17,19])

See Arrays/Python/CompactArrays

Lists are referential structures, so the result of getsizeof for a list instance is only including the size of representing the PRIMARY STRUCTURE - that is, it does not count the memory used by the objects in the array.

DynamicArray class

https://charlesreid1.com:3000/cs/python/src/master/arrays/DynamicArray.py

Book runs through the implementation of a dynamic array class. Few notes:

  • Three private fields: capacity of array, number of elements, and the array itself (which is a low-level ctype object)
  • Implementing private method to manually create a second array with double the capacity, and copy each item in one at a time
  • Implementing __getitem__ to allow indexing
  • __len__ returns length

Hallmark expansion procedure - implemented with non-public _resize_ method

Python list class

If we look at the sizes of the lists as we append additional elements, we see it is not a pure geometric progression, nor is it an arithmetic progression.

The append method exhibits amortized constant-time behavior. While the resize operation is expensive, it is done infrequently enough that its cost is amortized over a large number of operations.

There seems clear evidence the amortized time for each append is independent of n. See Arrays/Python/AppendCost

Questions

Questions from Goodrich et al Data Structures in Python Chapter 5

Getting the size (in bytes) of a list

R-5.1, R-5.2, R-5.3: Arrays/Python/Sizeof

Building a custom array data structure

R-5.4: Arrays/Python/DynamicArray

Implementing the CaesarCipher with string ops in Python

R-5.10: Arrays/Python/CaesarCipher

Proving Whether Dynamic Shrinkage/Expansion is O(1)

Question C-5.19 and C-5.20 both deal with the implementation of a dynamic shrinkage, by which each time an item is popped the capacity of the array is checked.

Q-5.19: Consider the pop method implemented for the DynamicArray class, in which an array of capacity C is resized to a capacity of n (number of elements) anytime the number of elements goes below C/4. Give a formal proof that any sequence of append/pop operations on an empty, dynamic array takes O(n) time.

C-5.20: Consider variant of C-5.19, where capacity C is resized to a capcity of n (number of elements) anytime the number of elements goes below C/2. Show that there are sequences of n operations such that the cost of these operations would be Omega(n^2) to execute.

Arrays/Python/DynamicArrayResizing

Searching for duplicates