Arrays/Python
From charlesreid1
For empirical demonstration of table doubling with Python, see Arrays/Python/Sizeof
Notes
Notes from Goodrich et al Data Structures in Python Chapter 5
Basic usage of lists, strings, and tuples seems straightforward, but there are several important subtleties regarding behaviors associated with these classes
Basics
encapsulation - noting when the user of the class does not need to know the internal details of the implementation
byte - a typical unit of storage, equivalent to 8 bits
memory address - how the computer keeps track of what information is stored in what byte
Individual bytes of memory can be stored or retrieved in O(1) time
Python characters are represented using the Unicode character set. Python internally represents each Unicode character with 16 bits (2 bytes). Thus, a six character string would be stored in 12 consecutive bytes in memory.
Can use the sys.getsizeof()
function from the sys module to get the size in bytes of a particular variable.
See Arrays/Python/Sizeof for details and script examples.
Referential lists
Python lists are referential - they contain pointers to addresses in memory
The number of bits used to store the memory address of each element of a list is fixed - 64 bits per address. The None object can be used as an element of the list to represent an empty/missing item, it still takes up 64 bits of address space.
Referential nature is important to understanding semantics:
Integer lists are going to refer to an immutable object in memory, representing that particular integer value. So, if we have two or three lists of integers, and they share numbers, those are actually addresses in memory, that point to immutable values of integers.
When we run a command like counters[2] += 1
we technically do not change the value of the existing integer, we end up pointing to space in memory for the new, incremented integer value.
If we want to create a new list as a copy, we can do a shallow copy or a deep copy.
- Shallow copy:
backup = list(primes)
- this produces a new list that references the same addresses in memory as the original list - Deep copy:
from copy import deep_copy; backup = deep_copy(primes)
- this produces a new version that is duplicated element-by-element
if we want to add all the elements from one list onto the end of another list, we use the extend()
command, as in a.extend(b)
Compact arrays
By default, most lists are referential. However, string arrays are not referential - they store the 16-bit Unicode character (2 bytes apiece) instead of a 64-bit address (8 bytes), making it cheaper. It also uses contiguous memory.
A referential integer array, on the other hand, can potentially use up a whole lot of space. For example, each time we add a new item to an integer list, it has to set aside space in memory for the value of that new integer (if it has not already), and then it has to set aside space for the 64-bit (or 8-byte) address where that integer lives.
To illustrate - if we want to store a sequence of 1M 64-bit integers, it won't take 64 x 1M = 64 M bits. It will actually take 4-5 times that amount of space.
Each new element creates a new 64-bit address. But that address has to point to an integer value, which is stored somewhere else in memory.
We can use the getsizeof method to get the actual size of things in memory (see Arrays/Python/Sizeof).
Compact structure forces the integers to be stored directly in memory, and not as references.
The advantage to compact structures is high performance. They live closer in memory.
To use, import the module named array. The array constructor requires a type code to be specified for the first argument.
primes = array('i', [2,3,5,7,11,13,17,19])
See Arrays/Python/CompactArrays
Lists are referential structures, so the result of getsizeof for a list instance is only including the size of representing the PRIMARY STRUCTURE - that is, it does not count the memory used by the objects in the array.
DynamicArray class
https://git.charlesreid1.com/cs/python/src/master/arrays/DynamicArray.py
Book runs through the implementation of a dynamic array class. Few notes:
- Three private fields: capacity of array, number of elements, and the array itself (which is a low-level ctype object)
- Implementing private method to manually create a second array with double the capacity, and copy each item in one at a time
- Implementing
__getitem__
to allow indexing __len__
returns length
Hallmark expansion procedure - implemented with non-public _resize_ method
Python list class
If we look at the sizes of the lists as we append additional elements, we see it is not a pure geometric progression, nor is it an arithmetic progression.
The append method exhibits amortized constant-time behavior. While the resize operation is expensive, it is done infrequently enough that its cost is amortized over a large number of operations.
There seems clear evidence the amortized time for each append is independent of n. See Arrays/Python/AppendCost
Questions
Questions from Goodrich et al Data Structures in Python Chapter 5
Getting the size (in bytes) of a list
R-5.1, R-5.2, R-5.3: Arrays/Python/Sizeof
Building a custom array data structure
R-5.4: Arrays/Python/DynamicArray
Implementing the CaesarCipher with string ops in Python
R-5.10: Arrays/Python/CaesarCipher
Proving Whether Dynamic Shrinkage/Expansion is O(1)
Question C-5.19 and C-5.20 both deal with the implementation of a dynamic shrinkage, by which each time an item is popped the capacity of the array is checked.
Q-5.19: Consider the pop method implemented for the DynamicArray class, in which an array of capacity C is resized to a capacity of n (number of elements) anytime the number of elements goes below C/4. Give a formal proof that any sequence of append/pop operations on an empty, dynamic array takes O(n) time.
C-5.20: Consider variant of C-5.19, where capacity C is resized to a capcity of n (number of elements) anytime the number of elements goes below C/2. Show that there are sequences of n operations such that the cost of these operations would be Omega(n^2) to execute.
Arrays/Python/DynamicArrayResizing
Searching for duplicates
Arrays Part of Computer Science Notes
Series on Data Structures Python: Arrays/Python · Arrays/Python/Sizeof · Arrays/Python/AppendCost · Arrays/Python/CaesarCipher · Arrays/Python/CompactArrays · Arrays/Python/DynamicArray Java: Arrays/Java · Arrays/Java/CaesarCipher · Arrays/Java/FisherYates · Arrays/Java/PythonList · Arrays/Java/Repeatedly_Remove Categories: Category:Python Arrays
|