Pandas
From charlesreid1
Installing
Installing Pandas can be thorny on a Mac, mainly because if you download and install your own version of Python, it can conflict with the Mac's built-in version of Python. (I recommend leaving the Mac's Python alone.) The Mac's built-in Python does NOT come with pip. pip installs packages only for the Python interpreter it belongs to, so if you use pip to install Pandas, Pandas will be available to that one Python, not to every Python on your system. If you don't run the right Python, Pandas will not be available.
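When you install your own version of Python, make sure that it is the first python on your path. To list every python on your path, in order, type:
$ which -a python
The first entry is the one that runs when you type python. If your own Python comes first, the pip installed alongside it should also be the first pip on your path, so the packages you install go to the right Python.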
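The same check can be made from inside Python, which also shows whether that interpreter can see Pandas and whether PYTHONPATH is set. This is a minimal sketch using only the standard library; the helper name describe_interpreter is hypothetical.

```python
import importlib.util
import os
import sys


def describe_interpreter():
    """Report which Python is running, its PYTHONPATH, and whether pandas is importable."""
    return {
        # Full path of the interpreter executing this code
        "executable": sys.executable,
        # PYTHONPATH entries are added to sys.path and can shadow installed packages
        "pythonpath": os.environ.get("PYTHONPATH"),
        # find_spec returns None when the module cannot be imported by this interpreter
        "has_pandas": importlib.util.find_spec("pandas") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running pip as python -m pip install pandas ties the install to whichever python you invoke, which sidesteps the question of which pip comes first on the path.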
First, I downloaded and installed easy_install from source.
Next, clear your PYTHONPATH to keep things simple:
$ unset PYTHONPATH
Then, I ran the following commands:
$ sudo easy_install pip
$ sudo pip install numpy
$ sudo pip install numexpr
$ sudo pip install cython
$ sudo pip install tables
$ sudo pip install pandas
Or to upgrade:
$ sudo pip install --upgrade pandas
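To confirm that the packages above landed in the interpreter you actually run, you can query their installed versions from that interpreter. This sketch assumes Python 3.8 or later (for importlib.metadata); the function name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names of the packages installed above
PACKAGES = ["numpy", "numexpr", "Cython", "tables", "pandas"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:10s} {ver or 'not installed'}")
```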
Data
Creating a Table of Arbitrary Data Types
Let's say you're trying to create a data table where you store the result of a simulation. This simulation has a set of inputs and outputs, each with a different data type. For example, the following inputs are scalars:
- Flowrate_in (float)
- Temperature_in (float)
- Pressure_in (float)
But temperature and species profiles are vectors, not scalars:
- Temperature_profile (numpy array)
- Oxygen_profile (numpy array)
Two ways of populating a Pandas data object (a DataFrame, in this case) are:
- Create arbitrary, concrete data with the type you are interested in storing
- Grab the types of the data you are interested in storing
Initializing with Data
A simple illustration of the first technique:
In [99]: reactors = [ { "flowrate_in" : 0.0,
    ...:               "temperature_in" : 0.0,
    ...:               "pressure_in" : 0.0,
    ...:               "temperature_profile" : numpy.zeros(100),
    ...:               "oxygen_profile" : numpy.zeros(100) } for i in range(10) ]
This creates a list of 10 dicts containing the same initial values, which can then be used to initialize a DataFrame object:
In [100]: pandas.DataFrame(reactors)
Out[100]:
   flowrate_in             oxygen_profile  pressure_in  temperature_in  \
0            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
1            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
2            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
3            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
4            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
5            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
6            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
7            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
8            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
9            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0

         temperature_profile
0  [0.0, 0.0, 0.0, 0.0, 0.0]
1  [0.0, 0.0, 0.0, 0.0, 0.0]
2  [0.0, 0.0, 0.0, 0.0, 0.0]
3  [0.0, 0.0, 0.0, 0.0, 0.0]
4  [0.0, 0.0, 0.0, 0.0, 0.0]
5  [0.0, 0.0, 0.0, 0.0, 0.0]
6  [0.0, 0.0, 0.0, 0.0, 0.0]
7  [0.0, 0.0, 0.0, 0.0, 0.0]
8  [0.0, 0.0, 0.0, 0.0, 0.0]
9  [0.0, 0.0, 0.0, 0.0, 0.0]
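The same technique as a self-contained script, with explicit imports and a look at the resulting column dtypes: the scalar columns become float64 and the profile columns hold one numpy array per cell in an object-dtype column. The helper make_reactor_table and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def make_reactor_table(n_reactors=10, n_points=100):
    """Build a DataFrame of placeholder simulation results, one row per reactor."""
    reactors = [
        {
            "flowrate_in": 0.0,
            "temperature_in": 0.0,
            "pressure_in": 0.0,
            # Each profile is a separate array stored as a single cell value
            "temperature_profile": np.zeros(n_points),
            "oxygen_profile": np.zeros(n_points),
        }
        for _ in range(n_reactors)
    ]
    return pd.DataFrame(reactors)


df = make_reactor_table()
print(df.dtypes)  # float64 for the scalars, object for the profiles
```

Recent pandas versions keep the dict key order for the columns, so the column order may differ from the alphabetical order shown in the older session output.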
Initializing with Types
A simple illustration of the second technique:
In [101]: reactors = [ { "flowrate_in" : numpy.float32,
     ...:               "temperature_in" : numpy.float32,
     ...:               "pressure_in" : numpy.float32,
     ...:               "temperature_profile" : numpy.ndarray,
     ...:               "oxygen_profile" : numpy.ndarray } for i in range(10) ]
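This creates a list of 10 dicts whose values are type objects rather than data. The resulting DataFrame stores those type objects themselves in every cell (each column has object dtype), so it works as a placeholder table, but the columns are not actually typed: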
In [102]: pandas.DataFrame(reactors)
Out[102]:
              flowrate_in          oxygen_profile             pressure_in  \
0  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
1  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
2  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
3  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
4  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
5  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
6  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
7  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
8  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>
9  <type 'numpy.float32'>  <type 'numpy.ndarray'>  <type 'numpy.float32'>

           temperature_in     temperature_profile
0  <type 'numpy.float32'>  <type 'numpy.ndarray'>
1  <type 'numpy.float32'>  <type 'numpy.ndarray'>
2  <type 'numpy.float32'>  <type 'numpy.ndarray'>
3  <type 'numpy.float32'>  <type 'numpy.ndarray'>
4  <type 'numpy.float32'>  <type 'numpy.ndarray'>
5  <type 'numpy.float32'>  <type 'numpy.ndarray'>
6  <type 'numpy.float32'>  <type 'numpy.ndarray'>
7  <type 'numpy.float32'>  <type 'numpy.ndarray'>
8  <type 'numpy.float32'>  <type 'numpy.ndarray'>
9  <type 'numpy.float32'>  <type 'numpy.ndarray'>
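Because each cell holds a type object, this table does not actually constrain what goes into each column. If the goal is a table whose columns carry real dtypes, one alternative (a different technique from the one above) is to start from empty columns with an explicit dtype. A minimal sketch; the column names follow the example and make_typed_table is hypothetical.

```python
import numpy as np
import pandas as pd

# The second technique: each cell holds a type object, not a value
placeholders = pd.DataFrame([
    {"flowrate_in": np.float32, "temperature_in": np.float32,
     "pressure_in": np.float32, "temperature_profile": np.ndarray,
     "oxygen_profile": np.ndarray}
    for _ in range(10)
])


def make_typed_table():
    """Return an empty table with float32 scalar columns and object profile columns."""
    return pd.DataFrame({
        "flowrate_in": pd.Series(dtype="float32"),
        "temperature_in": pd.Series(dtype="float32"),
        "pressure_in": pd.Series(dtype="float32"),
        # Arrays stored one per cell need an object-dtype column
        "temperature_profile": pd.Series(dtype=object),
        "oxygen_profile": pd.Series(dtype=object),
    })


typed = make_typed_table()
print(placeholders.dtypes)  # every column is object
print(typed.dtypes)         # float32 for the scalars, object for the profiles
```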
Modifying a Table with Data
When a column holds arbitrary Python objects (object dtype), each numpy.ndarray stored in it can have whatever shape or size you like. Pandas treats each array as an opaque object: it stores a reference to the array and does not check its shape or size.
This means that, in practice, you could have temperature or oxygen profiles of entirely different sizes. Here df holds the zero-filled table from the first technique:
In [117]: df
Out[117]:
   flowrate_in             oxygen_profile  pressure_in  temperature_in  \
0            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
1            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
2            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
3            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
4            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
5            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
6            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
7            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
8            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
9            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0

         temperature_profile
0  [0.0, 0.0, 0.0, 0.0, 0.0]
1  [0.0, 0.0, 0.0, 0.0, 0.0]
2  [0.0, 0.0, 0.0, 0.0, 0.0]
3  [0.0, 0.0, 0.0, 0.0, 0.0]
4  [0.0, 0.0, 0.0, 0.0, 0.0]
5  [0.0, 0.0, 0.0, 0.0, 0.0]
6  [0.0, 0.0, 0.0, 0.0, 0.0]
7  [0.0, 0.0, 0.0, 0.0, 0.0]
8  [0.0, 0.0, 0.0, 0.0, 0.0]
9  [0.0, 0.0, 0.0, 0.0, 0.0]
Now set the temperature profiles of the first three rows to arrays of different lengths. DataFrame.at sets a single cell by row and column label. Chained indexing such as df['temperature_profile'][0] = ... may write into a temporary copy instead of df in recent pandas versions, and under copy-on-write it never modifies df:

In [122]: df.at[0, 'temperature_profile'] = 25*numpy.ones(3)

In [123]: df.at[1, 'temperature_profile'] = 28*numpy.ones(5)

In [124]: df.at[2, 'temperature_profile'] = 30*numpy.ones(8)

In [125]: df
Out[125]:
   flowrate_in             oxygen_profile  pressure_in  temperature_in  \
0            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
1            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
2            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
3            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
4            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
5            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
6            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
7            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
8            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0
9            0  [0.0, 0.0, 0.0, 0.0, 0.0]            0               0

                                temperature_profile
0                                [25.0, 25.0, 25.0]
1                    [28.0, 28.0, 28.0, 28.0, 28.0]
2  [30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
3                         [0.0, 0.0, 0.0, 0.0, 0.0]
4                         [0.0, 0.0, 0.0, 0.0, 0.0]
5                         [0.0, 0.0, 0.0, 0.0, 0.0]
6                         [0.0, 0.0, 0.0, 0.0, 0.0]
7                         [0.0, 0.0, 0.0, 0.0, 0.0]
8                         [0.0, 0.0, 0.0, 0.0, 0.0]
9                         [0.0, 0.0, 0.0, 0.0, 0.0]
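Since each cell holds an ordinary numpy array, per-row summaries of ragged profiles can be computed by mapping a function over the column. This self-contained sketch rebuilds a small table, sets cells with DataFrame.at, and then computes each profile's length and mean; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a small version of the table so this block runs on its own
df = pd.DataFrame([
    {"temperature_in": 0.0, "temperature_profile": np.zeros(5)}
    for _ in range(4)
])

# Ragged profiles: each cell may hold an array of any length
df.at[0, "temperature_profile"] = 25 * np.ones(3)
df.at[1, "temperature_profile"] = 28 * np.ones(5)
df.at[2, "temperature_profile"] = 30 * np.ones(8)

# Per-row summaries: apply a function to the array in each cell
profile_length = df["temperature_profile"].map(len)
profile_mean = df["temperature_profile"].map(np.mean)
print(pd.DataFrame({"length": profile_length, "mean": profile_mean}))
```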
Saving a Table with Data
HDF5
To save a DataFrame using HDF5:
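df.to_hdf('dummy.h5', 'name_of_array', append=False)
df_h5 = pandas.read_hdf('dummy.h5', 'name_of_array')
The second argument is the key under which the table is stored in the file; read_hdf uses the same key to load it back. Both functions need the PyTables package (tables), installed above.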
CSV
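df.to_csv('dummy.csv')
df_csv = pandas.read_csv('dummy.csv', index_col=0)
Passing index_col=0 tells read_csv to use the first column (the index that to_csv writes) as the index, rather than reading it back as an extra column named "Unnamed: 0".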
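CSV is plain text, so to_csv writes each array cell as its string representation and read_csv returns those strings rather than numpy arrays. One way to get the arrays back (an assumption on my part, not something pandas does for you) is to encode each array as a JSON list before writing and decode it after reading. A sketch using an in-memory buffer; the helpers profiles_to_csv and profiles_from_csv and the PROFILE_COLUMNS list are hypothetical.

```python
import io
import json

import numpy as np
import pandas as pd

# Hypothetical: the columns that hold one numpy array per cell
PROFILE_COLUMNS = ["temperature_profile", "oxygen_profile"]


def profiles_to_csv(df, buf):
    """Write df as CSV, encoding each array cell as a JSON list of numbers."""
    out = df.copy()
    for col in PROFILE_COLUMNS:
        out[col] = out[col].map(lambda a: json.dumps(np.asarray(a).tolist()))
    out.to_csv(buf)


def profiles_from_csv(buf):
    """Read CSV written by profiles_to_csv and decode the JSON lists into arrays."""
    df = pd.read_csv(buf, index_col=0)
    for col in PROFILE_COLUMNS:
        df[col] = df[col].map(lambda s: np.array(json.loads(s)))
    return df


# Usage with ragged profiles and an in-memory buffer instead of a file
df = pd.DataFrame([
    {"flowrate_in": 1.5, "temperature_profile": 25 * np.ones(3),
     "oxygen_profile": np.zeros(4)},
    {"flowrate_in": 2.0, "temperature_profile": 28 * np.ones(5),
     "oxygen_profile": np.zeros(2)},
])
buf = io.StringIO()
profiles_to_csv(df, buf)
buf.seek(0)
df_csv = profiles_from_csv(buf)
```

JSON keeps the files human-readable; for large tables a binary format such as HDF5 avoids the text encoding altogether.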