From charlesreid1

15 Metro

Python Preprocessing + D3 Viz for Large Datasets

Step 1: Python connects to a large data set (e.g., census data), iterates through it (using multithreading, the cloud, MapReduce, Amazon, etc. as needed), and dumps the results to JSON files in a bucket

Step 2: D3 reads from those buckets and handles LARGE data sets by segmenting the data, making buttons, and dividing and conquering

Step 3: how you visualize and grok spatial and/or other data
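
A minimal sketch of the Step 1 to Step 2 hand-off, assuming the segmenting is done by a single key (here `state`). The function names, the record fields, and the `index.json` file are all hypothetical, the rows are placeholders rather than real census values, and a local temp directory stands in for the bucket:

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def segment_records(records, key):
    """Group records by the value of `key` so each group becomes one small file."""
    segments = defaultdict(list)
    for record in records:
        segments[record[key]].append(record)
    return dict(segments)


def dump_segments(segments, out_dir):
    """Write one JSON file per segment, plus an index the D3 side could load first."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    index = {}
    for name, rows in sorted(segments.items()):
        path = out_dir / f"{name}.json"
        path.write_text(json.dumps(rows))
        index[name] = {"file": path.name, "rows": len(rows)}
    (out_dir / "index.json").write_text(json.dumps(index, indent=2))
    return index


# Placeholder rows (assumption): not real census values.
records = [
    {"state": "WA", "county": "A", "population": 100},
    {"state": "WA", "county": "B", "population": 250},
    {"state": "OR", "county": "C", "population": 75},
]

bucket_dir = tempfile.mkdtemp()  # local stand-in for a bucket
index = dump_segments(segment_records(records, "state"), bucket_dir)
```

The idea is that D3 would fetch the small index first, build one button per segment from it, and only load a segment's file when its button is clicked.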

Cluster/Stat Analysis of 10 Cities

Look at 10 metropolitan areas

  • some kind of cluster analysis
  • statistical analysis
  • data analysis
  • PCA
  • with Python: it may take a long time, but you can parallelize it and turn it loose on a bigger project
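
One possible reading of "some kind of cluster analysis" is plain k-means on standardized features. The sketch below uses only the standard library; the feature values are placeholders (not real metro data), the choice of k-means and of two clusters is an assumption, and the initial centroids are picked by hand to keep the run deterministic:

```python
import math


def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, centroids, n_iter=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    labels = []
    for _ in range(n_iter):
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids


# Placeholder features (assumption): one row per metro area, two made-up columns.
features = [[1.0, 1.2], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]]
scaled = standardize(features)
labels, centers = kmeans(scaled, centroids=[scaled[0], scaled[2]])
```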

By moving the Python processing out of the loop and doing it up front as a pre-processing step, you can start to use different strategies/technologies (parallelization, cloud, S3 buckets, etc.) for the Python code.
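
As a rough sketch of the parallelization option, each metro area can be handed to a worker pool as an independent job. `summarize_metro` and the input values are hypothetical; a thread pool is used here for simplicity, and `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface if the per-metro work turns out to be CPU-bound:

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_metro(item):
    """Hypothetical per-metro job: reduce a list of values to summary stats."""
    name, values = item
    return name, {"n": len(values), "mean": sum(values) / len(values)}


# Placeholder inputs (assumption); in practice each entry might be a file or a query.
metros = {"metro_a": [1, 2, 3], "metro_b": [10, 20], "metro_c": [5]}

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so results line up with metros.items()
    summaries = dict(pool.map(summarize_metro, metros.items()))
```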

For obtaining info about metropolitan areas:

Alluvial Diagrams for Reaction Datasets

That would be a cool way to visualize a reaction rate set: the evolution of the network over time
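
An alluvial diagram needs, for each pair of consecutive time steps, a count of how many items flow from one group to another. The sketch below assumes each snapshot maps a reaction id to some group label (the `fast`/`slow` labels and the ids are made up for illustration) and produces the link list a diagram would be drawn from:

```python
from collections import Counter


def alluvial_links(snapshots):
    """Count how many items move from group A at step t to group B at step t+1."""
    links = Counter()
    for t, (before, after) in enumerate(zip(snapshots, snapshots[1:])):
        for item, group in before.items():
            if item in after:  # items that disappear contribute no link
                links[(t, group, after[item])] += 1
    return [{"step": t, "source": src, "target": dst, "value": n}
            for (t, src, dst), n in sorted(links.items())]


# Placeholder snapshots (assumption): reaction id -> group label at each time step.
snapshots = [
    {"r1": "fast", "r2": "fast", "r3": "slow"},
    {"r1": "fast", "r2": "slow", "r3": "slow"},
]
links = alluvial_links(snapshots)
```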



Calculus/Mathematics Concepts

Using shapes and lines to explore functions in math tables, polynomial formulas, series solutions to PDEs


Politics

Which senators represent the richest states? poorest states?

Representatives? State legislatures?

Campaign Finance

NYTimes Campaign Finance interface

Sunlight Labs

OpenStates API

Another list of more data sets for campaign finance: https://sites.google.com/site/bicoastaldatafest/data

Visualization of industries by location/Congressional district?

Click on a district to see its two senators and its representative, the major industries in that area, whom the industries in that area contribute to, and who the major contributors to each politician are
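
A sketch of the record that a click on a district might load, pre-computed in Python. The record layout, field names, district ids of the form `XX-01`, and all names and amounts are hypothetical placeholders; real inputs would come from whichever campaign finance source is used:

```python
from collections import defaultdict


def district_profile(district, legislators, contributions):
    """Build the record a click on `district` would display."""
    state = district.split("-")[0]
    # Both senators for the state, plus the representative for this district.
    members = [p["name"] for p in legislators
               if p["state"] == state
               and (p["chamber"] == "senate" or p.get("district") == district)]

    local_giving = defaultdict(lambda: defaultdict(float))  # industry -> recipient
    received = defaultdict(lambda: defaultdict(float))      # recipient -> industry
    for c in contributions:
        if c["district"] == district:
            local_giving[c["industry"]][c["recipient"]] += c["amount"]
        if c["recipient"] in members:
            received[c["recipient"]][c["industry"]] += c["amount"]

    return {
        "district": district,
        "legislators": members,
        "industry_to_recipient": {k: dict(v) for k, v in local_giving.items()},
        "top_contributors": {
            name: sorted(received[name].items(), key=lambda kv: -kv[1])
            for name in members
        },
    }


# Placeholder data (assumption): no real people, districts, or amounts.
legislators = [
    {"name": "Senator A", "state": "XX", "chamber": "senate"},
    {"name": "Senator B", "state": "XX", "chamber": "senate"},
    {"name": "Rep C", "state": "XX", "chamber": "house", "district": "XX-01"},
    {"name": "Rep D", "state": "XX", "chamber": "house", "district": "XX-02"},
]
contributions = [
    {"district": "XX-01", "industry": "timber", "recipient": "Rep C", "amount": 500.0},
    {"district": "XX-01", "industry": "software", "recipient": "Senator A", "amount": 300.0},
    {"district": "XX-02", "industry": "timber", "recipient": "Senator A", "amount": 700.0},
]
profile = district_profile("XX-01", legislators, contributions)
```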

Log File Visualization

Take same approach as NYTimes blog post

Dump logs into Amazon S3 buckets

Analyze with Python

Plot it up
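
A minimal sketch of the "analyze with Python" step, assuming the logs are web server access logs in Common Log Format (the notes do not say which format). The sample lines are made up, and the hourly counts are the kind of small summary that could then be plotted:

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format (assumption): host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    return row


def hits_per_hour(lines):
    """Count parsed requests per hour; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        row = parse_line(line)
        if row is not None:
            counts[row["time"].strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:58:01 -0700] "GET /about.html HTTP/1.0" 404 512',
    '192.0.2.1 - - [10/Oct/2000:14:02:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'not a log line',
]
hourly = hits_per_hour(sample)
```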

Straightforward Multivariate Visualization

Using something like this:

http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

or this:

http://archive.ics.uci.edu/ml/datasets/Energy+efficiency

and visualizing with some D3 charts.
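
For all-numeric tabular data like this, the Python side could be as small as converting the CSV to JSON records and pre-computing per-column ranges for the chart scales. This is a sketch under assumptions: the column names `x1`, `x2`, `y` and the values are placeholders, not the actual columns of either dataset:

```python
import csv
import io
import json


def csv_to_records(text):
    """Parse CSV text into a list of dicts with float values (all-numeric data)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k.strip(): float(v) for k, v in row.items()} for row in reader]


def column_ranges(records):
    """Min/max per column, e.g. for setting chart scale domains."""
    return {key: [min(r[key] for r in records), max(r[key] for r in records)]
            for key in records[0]}


# Placeholder CSV (assumption): made-up columns and values.
sample_csv = "x1,x2,y\n1.0,10,0.5\n2.0,30,0.7\n3.0,20,0.9\n"
records = csv_to_records(sample_csv)
payload = json.dumps({"ranges": column_ranges(records), "rows": records})
```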


Map + Scatterplot (DONE)

D3 chart: scatterplot of data; circles display some multivariate information (x by y, etc), and clicking on particular points highlights them on a map. In this way, the data, and not the map, drive the discovery process.

Map-to-Map Data (DONE)

Want to be able to use the state-level county map to control the county-level census map, AND control quantities contained in the census map layers.