Experiment1
Walk through the steps for experiment 1.
Overview
The Data
Start with the wifi data. This data will be collected at various locations by Raspberry Pis placed at each location and left running over a given time interval.
Raspberry Pi Data
On board the Pis, airodump-ng will create a CSV file containing information about the wifi network gathered over a time interval of 1-5 minutes. Each CSV file contains an aggregated view of the wifi network over that interval, so you end up with a large number of CSV files: one observation every 1-5 minutes.
Data Processing
These CSV files are then parsed with Python, which turns the observations about wifi networks stored in each file into records in an SQL database.
SQL Data Warehouse
The data will be stored in a data warehouse, in the form of a SQLite database. This database will provide a place for us to store two kinds of data:
1. raw data - the information from the CSV files, rearranged just enough to fit into an SQL database.
2. processed data (derived quantities) - quantities calculated from the raw data, including mathematical representations.
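To make the raw/processed split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (observations, aps, clients, derived, n_aps) are hypothetical and chosen for illustration, not taken from the experiment's actual database. Raw tables hold rows copied out of the CSV files, and the derived table holds named quantities computed from them.

```python
import sqlite3

def create_warehouse(path=":memory:"):
    """Create raw and processed tables (hypothetical schema) in a SQLite file."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        -- Raw: one row per airodump-ng CSV file (one observation interval)
        CREATE TABLE IF NOT EXISTS observations (
            id INTEGER PRIMARY KEY,
            csv_file TEXT NOT NULL
        );
        -- Raw: access points listed in a CSV file
        CREATE TABLE IF NOT EXISTS aps (
            observation_id INTEGER REFERENCES observations(id),
            bssid TEXT,
            essid TEXT,
            power INTEGER
        );
        -- Raw: clients (stations) listed in a CSV file
        CREATE TABLE IF NOT EXISTS clients (
            observation_id INTEGER REFERENCES observations(id),
            station_mac TEXT,
            bssid TEXT,
            power INTEGER
        );
        -- Processed: named derived quantities, one value per observation
        CREATE TABLE IF NOT EXISTS derived (
            observation_id INTEGER REFERENCES observations(id),
            name TEXT,
            value REAL,
            PRIMARY KEY (observation_id, name)
        );
    """)
    return conn

def store_ap_count(conn, observation_id):
    """Compute one derived quantity (distinct APs seen) and store it."""
    (n_aps,) = conn.execute(
        "SELECT COUNT(DISTINCT bssid) FROM aps WHERE observation_id = ?",
        (observation_id,)).fetchone()
    conn.execute("INSERT OR REPLACE INTO derived VALUES (?, 'n_aps', ?)",
                 (observation_id, n_aps))
    conn.commit()
    return n_aps

# Example: one observation containing one made-up access point
conn = create_warehouse()
obs_id = conn.execute("INSERT INTO observations (csv_file) VALUES (?)",
                      ("awesome-01.csv",)).lastrowid
conn.execute("INSERT INTO aps VALUES (?, ?, ?, ?)",
             (obs_id, "00:11:22:33:44:55", "example-net", -60))
store_ap_count(conn, obs_id)
```

Passing a file name such as "warehouse.db" instead of the default ":memory:" would persist the tables between runs.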
Mathematical Representation
An example of processed data that we might store in a database is a mathematical representation, such as a graph.
Graphs contain a lot of information and are extremely useful for analysis. However, constructing graphs and computing information about them can be computationally expensive. If one graph takes a while to compute, hundreds of graphs will take hundreds of times as long, and our data will produce hundreds of graphs.
For this reason, we want to store processed data in the database as well.
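One way to picture "compute once, store in the database" is a small cache-aside pattern. The sketch below assumes a hypothetical clients table (observation_id, station_mac, bssid) and a hypothetical ap_degree table for the cached results. It builds a bipartite client-AP graph as plain adjacency sets and uses node degree (clients per AP) as a cheap stand-in for the more expensive graph calculations described above. It also assumes unassociated stations carry the BSSID value "(not associated)", as airodump-ng reports them, with whitespace already stripped.

```python
import sqlite3
from collections import defaultdict

def build_graph(edges):
    """Undirected bipartite graph (adjacency sets) from (client, ap) pairs."""
    graph = defaultdict(set)
    for client, ap in edges:
        # Tag nodes by type so a client and an AP can never be confused
        c, a = ("client", client), ("ap", ap)
        graph[c].add(a)
        graph[a].add(c)
    return graph

def cached_ap_degrees(conn, observation_id):
    """Return {bssid: number of clients}, building the graph only if not cached."""
    rows = conn.execute(
        "SELECT bssid, degree FROM ap_degree WHERE observation_id = ?",
        (observation_id,)).fetchall()
    if rows:
        return dict(rows)  # cache hit: skip graph construction entirely
    edges = conn.execute(
        "SELECT station_mac, bssid FROM clients "
        "WHERE observation_id = ? AND bssid != '(not associated)'",
        (observation_id,)).fetchall()
    graph = build_graph(edges)
    degrees = {ap: len(graph[("ap", ap)]) for _, ap in edges}
    # Store the result so later requests read it instead of recomputing it
    conn.executemany("INSERT INTO ap_degree VALUES (?, ?, ?)",
                     [(observation_id, ap, d) for ap, d in degrees.items()])
    conn.commit()
    return degrees

# Example data: two clients on one AP, one unassociated client (made-up MACs)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (observation_id INTEGER, station_mac TEXT, bssid TEXT);
    CREATE TABLE ap_degree (observation_id INTEGER, bssid TEXT, degree INTEGER);
""")
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", [
    (1, "aa:bb:cc:00:00:01", "00:11:22:33:44:55"),
    (1, "aa:bb:cc:00:00:02", "00:11:22:33:44:55"),
    (1, "aa:bb:cc:00:00:03", "(not associated)"),
])
degrees = cached_ap_degrees(conn, 1)
```

The first call builds the graph and writes the result. Later calls for the same observation only run a SELECT. An observation with no associated clients caches nothing and is recomputed on each call, which is harmless here because its graph is empty.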
Part 1
Collecting Data
Goal: use Python to collect one wifi CSV file every N minutes.
SpawnKillDump
Behold the script: spawnkilldump.py
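The contents of spawnkilldump.py are not reproduced here. As a rough idea of what a spawn/kill/dump loop might look like, here is a minimal sketch that starts airodump-ng, lets it run for N minutes, stops it, and repeats. It assumes the interface is already in monitor mode, that the script runs with root privileges, and that airodump-ng's `--write <prefix>` and `--output-format csv` options produce files named `<prefix>-01.csv` (the same pattern as awesome-01.csv). The function names, the "wifi-" prefix, and the 10-second shutdown timeout are hypothetical.

```python
import signal
import subprocess
import time
from datetime import datetime

def build_command(prefix, interface):
    """airodump-ng command that writes CSV output to <prefix>-01.csv."""
    return ["airodump-ng", "--write", prefix, "--output-format", "csv", interface]

def spawn_kill_dump(interface, minutes, n_files):
    """Collect n_files CSV files, one per capture lasting `minutes` minutes."""
    for _ in range(n_files):
        # Timestamped prefix so each capture lands in its own file
        prefix = "wifi-" + datetime.now().strftime("%Y%m%d-%H%M%S")
        proc = subprocess.Popen(build_command(prefix, interface),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        time.sleep(minutes * 60)
        # SIGINT mimics Ctrl-C, giving airodump-ng a chance to close its files
        proc.send_signal(signal.SIGINT)
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # fall back to a hard kill if it does not exit
            proc.wait()
```

A call such as `spawn_kill_dump("wlan1mon", 5, 12)` (interface name hypothetical) would produce one CSV file every five minutes for an hour. Starting a fresh process for each interval, rather than letting one process run continuously, keeps every CSV file limited to a single time window.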
Processing Data
Process CSV
A prototype Python script that shows how to load the CSV and parse the data.
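# Load the airodump-ng CSV file
with open('awesome-01.csv') as f:
    lines = f.readlines()

# Find the blank lines that separate the sections of the CSV file
# (blank line, AP header, AP rows, blank line, client header, client rows)
breaks = []
for i, line in enumerate(lines):
    tokens = [t.strip() for t in line.split(",")]
    if len(tokens) == 1:
        breaks.append(i)

# If the file does not end with a blank line, treat end-of-file as the last break
if len(breaks) < 3:
    breaks.append(len(lines))

# Use the breaks to extract AP and client data
# (slice up to the next break, so the last row of each section is kept)
ap_header = lines[breaks[0]+1]
ap_data = lines[breaks[0]+2:breaks[1]]
client_header = lines[breaks[1]+1]
client_data = lines[breaks[1]+2:breaks[2]]

# Tokenize each AP row; the first field is the BSSID (the AP's MAC address)
print("AP MACs:")
for ap in ap_data:
    tokens = [t.strip() for t in ap.split(",")]
    print(tokens[0])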
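A natural next step after printing the first field is to keep every field, keyed by the column names in the section's header line, so that each row can later be inserted into the database by name. The sketch below is a hypothetical helper, not part of the original prototype, and it uses a shortened, made-up header for the example. It splits on commas like the prototype does, so fields that themselves contain commas (an ESSID with a comma, or the comma-separated Probed ESSIDs column in client rows) will not be recovered intact. zip() silently drops any extra tokens.

```python
def parse_section(header_line, data_lines):
    """Turn one CSV section into a list of dicts keyed by header field names."""
    fields = [f.strip() for f in header_line.split(",")]
    records = []
    for line in data_lines:
        tokens = [t.strip() for t in line.split(",")]
        if len(tokens) < 2:
            continue  # skip blank lines
        records.append(dict(zip(fields, tokens)))
    return records

# Example with a shortened, made-up header and one AP row
header = "BSSID, channel, Power, ESSID\r\n"
rows = ["00:11:22:33:44:55,  6, -60, example-net\r\n", "\r\n"]
records = parse_section(header, rows)
```

With the prototype's variables, `parse_section(ap_header, ap_data)` and `parse_section(client_header, client_data)` would give one dict per AP and per client. Values stay as strings, so numeric columns such as Power would need converting before they are stored as numbers.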