This page walks through the steps for Experiment 1.

Overview

The Data

Start with the wifi data. Raspberry Pis placed in various locations will collect it, each operating over a given time interval.

Raspberry Pi Data

On board the Pis, airodump-ng will write a CSV file describing the wifi networks it observed over a time interval of 1-5 minutes. Each CSV file is an aggregated view of that interval, so collection produces a large number of CSV files: one observation for every 1-5 minutes.

Data Processing

These CSV files are then parsed with Python, which turns the observational data about wifi networks into rows in an SQL database.

SQL Data Warehouse

The data will be stored in a data warehouse, in the form of a SQLite database. This database will provide a place for us to store two kinds of data:

1. raw data - the basic rearrangement of information to get it from the CSV files into an SQL database.

2. processed data (derived quantities) - quantities calculated from the raw data, including mathematical representations of it.
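A minimal sketch of how the two kinds of data could sit side by side in one SQLite file. The table names (raw_ap, derived_ap_count), the column choices, and the helper functions are hypothetical, chosen only for illustration; the real schema may look quite different.

```python
import sqlite3

def create_warehouse(path=":memory:"):
    """Open the SQLite file and create one raw and one derived table.

    Both table layouts are assumptions made for this sketch.
    """
    conn = sqlite3.connect(path)
    # Raw data: a near-direct copy of rows from the airodump-ng CSV.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_ap (
               capture_file TEXT, bssid TEXT, channel INTEGER,
               power INTEGER, essid TEXT)"""
    )
    # Processed data: one calculated quantity per capture file.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS derived_ap_count (
               capture_file TEXT PRIMARY KEY, n_aps INTEGER)"""
    )
    return conn

def store_capture(conn, capture_file, ap_rows):
    """Insert the raw rows, then refresh the derived count for that file."""
    conn.executemany(
        "INSERT INTO raw_ap VALUES (?, ?, ?, ?, ?)",
        [(capture_file, r["bssid"], r["channel"], r["power"], r["essid"])
         for r in ap_rows],
    )
    conn.execute(
        "INSERT OR REPLACE INTO derived_ap_count "
        "SELECT capture_file, COUNT(DISTINCT bssid) FROM raw_ap "
        "WHERE capture_file = ? GROUP BY capture_file",
        (capture_file,),
    )
    conn.commit()
```

Keeping the derived table separate means it can be dropped and rebuilt from the raw table at any time without touching the original observations.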

Mathematical Representation

An example of processed data that we might store in a database is a mathematical representation, such as a graph.

Graphs contain a lot of information and are extremely useful for analysis. However, constructing graphs and computing information about them can be computationally expensive. If it takes a while for one graph, it will take a while for hundreds of graphs, and our data will produce hundreds of graphs.

For this reason, we want to store processed data in the database as well, so that each quantity is computed once and then reused.
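As a rough illustration of one such graph, the sketch below links each client to the access point it is associated with and then computes node degrees, the kind of derived quantity that could be stored. The function names and the (station, BSSID) pair input are assumptions made for this example, and the "(not associated)" marker is assumed to be the text airodump-ng writes for clients without an access point.

```python
from collections import defaultdict

# Assumed marker for clients that are not attached to any access point.
NOT_ASSOCIATED = "(not associated)"

def build_association_graph(client_rows):
    """Build an undirected graph from (station_mac, bssid) pairs.

    The graph is a dict mapping each MAC address to the set of its neighbors.
    """
    graph = defaultdict(set)
    for station, bssid in client_rows:
        if bssid == NOT_ASSOCIATED:
            continue  # no edge to draw for an unassociated client
        graph[station].add(bssid)
        graph[bssid].add(station)
    return graph

def degree_table(graph):
    """Return sorted (node, degree) pairs, ready to insert into a table."""
    return sorted((node, len(neighbors)) for node, neighbors in graph.items())
```

With one graph per CSV file, the degree table for each capture could be written to the database once and queried later instead of rebuilding every graph.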

Part 1

Collecting Data

Goal: use the Pis to collect one wifi CSV every N minutes.

SpawnKillDump

Behold the script: spawnkilldump.py
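The contents of spawnkilldump.py are not shown here, so what follows is only a guess at the general spawn-then-kill pattern the name suggests, not the actual script. The helper names, the interface name wlan1mon, and the file prefix are hypothetical; -w and --output-format csv are airodump-ng's options for the output prefix and file type.

```python
import subprocess
import time

def build_dump_command(interface, prefix):
    """Assemble the airodump-ng command line for one CSV capture."""
    return ["airodump-ng", "-w", prefix, "--output-format", "csv", interface]

def spawn_kill(command, seconds):
    """Start a process, let it run for `seconds`, then terminate it."""
    proc = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        time.sleep(seconds)
    finally:
        proc.terminate()  # stop the capture at the end of the interval
        proc.wait()       # reap the process before starting the next one
    return proc.returncode

# Hypothetical usage on a Pi, one capture every N minutes:
#   while True:
#       spawn_kill(build_dump_command("wlan1mon", "awesome"), N * 60)
```

Terminating and restarting the process for every interval is what yields one CSV file per interval, since airodump-ng numbers its output files (awesome-01.csv, awesome-02.csv, and so on) when the prefix already exists.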

Processing Data

Process CSV

The following prototype Python script shows how to load the CSV and parse the data.

with open('awesome-01.csv') as f:
    lines = f.readlines()



# Find where breaks in CSV file are located
# (Split between APs and Clients)
breaks = []
for i in range(len(lines)):
    tokens = [t.strip() for t in lines[i].split(",")]
    if len(tokens)==1:
        breaks.append(i)



# Use that to extract AP and client data.
# Slice ends are exclusive, so stopping at the break index
# keeps the last row of each section.

ap_header    = lines[breaks[0]+1]
ap_data      = lines[breaks[0]+2:breaks[1]]

client_header = lines[breaks[1]+1]
client_data   = lines[breaks[1]+2:breaks[2]]



# Tokenize and extract fields
# (print() with a single argument works in both Python 2 and Python 3)

print("AP MACs:")
for ap in ap_data:
    tokens = [t.strip() for t in ap.split(",")]
    print(tokens[0])
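One possible next step for the prototype is to let the standard csv module do the tokenizing and to return each row as a dictionary keyed by its header. This is a sketch under the same assumption as the script above, namely that blank lines separate the AP section from the client section; the function name parse_airodump_csv is made up for this example.

```python
import csv

def parse_airodump_csv(text):
    """Split airodump-ng CSV text into sections of header-keyed dicts.

    Returns one list of dicts per section, in file order
    (access points first, then clients).
    """
    sections = []
    current = []
    for row in csv.reader(text.splitlines(), skipinitialspace=True):
        cells = [c.strip() for c in row]
        if not any(cells):
            # A blank line closes the section collected so far.
            if current:
                sections.append(current)
                current = []
            continue
        current.append(cells)
    if current:
        sections.append(current)

    parsed = []
    for section in sections:
        header, rows = section[0], section[1:]
        # zip() stops at the shorter sequence, so extra trailing cells
        # (for example several probed ESSIDs) are dropped in this sketch.
        parsed.append([dict(zip(header, r)) for r in rows])
    return parsed
```

Dictionaries keyed by column name avoid hard-coded positions such as tokens[0], which makes the later step of inserting rows into the database easier to read.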
