From charlesreid1

Conversations

Components

To analyze a wireless conversation, you need to be able to parse a few different pieces of information.

First is the source address. This will be a MAC address. You will not get an IP address unless you're on the same network and some address resolution mechanism (such as ARP) is available to relate a MAC address (Layer 2) to an IP address (Layer 3).

Show the Packet

Here is a dead-simple three-line script to show the full contents of the packet at index 120 (the 121st packet in the capture, since Python lists are zero-indexed):

from scapy.all import *

plist = rdpcap("airportSniffNERR6R.cap")

plist[120].show()

Getting Source/Destination Address

A simple script to pull out the source and destination of each packet using scapy is given below:

from scapy.all import *

plist = rdpcap("airportSniffNERR6R.cap")

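def getsrcdst(p):
    # addr1 is the receiver, addr2 the transmitter; addr3 depends on the frame
    return (p.addr1, p.addr2, p.addr3)

for p in plist:
    try:
        print(getsrcdst(p))
    except AttributeError:
        # skip packets that have no 802.11 address fields
        pass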

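This script reads a relatively small pcap file and prints out the addr1, addr2, and addr3 fields of each packet that has them; packets without these fields are skipped. In an 802.11 frame, addr1 is the receiver address and addr2 is the transmitter address, while the meaning of addr3 depends on the frame type and direction (it is often the BSSID). The output can be used to build a list of MAC addresses.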

Further parsing could be done to identify packets that are beacons from access points, to determine which MAC addresses are access points.
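One way the beacon idea could be sketched is shown below. It assumes that beacons are 802.11 management frames (type 0) with subtype 8, and that the transmitter address (addr2) of a beacon belongs to the access point. The helper names frames_from_pcap and access_points, and the sample MAC addresses, are made up for illustration and are not part of scapy.

```python
# 802.11 frame-control values: management frames are type 0,
# and beacons are management subtype 8.
MGMT_TYPE = 0
BEACON_SUBTYPE = 8


def frames_from_pcap(path):
    """Yield (type, subtype, addr2) for every 802.11 frame in a capture."""
    # Imported here so the rest of the sketch works without scapy installed.
    from scapy.all import rdpcap, Dot11
    for p in rdpcap(path):
        if p.haslayer(Dot11):
            d = p[Dot11]
            yield (d.type, d.subtype, d.addr2)


def access_points(frames):
    """Return the set of transmitter MACs seen sending beacon frames."""
    return {addr2 for (ftype, subtype, addr2) in frames
            if ftype == MGMT_TYPE and subtype == BEACON_SUBTYPE and addr2}


# Hypothetical sample data standing in for a real capture
sample = [
    (0, 8, "aa:aa:aa:aa:aa:01"),  # beacon from an access point
    (2, 0, "bb:bb:bb:bb:bb:02"),  # data frame from a client
    (0, 4, "bb:bb:bb:bb:bb:02"),  # probe request from the same client
    (0, 8, "aa:aa:aa:aa:aa:01"),  # repeated beacon
]
print(access_points(sample))
```

With a real capture, something like access_points(frames_from_pcap("airportSniffNERR6R.cap")) should give the candidate access point addresses.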

Conversation Analysis

Also see Statistical Analysis of Networks

In any conversation, there are two endpoints, A and B. Sometimes A is the source and B is the destination - A is sending data to B. And sometimes B is the source and A is the destination - B is sending data to A.

The relationship can be described with a network. A network is composed of dots, or nodes, and lines, or edges. In our case, we are representing a conversation with nodes (entities like A and B) and edges (representing a relationship between entities). A conversation can be thought of as two nodes and two edges - one edge representing A to B, the other edge representing B to A.
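A minimal sketch of this representation, assuming each packet has already been reduced to a (source, destination) pair. The function names build_network and conversations are hypothetical, and the single-letter node names stand in for MAC addresses.

```python
def build_network(pairs):
    """Build the node set and directed edge set from (src, dst) pairs."""
    nodes = set()
    edges = set()
    for src, dst in pairs:
        nodes.update((src, dst))
        edges.add((src, dst))  # a directed edge: src is sending to dst
    return nodes, edges


def conversations(edges):
    """Return unordered pairs {A, B} where both A->B and B->A were seen."""
    return {frozenset((a, b)) for (a, b) in edges
            if a != b and (b, a) in edges}


# Hypothetical traffic: A and B talk in both directions, A only sends to C
pairs = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")]
nodes, edges = build_network(pairs)
print(sorted(nodes))
print(sorted(edges))
print(conversations(edges))
```

Here only A and B count as a conversation in the two-edge sense, because no traffic from C back to A was observed.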

Using the network representation, we can also think about it as two separate flow networks (see https://en.wikipedia.org/wiki/Flow_network): the first flow network is a series of nodes connected by edges representing data from the outside world (via routers or access points) to nodes, and the second flow network is nodes connected by edges representing data flowing outward.
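The two-flow-network view could be sketched roughly as follows, assuming the set of access point addresses is already known and that each packet is a (source, destination) pair. The name split_flows and the sample node names are illustrative only; traffic that does not cross an access point is simply ignored in this sketch.

```python
from collections import Counter


def split_flows(pairs, access_points):
    """Split packet counts into inbound and outbound flow networks."""
    inbound = Counter()   # edges from an access point to a client
    outbound = Counter()  # edges from a client to an access point
    for src, dst in pairs:
        if src in access_points and dst not in access_points:
            inbound[(src, dst)] += 1
        elif dst in access_points and src not in access_points:
            outbound[(src, dst)] += 1
    return inbound, outbound


# Hypothetical traffic with a single access point named "AP"
aps = {"AP"}
traffic = [("AP", "A"), ("AP", "A"), ("A", "AP"), ("B", "AP"), ("A", "B")]
inbound, outbound = split_flows(traffic, aps)
print(dict(inbound))
print(dict(outbound))
```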

To simplify starting out, we can ignore a particular dimension by simply integrating over it entirely. For example, to remove the temporal aspect of conversations - how the conversations evolve over time - we can loop over every packet and collect information about conversations and flows.
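As a rough illustration of integrating over time, the sketch below assumes each packet has been reduced to a (timestamp, source, destination) tuple and simply discards the timestamp while counting packets per directed edge. The name total_flows and the sample values are hypothetical.

```python
from collections import Counter


def total_flows(packets):
    """Count packets per (src, dst) edge, ignoring when they were sent."""
    return Counter((src, dst) for (_timestamp, src, dst) in packets)


# Hypothetical packets: (seconds since start of capture, src, dst)
capture = [
    (0.0, "A", "B"),
    (1.5, "B", "A"),
    (40.0, "A", "B"),
    (95.0, "A", "B"),
]
totals = total_flows(capture)
print(totals)
```

With scapy, tuples like these could plausibly be built from each packet's time, addr2, and addr1 fields, skipping packets that lack the address fields.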

If we wanted to get temporal resolution, however, we could loop through each packet and create a time vector of conversations, with some averaging window like 30 seconds or 5 minutes.
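One possible sketch of the windowed version, again assuming (timestamp, source, destination) tuples. It counts packets per window rather than averaging them, and the window boundaries are measured from the earliest timestamp; both choices are assumptions, and windowed_flows is a made-up name.

```python
from collections import Counter, defaultdict


def windowed_flows(packets, window=30.0):
    """Map window index -> Counter of (src, dst) packet counts."""
    packets = list(packets)
    if not packets:
        return {}
    # Measure windows from the earliest packet in the capture
    t0 = min(t for (t, _src, _dst) in packets)
    windows = defaultdict(Counter)
    for t, src, dst in packets:
        index = int((t - t0) // window)
        windows[index][(src, dst)] += 1
    return dict(windows)


# Hypothetical packets: (seconds since start of capture, src, dst)
timed = [
    (0.0, "A", "B"),
    (1.5, "B", "A"),
    (40.0, "A", "B"),
    (95.0, "A", "B"),
]
by_window = windowed_flows(timed, window=30.0)
for index in sorted(by_window):
    print(index, dict(by_window[index]))
```

Windows in which no packets were seen do not appear in the result, so a time vector with a fixed step would need those gaps filled with zeros.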

Other applications:

Also, neat:

Flags