Graphs/Connected Components

Overview

Strongly Connected Components

A directed graph is strongly connected if there is a path between all pairs of vertices.

On a directed graph, a strongly connected component (SCC) is a maximally strongly connected subgraph.

Undirected Graphs

The case of finding strongly connected components on an undirected graph is relatively easy:

Pick an arbitrary node, and start a breadth-first search (BFS) or a depth-first search (DFS). If the graph is undirected, then the BFS/DFS should visit every node. If the tour does not visit every node, then not all nodes are strongly connected.

More practically: the implementation should start by creating a list of unvisited nodes in the graph (which starts with every node on the graph), plus a list of visited nodes (which starts empty).

As nodes are visited during the DFS/BFS, they are moved from the unvisited list to the visited list. If, when the BFS/DFS has completed, the unvisited list is empty, then all nodes are members of a single strongly connected component.

If there are still nodes left in the unvisited list, those nodes compose a different connected component. To assemble this component, repeat the above procedure, but starting with a random node in the unvisited set. Then repeat to assemble the next component. Repeat this process until all components have been assembled and there are no nodes left in the unvisited list.

Directed Graphs

The case of finding strongly connected components on a directed graph is not as straightforward, as it requires checking for connections between pairs of nodes in both directions.

To find connected components, we start by applying the algorithm for an undirected graph, then reverse the direction of all edges and apply it again.

Kosaraju Algorithm

The Kosaraju Algorithm is a DFS-based algorithm for determining if all vertices are members of a single strongly connected component on the graph.

Kosaraju performs DFS two times.

First DFS

If all vertices are reachable from the DFS starting point, the DFS should only produce a single search tree. Otherwise, it will produce a forest (multiple trees). (DFS of a graph with only one strongly connected component always produces a tree.)

If there are more than one strongly connected components, the DFS may produce either a tree or a forest (it depends on the starting point).

To find and print all strongly connected components, start the DFS from a node that serves as the sink vertex, and proceed backwards to the node(s) that are source nodes. However, it is difficult to find this sequence.

To determine the sequence of the strongly connected component, we do a DFS of the graph, and store vertices according to their finish times. The finish time of a vertex that connects to other strongly connected components (other than its own strongly connected component) will be greater than the finish time of vertices in the other strongly connected component.

To use this property, we must do a DFS traversal of the complete graph, pushing every completed vertex onto a stack. In that stack, the vertices appear according to their finish times (most recent on top).

If the DFS visits all vertices, then it is possible there is a single strongly connected component. Otherwise, it is definitely not.

Second DFS

Before doing the next step (another DFS), we reverse every edge of the graph (find its transpose).

If we store the graph as an adjacency list, that means we must iterate over every edge (key-value pair) and add the reverse (value-key) pair to a new adjacency list.

Mark all graphs on transposed graph as unvisited. Perform a DFS on the transposed graph.

If the DFS on the transposed graph visits all vertices, then it is definitely a single strongly connected component. Otherwise, it is definitely not.

Flags