Graphs/Depth First Traversal
From charlesreid1
This page covers depth-first traversal and depth-first search on a graph.
Also see DFS
Notes
What DFS Gets Us
A depth first traversal and search algorithm can be used as an algorithmic building block for a large number of problems.
DFS gives us:
- paths
- connectivity
- cycles
- spanning trees
Specifically, we can get different things on directed vs. undirected graphs.
On an undirected graph, DFS can be used to:
- Find the path between two given vertices, or report that none exists
- Test whether a graph is connected
- Compute the spanning tree of a graph (which can, in turn, be used to bootstrap a minimum spanning tree): see Graphs/Minimum Spanning Tree
- Find the connected components of a graph
- Determine whether a graph has any cycles
On an undirected (connected) graph, to construct a spanning tree, do a DFS!
On a directed graph, DFS can be used to:
- Find a directed path between two given vertices, or report that none exists
- Determine the set of reachable vertices from a source vertex
- Test whether a graph is strongly connected
- Determine whether a graph has any cycles
- Compute the transitive closure of a graph: Graphs/Transitive Closure
On a directed graph, to discover vertices reachable from a source vertex, do a DFS! To discover cycles, do a DFS and look for back-edges!
Note that a depth-first search requires that we be able to mark vertices and/or edges are explored, and be able to test if they have been explored in O(1) time. This is easiest to do by augmenting the vertex class, but can also be done by creating a map of vertices to booleans or vertices to edges.
DFS Tree
Because of the fact that the depth-first search must proceed in an ordered manner, and cannot re-visit vertices that have already been vertices, we can think of the depth-first search as forming a tree.
A tree is a type of acyclic graph in which each vertex (node) has one ancestor and zero or more children. As we traverse the graph with a depth-first search, we are likewise forming a path in which each vertex we visit has one ancestor (the vertex preceding it, which has now been marked as visited) and zero or more children (zero if we reach a vertex that has only one outgoing edge, or if all of the vertex's edges lead to already-visited edges).
Like a tree, we can wind up with branches when we reach a "dead end" (a leaf node of the DFS tree) and have to backtrack to prior vertices (ancestor nodes in the DFS tree).
See #Edge Classification section below for full explanation, but here is an example of a DFS tree:
Pseudocode
Pseudocode for a recursive depth-first search follows.
Construct DFS Tree
This function constructs the DFS tree by assembling a collection of pairs of vertices v and the discovery edges e that led to the vertex. This collection can be used to reconstruct paths (see next section). This DFS function takes two arguments: a graph G, and a vertex u from which to begin the depth-first search. The DFS function returns a collection of vertices that are reachable from u.
# Does a depth first search on graph G starting from vertex u # Results in a collection of vertices reachable from vertex u, # in the form of (v,e) pairs (vertex v reachable from vertex u via edge e) def DFS(G, u): mark vertex u as visited for each outgoing edge e = (u,v) of u: if vertex v not visited: mark vertex v as visited via edge e add (v,e) pair to container call DFS(G,v)
There are (at least) two methods for tracking which vertices have been visited:
- Maintain a hash map containing the vertex v as the key and the edge e as the value, allowing an O(1) determination of the vertex has been visited
- Add a boolean flag to the Vertex object, and an accessors methods visit() and unvisit()
Reconstruct Path U to V in DFS Tree
To reconstruct the path using the collection of vertex-edge pairs, we can write a reconstruct_path method that takes two vertices, and the collection of vertex-edge pairs:
# Reconstructs path from u to v # using the list of vertex-edge pairs # in the DFS tree. def reconstruct_path(u, v, vertex_edge_pairs): initialize reconstructed path if v not in vertex_edge_pairs: # Empty case return empty reconstructed path add v to reconstructed_path walk = v while walk is not u: e = edge that leads to walk parent = e.opposite_vertex(walk) add parent to path walk = parent # Path is currently from v to u # Reverse path before returning return reversed(path)
Java Code
See Graphs/Java/Adjacency Map Lite#Algorithms for an implementation of a depth-first search.
The code for the DFS is given below, but refer to Graphs/Java/Adjacency Map Lite#Algorithms for implementation details (particularly the Graph, Vertex, and Edge classes). These are straightforward enough to understand here though.
/** Recursive depth-first search that populates HashMap discovered * with all edges/vertices discoverable from u.*/ public static void dfs(Graph g, Vertex u, HashSet<Vertex> known, HashMap<Vertex, Edge> discovered) { known.add(u); for(Edge e : u.outgoingEdges()) { Vertex v = e.getOpposite(u); if(!known.contains(v)) { // e = edge that discovered v discovered.put(v,e); dfs(g, v, known, discovered); } } }
This results in a map, discovered, that maps vertices to the edge by which they were discovered. Each vertex on a particular connected component on a graph will be contained in discovered, except the original vertex (u) that was passed with the original dfs() function call. That vertex is the root vertex of the component, and is left out of the discovered map.
If there are multiple components, DFS can be called on each component. The number of vertices on the graph that are missing from the discovered map gives the number of connected components on the graph. Again, refer to Graphs/Java/Adjacency Map Lite#Algorithms for details.
Edge Classification
When performing a depth-first search, we are interested in visiting each vertex only once. Thus, when we are considering a given vertex, we only want to consider edges that connect it to other unvisited vertices. This means we need a way of marking vertices as visited.
Thus, it is important to classify the edges that connect to a given vertex as belonging to one of three classes:
- Back edge - connects a vertex to an ancestor vertex in the DFS tree
- Forward edge - connects a vertex to its descendant in the DFS tree
- Cross edge - connects a vertex to a vertex that is not an ancestor or a descendant in the DFS tree
Example: Undirected Graph
Taking the following graph of letters as an example, the DFS tree can be constructed by starting at vertex A and walking alphabetically through the vertices in alphabetical order. This results in the DFS search tree, shown in green in the second image. Other edges are classified as back edges or cross edges, and are colored accordingly.
Example: Directed Graph
On the directed graph, the depth-first search tree is constructed by starting at the MA vertex and walking through the graph (in this case, the order in which vertices are selected is entirely arbitrary). From this procedure, the depth-first search tree is constructed and is shown in green. The back edges, forward edges, and cross edges are also colored accordingly.
Re-drawing the topography of the graph, but keeping all of the directed edges and their coloring, reveals the structure in a much more intuitive way. The following is the exact same graph, but oriented more explicitly around the structure of the DFS tree:
Note that the forward and cross edges are both labeled pink; the only cross edge, made more obvious from the representation of the DFS tree above, is the WA-CA edge.
Detecting Cycles
The DFS tree can be used to detect cycles. To find cycles, we simply look for back edges. If a back edge exists, a cycle exists.
To obtain the cycle, traverse the DFS tree until a vertex with a back edge is reached. Take the back edge from the descendant to its ancestor, and follow the tree edges back to the descendant.
Properties of DFS
Some important properties of DFS are mentioned here:
On undirected graph G, a DFS tree starting at vertex s visits all vertices on the connected component of s. The discovery edges (the edges in the DFS tree) form a spanning tree over the connected component of s.
On a directed graph, a DFS tree starting at vertex s visits all vertices that are reachable from s. The DFS tree contains directed paths from s to every vertex reachable from s.
DFS is called at most once on each vertex - once DFS has been called on a vertex, it is marked as visited and is not visited a second time. On an undirected graph, each edge is examined at most twice (once from each end of its vertex). On a directed graph, each edge is examined at most once (from the origin vertex).
Big O Cost
If we use a data structure for the graph that implements the following operations with the specified big-O cost:
- Creating and iterating through incident edges of all vertices takes time (that is, O(degree of v))
- Getting the vertex on the opposite end of a given edge takes time
- Marking a vertex or edge as visited takes time, and testing if a vertex or edge has been visited takes time
then we can implement a DFS starting at node s with the following big-O cost:
where is number of vertices reachable from s, and is number of edges incident to those vertices reachable from s.
DFS vs BFS
A depth first search tells us about connectivity and reachability - DFS tells us if a vertex v can be reached from a vertex u. This is a simple yes/no answer to the question of how to get from u to v.
A breadth first search tells us about the shortest routes (in terms of number of edges). That is, it examines connectivity in a hierarchical way that yields the shortest path from a vertex u to a vertex v.
Related
Graphs:
- Graphs#Graph Traversals
- Graphs/Depth First Traversal
- Graphs/Breadth First Traversal
- Graphs/Euler Tour
Traversals on trees:
Breadth-first search and traversal on trees:
Depth-first search and traversal on trees:
OOP design patterns:
Flags