Search⌘ K
AI Features

Implementation of DFS

Explore how to implement depth-first search with C++ to detect cycles in directed and undirected graphs. Understand node states, recursive DFS, and handling graph components to effectively identify cycles and avoid stack overflow issues.

In this lesson, we’ll solve the problem of detecting cycles in directed graphs by implementing DFS.

Implementation notes

The easiest way to implement DFS is as a recursive procedure:

  1. Mark the current node as “in progress.”
  2. Recursively process all nodes behind discovery edges.
  3. Mark the current node as done.

Due to the recursion occuring between starting and finishing a node, we automatically explore deeper nodes first before going up to shallower nodes again.

So, in principle, our depth-first search procedure should be a recursive function

void dfs(int u) {}

that performs DFS from the starting node u.

As we mostly need to perform operations on all neighboring edges of a node, we’ll make use of the adjacency list graph representation. We are considering unweighted graphs here, but the implementation for weighted graphs is very similar. We’ll simply ignore the values of the weights.

But now our DFS procedure additionally needs to access some “global” state such as:

  • the adjacency list of the graph.
  • the current state or color of each node.

One option would be to pass this state to the dfs routine additionally, like in

void dfs(int u, vector<vector<int>> const &adjacencyList, vector<NodeState> &state) {}

However, this is unwieldy, especially when we want to keep track of more state during the search, such as node predecessors or distances.

Instead, we’ll implement our cycle detection algorithm as a class where this state is stored in the member variables.

Avoiding stack overflows

There is one downside of implementing DFS recursively. As each recursive function call creates a stack frame, it is possible to run into stack overflows on very large graphs.

%0 node_1 dfs(u1) node_2 dfs(u2) node_3 dfs(u3) node_1622885713399 ... node_1622885682682 dfs(u1000000) node_1622885667177 dfs(u1000001) top_node top node_1622885667177->top_node
Visualization of the call stack of a DFS execution where a high recursion depth can lead to stack overflows.

An easy workaround for this problem is to simply increase the maximum stack size, for example, using the ulimit -s command on Linux systems.

But some operating systems do not support raising the stack size past a hard limit. In that case, the only remedy is to implement DFS iteratively by keeping track of a vector of “in progress” nodes. This vector can be allocated on the heap and thus we can avoid stack overflow issues.

Implementing cycle detection with DFS

Let’s start by setting up a basic class to run the cycle detection.

C++ 17
#include <vector>
#include <iostream>
using namespace std;
// shorthand for adjacency list type
typedef vector<vector<int>> AdjacencyList;
// define an enum for node states during DFS execution
enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};
class CycleDetector {
private:
const AdjacencyList adjacencyList;
vector<NodeState> nodeStates;
bool cycleFound = false;
public:
CycleDetector(AdjacencyList _adjacencyList)
: adjacencyList {_adjacencyList}
// initialize all nodes as unvisited
, nodeStates {vector<NodeState>(adjacencyList.size(), NodeState::UNVISITED)}
{}
};

To keep track of the node states during DFS, we have defined the enumeration NodeState.

enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};

The states correspond to the colors white, gray, and blue from the previous lesson.

In our class, we use a vector to keep track of the current state of each node. Initially, all nodes are in the UNVISITED state. We also maintain a boolean variable cycleFound, initially false, which keeps track of whether we have discovered a cycle yet.

Next, let’s write the DFS routine.

C++
void dfs(int u) {
// mark current node as in progress
this->nodeStates[u] = NodeState::IN_PROGRESS;
for (int v : this->adjacencyList[u]) {
switch (this->nodeStates[v]) {
// discovery edge: recursively call dfs
case NodeState::UNVISITED: dfs(v); break;
// back edge: mark cycle as found
case NodeState::IN_PROGRESS: this->cycleFound = true; break;
// redundant edge: skip
case NodeState::FINISHED: break;
}
}
// mark current node as done
this->nodeStates[u] = NodeState::FINISHED;
}

After marking the current node u as IN_PROGRESS, we’ll check all of its outgoing edges. Our action will depend on the type of each edge:

  • For discovery edges leading to unvisited nodes, we’ll recursively call dfs.
  • For back edges leading to “in progress” nodes, we’ll note that we have found a cycle, but will not follow them.
  • We’ll ignore redundant edges leading to finished nodes.

Finally, we’ll mark the current node as FINISHED.

Now, we only need to write a function that uses dfs to discover whether there is a cycle anywhere in the graph.

C++
bool containsCycle() {
for (int u = 0; u < (int)this->adjacencyList.size(); ++u) {
// skip nodes that were already discovered
if (this->nodeStates[u] == NodeState::FINISHED) continue;
dfs(u);
}
return this->cycleFound;
}

Our containsCycle function calls dfs from every possible starting vertex in the graph. This is necessary because the graph might be disconnected. Trying dfs from one vertex might not find an existing cycle in another connected component.

However, to save time, we won’t consider vertices in the FINISHED state, as we have already explored them and all of their neighbors. In the end, we’ll simply return cycleFound. It was set to true only if we have ever found a cycle in the graph.

The following code snippet contains the complete implementation of our CycleDetector class again for reference.

C++ 17
#include <vector>
using namespace std;
typedef vector<vector<int>> AdjacencyList;
enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};
class CycleDetector {
private:
const AdjacencyList adjacencyList;
vector<NodeState> nodeStates;
bool cycleFound = false;
public:
CycleDetector(AdjacencyList _adjacencyList)
: adjacencyList {_adjacencyList}
, nodeStates {vector<NodeState>(adjacencyList.size(), NodeState::UNVISITED)}
{}
bool containsCycle() {
for (int u = 0; u < (int)this->adjacencyList.size(); ++u) {
if (this->nodeStates[u] == NodeState::FINISHED) continue;
dfs(u);
}
return this->cycleFound;
}
void dfs(int u) {
this->nodeStates[u] = NodeState::IN_PROGRESS;
for (int v : this->adjacencyList[u]) {
switch (this->nodeStates[v]) {
case NodeState::UNVISITED: dfs(v); break;
case NodeState::IN_PROGRESS: this->cycleFound = true; break;
case NodeState::FINISHED: break;
}
}
this->nodeStates[u] = NodeState::FINISHED;
}
};

Runtime analysis

Let’s analyze the time complexity of our DFS implementation. In the worst case, a single call to dfs might inspect every edge and explore the whole graph, which takes O(E)\mathcal{O}(|E|) time.

In our cycle detection, we’ll run DFS from all starting nodes that have not been previously visited. This is a common technique in graph traversal code. Since we only start subsequent DFS runs on unvisited nodes, we’ll still inspect each edge of the graph once. This entails that the total runtime of all these dfs calls is actually still O(E)\mathcal{O}(|E|).

However, since we’re also iterating over each node to check whether it was visited yet, we have an additional O(V)\mathcal{O}(|V|) work to do. The total complexity of exploring the whole graph using DFS is thus O(V+E)\mathcal{O}(|V|+|E|).

Note: In most graphs we have VE|V| \leq |E|, and thus O(V+E)=O(E)\mathcal{O}(|V|+|E|) = \mathcal{O}(|E|).

Trying out the cycle detection

Now, let’s verify that our implementation works. We’ll run it on our DFS example graph of the previous lesson.

g 1 1 3 3 1->3 2 2 2->1 0 0 3->0 0->1 0->2 4 4 4->1
Our DFS example graph.

We’ve already renamed the nodes to integer indices from zero to four.

The graph contains two cycles,

0 -> 2 -> 1 -> 3 -> 0
0 -> 1 -> 3 -> 0

so we expect our algorithm to return true.

C++ 17
int main() {
int n = 5;
AdjacencyList adjacencyList(n);
vector<pair<int, int>> edges {{0, 1}, {0, 2}, {1, 3}, {2, 1}, {3, 0}, {4, 1}};
for (auto [u, v] : edges) {
adjacencyList[u].push_back(v);
}
CycleDetector cycleDetector(adjacencyList);
cout << "Contains cycle: " << cycleDetector.containsCycle() << endl;
}

Now, let’s make a slight modification to the graph and remove the edge (1,3)(1, 3) (marked in red above). Since both cycles in the original graph are using that edge, our algorithm will now return false.

C++ 17
int main() {
int n = 5;
AdjacencyList adjacencyList(n);
vector<pair<int, int>> edges {{0, 1}, {0, 2}, {2, 1}, {3, 0}, {4, 1}};
for (auto [u, v] : edges) {
adjacencyList[u].push_back(v);
}
CycleDetector cycleDetector(adjacencyList);
cout << "Contains cycle: " << cycleDetector.containsCycle() << endl;
}

Detecting cycles in undirected graphs

The algorithm we implemented above only correctly detects cycles in directed graphs. For undirected graphs, we’ll need to modify our DFS implementation slightly to account for the fact that cycles in undirected graphs need to have at least three nodes.

In an undirected graph, if a discovery (half-)edge (u,v)(u, v) leads us to a node vv, then we should not count the symmetric half-edge (v,u)(v, u) as a back edge. Otherwise, every undirected edge would be counted as a cycle.

g a a b b a->b b->a u u v v u->v
Left: A two node cycle in a directed graph. Right: a single undirected edge is not a cycle.

Thankfully, we can fix that by making a small change to our implementation of dfs. Whenever a discovery edge (u,v)(u, v) leads to the node vv, let’s call the node uu the parent of vv. The starting node of the DFS has no parent.

In our implementation, we’ll then need to keep track of the parent, and never consider edges leading back to the parent as back edges. When there is no parent (in the starting node), we’ll use a dummy value of -1 for the parent.

C++ 17
// keep track of parent vertex, default -1 (no parent)
void dfs(int u, int parent = -1) {
this->nodeStates[u] = NodeState::IN_PROGRESS;
for (int v : this->adjacencyList[u]) {
// skip edge back to the parent
if (v == parent) continue;
switch (this->nodeStates[v]) {
// call dfs and pass the parent
case NodeState::UNVISITED: dfs(v, u); break;
case NodeState::IN_PROGRESS: this->cycleFound = true; break;
case NodeState::FINISHED: break;
}
}
this->nodeStates[u] = NodeState::FINISHED;
}

With the above modification, a single edge in an undirected graph is no longer considered a cycle.