Implementation of DFS

Explore how to implement depth-first search with C++ to detect cycles in directed and undirected graphs. Understand node states, recursive DFS, and handling graph components to effectively identify cycles and avoid stack overflow issues.

We'll cover the following...

Implementation notes

Avoiding stack overflows

Implementing cycle detection with DFS

Runtime analysis
Trying out the cycle detection
Detecting cycles in undirected graphs

In this lesson, we’ll solve the problem of detecting cycles in directed graphs by implementing DFS.

Implementation notes

The easiest way to implement DFS is as a recursive procedure:

Mark the current node as “in progress.”
Recursively process all nodes behind discovery edges.
Mark the current node as done.

Due to the recursion occuring between starting and finishing a node, we automatically explore deeper nodes first before going up to shallower nodes again.

So, in principle, our depth-first search procedure should be a recursive function

void dfs(int u) {}

that performs DFS from the starting node u.

As we mostly need to perform operations on all neighboring edges of a node, we’ll make use of the adjacency list graph representation. We are considering unweighted graphs here, but the implementation for weighted graphs is very similar. We’ll simply ignore the values of the weights.

But now our DFS procedure additionally needs to access some “global” state such as:

the adjacency list of the graph.
the current state or color of each node.

One option would be to pass this state to the dfs routine additionally, like in

void dfs(int u, vector<vector<int>> const &adjacencyList, vector<NodeState> &state) {}

However, this is unwieldy, especially when we want to keep track of more state during the search, such as node predecessors or distances.

Instead, we’ll implement our cycle detection algorithm as a class where this state is stored in the member variables.

Avoiding stack overflows

There is one downside of implementing DFS recursively. As each recursive function call creates a stack frame, it is possible to run into stack overflows on very large graphs.

An easy workaround for this problem is to simply increase the maximum stack size, for example, using the ulimit -s command on Linux systems.

But some operating systems do not support raising the stack size past a hard limit. In that case, the only remedy is to implement DFS iteratively by keeping track of a vector of “in progress” nodes. This vector can be allocated on the heap and thus we can avoid stack overflow issues.

Implementing cycle detection with DFS

Let’s start by setting up a basic class to run the cycle detection.

C++ 17

#include <vector>
#include <iostream>
using namespace std;
// shorthand for adjacency list type
typedef vector<vector<int>> AdjacencyList;
// define an enum for node states during DFS execution
enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};
class CycleDetector {
private:
    const AdjacencyList adjacencyList;
    vector<NodeState> nodeStates;
    bool cycleFound = false;
public: 
    CycleDetector(AdjacencyList _adjacencyList) 
        : adjacencyList {_adjacencyList} 
        // initialize all nodes as unvisited
        , nodeStates {vector<NodeState>(adjacencyList.size(), NodeState::UNVISITED)}
    {}
};

To keep track of the node states during DFS, we have defined the enumeration NodeState.

enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};

The states correspond to the colors white, gray, and blue from the previous lesson.

In our class, we use a vector to keep track of the current state of each node. Initially, all nodes are in the UNVISITED state. We also maintain a boolean variable cycleFound, initially false, which keeps track of whether we have discovered a cycle yet.

Next, let’s write the DFS routine.

C++

void dfs(int u) {
    // mark current node as in progress
    this->nodeStates[u] = NodeState::IN_PROGRESS;
    for (int v : this->adjacencyList[u]) {
        switch (this->nodeStates[v]) {
            // discovery edge: recursively call dfs
            case NodeState::UNVISITED: dfs(v); break;
            // back edge: mark cycle as found
            case NodeState::IN_PROGRESS: this->cycleFound = true; break;
            // redundant edge: skip
            case NodeState::FINISHED: break;
        }
    }
    // mark current node as done 
    this->nodeStates[u] = NodeState::FINISHED;
}

After marking the current node u as IN_PROGRESS, we’ll check all of its outgoing edges. Our action will depend on the type of each edge:

For discovery edges leading to unvisited nodes, we’ll recursively call dfs.
For back edges leading to “in progress” nodes, we’ll note that we have found a cycle, but will not follow them.
We’ll ignore redundant edges leading to finished nodes.

Finally, we’ll mark the current node as FINISHED.

Now, we only need to write a function that uses dfs to discover whether there is a cycle anywhere in the graph.

Our containsCycle function calls dfs from every possible starting vertex in the graph. This is necessary because the graph might be disconnected. Trying dfs from one vertex might not find an existing cycle in another connected component.

However, to save time, we won’t consider vertices in the FINISHED state, as we have already explored them and all of their neighbors. In the end, we’ll simply return cycleFound. It was set to true only if we have ever found a cycle in the graph.

The following code snippet contains the complete implementation of our CycleDetector class again for reference.

C++ 17

#include <vector>
using namespace std;
typedef vector<vector<int>> AdjacencyList;
enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};
class CycleDetector {
private:
    const AdjacencyList adjacencyList;
    vector<NodeState> nodeStates;
    bool cycleFound = false;
public: 
    CycleDetector(AdjacencyList _adjacencyList) 
        : adjacencyList {_adjacencyList} 
        , nodeStates {vector<NodeState>(adjacencyList.size(), NodeState::UNVISITED)}
    {}
    bool containsCycle() {
        for (int u = 0; u < (int)this->adjacencyList.size(); ++u) {
            if (this->nodeStates[u] == NodeState::FINISHED) continue;
            dfs(u);
        }
        return this->cycleFound;
    }
    void dfs(int u) {
        this->nodeStates[u] = NodeState::IN_PROGRESS;
        for (int v : this->adjacencyList[u]) {
            switch (this->nodeStates[v]) {
                case NodeState::UNVISITED: dfs(v); break;
                case NodeState::IN_PROGRESS: this->cycleFound = true; break;
                case NodeState::FINISHED: break;
            }
        }
        this->nodeStates[u] = NodeState::FINISHED;
    }
};

Runtime analysis

Let’s analyze the time complexity of our DFS implementation. In the worst case, a single call to dfs might inspect every edge and explore the whole graph, which takes $\mathcal{O}(|E|)$ time.

In our cycle detection, we’ll run DFS from all starting nodes that have not been previously visited. This is a common technique in graph traversal code. Since we only start subsequent DFS runs on unvisited nodes, we’ll still inspect each edge of the graph once. This entails that the total runtime of all these dfs calls is actually still $\mathcal{O}(|E|)$ .

However, since we’re also iterating over each node to check whether it was visited yet, we have an additional $\mathcal{O}(|V|)$ work to do. The total complexity of exploring the whole graph using DFS is thus $\mathcal{O}(|V|+|E|)$ .

Note: In most graphs we have $|V| \leq |E|$ , and thus $\mathcal{O}(|V|+|E|) = \mathcal{O}(|E|)$ .

Trying out the cycle detection

Now, let’s verify that our implementation works. We’ll run it on our DFS example graph of the previous lesson.

Detecting cycles in undirected graphs

The algorithm we implemented above only correctly detects cycles in directed graphs. For undirected graphs, we’ll need to modify our DFS implementation slightly to account for the fact that cycles in undirected graphs need to have at least three nodes.

In an undirected graph, if a discovery (half-)edge $(u, v)$ leads us to a node $v$ , then we should not count the symmetric half-edge $(v, u)$ as a back edge. Otherwise, every undirected edge would be counted as a cycle.

C++ 17

// keep track of parent vertex, default -1 (no parent)
void dfs(int u, int parent = -1) {
        this->nodeStates[u] = NodeState::IN_PROGRESS;
        for (int v : this->adjacencyList[u]) {
            // skip edge back to the parent
            if (v == parent) continue;
            switch (this->nodeStates[v]) {
                // call dfs and pass the parent
                case NodeState::UNVISITED: dfs(v, u); break;
                case NodeState::IN_PROGRESS: this->cycleFound = true; break;
                case NodeState::FINISHED: break;
            }
        }
        this->nodeStates[u] = NodeState::FINISHED;
    }

1.Introduction

2.Graph Representations

3.Graph Traversal

4.Shortest Paths

5.Spanning Trees

6.Flow Problems

7.Conclusion

8.Appendix

Implementation of DFS

Implementation notes

Avoiding stack overflows

Implementing cycle detection with DFS

Runtime analysis

Trying out the cycle detection

Detecting cycles in undirected graphs