Understanding data structures and algorithms is crucial for problem-solving and optimization in computer science. They form the foundation for writing efficient and scalable code, vital for software development and technical interviews.

Muhammad Abu Huraira

20 min read


No matter which programming language you learn, data structures and algorithms (DSA) form the backbone of computer science. Understanding these concepts is crucial for efficient problem-solving and writing optimized code.

This comprehensive DSA roadmap will guide you through the essential topics and provide a detailed approach toward their mastery.

**Data structures** are ways to organize and store data in a computer to be accessed and modified efficiently.

**Algorithms** are step-by-step procedures or formulas for solving problems. They perform operations on data structures to solve computational problems.

Data structures and algorithms are non-negotiable when learning to code. The better you understand them, the better you can:

- **Write efficient code** that runs faster and uses fewer resources
- **Crack coding interviews** at top tech companies
- **Have a deeper understanding** of how software works (and strong problem-solving skills to match)

Data Structures for Coding Interviews in Python

Data structures are among the most fundamental concepts of computer science. The data structure chosen can make or break an entire computer program. Consequently, they are also widely treated as a vital benchmark of computer science knowledge when it comes to industry interviews. This course contains a detailed review of all the common data structures and provides implementation-level details in Python to help readers become well equipped. Now with more code solutions, lessons, and illustrations than ever, this is the course for you!

Once you have a good understanding of a programming language and knowledge of time and space complexities, you're ready to start learning about data structures.

Choose a programming language to learn the basics:

- **Variables:** Learn how to declare and use variables to store data.
- **Operators:** Understand arithmetic, logical, and comparison operators.
- **Conditional statements:** Explore how to execute different blocks of code based on specific conditions.
- **Loops:** Learn about loops that allow performing certain operations repeatedly. Most programming languages offer multiple types of loop statements.
- **Functions/Methods:** Understand how to define and call functions or methods for code reusability.
- **Scoping:** Learn about variable scopes like local, global, and block scopes.

**Next, learn about time and space complexities:**

- **Time complexity:** Measure how long an algorithm takes to execute, focusing on worst-case scenarios (Big O notation), best-case scenarios (Omega notation), and average complexity (Theta notation).
- **Space complexity:** Analyze the amount of memory an algorithm requires, optimizing memory usage and avoiding excessive memory consumption.
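To make these measures concrete, here is a small sketch contrasting an $O(n)$ loop with an $O(1)$ closed-form formula for the same task (the function names are ours, for illustration only):

```python
def sum_linear(n):
    """O(n) time: touches every number from 1 to n once."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_constant(n):
    """O(1) time: the closed-form formula gives the same answer
    in a fixed number of steps, regardless of n."""
    return n * (n + 1) // 2
```

Both functions return the same result, but as `n` grows, the first does proportionally more work while the second does not.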

There are various data structures, but today, we'll discuss the following:

Arrays

Linked lists

Stacks

Queues

Trees

Graphs

**Arrays** are one of the simplest data structures, consisting of a collection of elements stored in contiguous memory locations. They provide constant-time access to elements based on their indexes, making them ideal for scenarios where random access is frequent. Arrays are widely used for implementing lists, matrices, and dynamic arrays.

- **Access:** $O(1)$
- **Search:** $O(n)$
- **Insertion/Deletion (at end):** $O(1)$
- **Insertion/Deletion (in middle):** $O(n)$

**Linked lists** are dynamic data structures where elements, called nodes, are linked using pointers. Unlike arrays, linked lists allow for efficient insertion and deletion operations at any position. However, they have slower access times than arrays since a traversal is necessary to reach specific elements.

Below are the different types of linked lists: Singly linked lists, doubly linked lists, and circular linked lists.

A **singly linked list** is a data structure that consists of a sequence of elements called nodes. Each node contains two parts:

- **Data:** The value or information stored in the node
- **Next:** A reference (or link) to the next node in the sequence

In a singly linked list, nodes are linked linearly, where each node points to the next node in the sequence. The last node’s next pointer points to a special sentinel value, often termed NULL, indicating the end of the list.
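The node structure described above can be sketched in Python as follows (a minimal illustration, not a full library-grade list; the class and method names are our own):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None  # reference to the next node, or None at the end


class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, data):
        # O(1): the new node simply becomes the new head.
        node = Node(data)
        node.next = self.head
        self.head = node

    def to_list(self):
        # O(n): traverse from the head until the None sentinel.
        out, current = [], self.head
        while current:
            out.append(current.data)
            current = current.next
        return out
```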

A **doubly linked list** is a data structure similar to a singly linked list but with an additional link to the previous node. Each node contains three parts:

- **Data:** The value or information stored in the node
- **Next:** A reference (or link) to the next node in the sequence
- **Previous:** A reference (or link) to the previous node in the sequence

This allows traversal of the list in both forward and backward directions. The first node’s `previous` link and the last node’s `next` link are typically NULL.

A **circular linked list** is a variation of the linked list where the last node points back to the first node, forming a circle. Circular linked lists can be singly or doubly linked. In a singly circular linked list, each node has:

- **Data:** The value or information stored in the node
- **Next:** A reference (or link) to the next node in the sequence

- **Access:** $O(n)$
- **Search:** $O(n)$
- **Insertion/Deletion (at beginning):** $O(1)$
- **Insertion/Deletion (in middle):** $O(n)$
- **Insertion/Deletion (at the end of a doubly linked list):** $O(1)$

Below is the implementation of the circular linked list in Python.

```python
class Node:
    def __init__(self, data=None):
        self.data = data
        self.next = None


class CircularLinkedList:
    def __init__(self):
        self.head = None

    def is_empty(self):
        return self.head is None

    def append(self, data):
        new_node = Node(data)
        if self.is_empty():
            self.head = new_node
            new_node.next = self.head
            return
        # Walk to the tail, then link the new node after it.
        current = self.head
        while current.next != self.head:
            current = current.next
        current.next = new_node
        new_node.next = self.head

    def prepend(self, data):
        new_node = Node(data)
        if self.is_empty():
            self.head = new_node
            new_node.next = self.head
            return
        # The tail must be relinked so the circle still closes at the new head.
        current = self.head
        while current.next != self.head:
            current = current.next
        current.next = new_node
        new_node.next = self.head
        self.head = new_node

    def delete(self, key):
        if self.is_empty():
            return
        # Special case: deleting the only node empties the list.
        if self.head.data == key and self.head.next == self.head:
            self.head = None
            return
        current = self.head
        prev = None
        while True:
            if current.data == key:
                if prev:
                    prev.next = current.next
                else:
                    # Deleting the head: find the tail and relink it.
                    tail = self.head
                    while tail.next != self.head:
                        tail = tail.next
                    tail.next = self.head.next
                    self.head = self.head.next
                return
            prev = current
            current = current.next
            if current == self.head:
                break

    def print_list(self):
        if self.is_empty():
            print("List is empty")
            return
        current = self.head
        while True:
            print(current.data, end=" -> ")
            current = current.next
            if current == self.head:
                break
        print("HEAD")


# Example usage
def main():
    cll = CircularLinkedList()
    print("Initial list:")
    cll.print_list()
    print("\nAppending 1 to the list:")
    cll.append(1)
    cll.print_list()
    print("\nAppending 2 to the list:")
    cll.append(2)
    cll.print_list()
    print("\nPrepending 0 to the list:")
    cll.prepend(0)
    cll.print_list()
    print("\nDeleting 1 from the list:")
    cll.delete(1)
    cll.print_list()


if __name__ == "__main__":
    main()
```

**Stacks** follow the Last In, First Out (LIFO) principle, where elements are added to and removed from the top. As a linear data structure, a stack exposes only its top element for direct access, which enforces LIFO order.

They are commonly used for managing function calls, expression evaluation, and undo mechanisms.

A stack is usually implemented using a linked list, where operations have the following time complexities:

- **Push (add an element to the top):** $O(1)$
- **Pop (remove an element from the top):** $O(1)$
- **Peek (get the top element without removing it):** $O(1)$
- **isEmpty (check if the stack is empty):** $O(1)$
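A minimal stack sketch in Python. It uses a Python list rather than a linked list for brevity, which gives the same $O(1)$ (amortized) behavior for these operations; the class and method names are illustrative:

```python
class Stack:
    def __init__(self):
        self._items = []  # the top of the stack is the end of the list

    def push(self, item):
        # O(1) amortized: append at the end.
        self._items.append(item)

    def pop(self):
        # O(1): remove and return the top element.
        return self._items.pop()

    def peek(self):
        # O(1): look at the top element without removing it.
        return self._items[-1]

    def is_empty(self):
        # O(1)
        return not self._items
```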

**Queues** adhere to the First In, First Out (FIFO) order, where elements are added at the rear and removed from the front. They are useful for scenarios such as task scheduling, message passing, and managing requests.

- **Enqueue (add an element to the rear):** $O(1)$
- **Dequeue (remove an element from the front):** $O(1)$
- **Front (get the front element without removing it):** $O(1)$
- **isEmpty (check if the queue is empty):** $O(1)$
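A minimal queue sketch, using `collections.deque` so both ends can be operated on in $O(1)$ (a plain Python list would make dequeuing from the front $O(n)$); the class is our own illustration:

```python
from collections import deque


class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        # O(1): add at the rear.
        self._items.append(item)

    def dequeue(self):
        # O(1): remove from the front.
        return self._items.popleft()

    def front(self):
        # O(1): look at the front element without removing it.
        return self._items[0]

    def is_empty(self):
        # O(1)
        return not self._items
```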

To learn more about stacks and queues, read our blog How to use stacks and queues.

**Trees** are hierarchical data structures consisting of nodes connected by edges. They have a root node at the top and child nodes branching out. Trees are versatile and used in various applications, such as representing hierarchical data (file systems, organizational charts) and implementing search algorithms (binary search trees).

Below are the different types of trees: Binary trees, binary search trees, AVL trees, heaps, and tries.

A **binary tree** is a tree data structure in which each node has at most two children, referred to as the left child and the right child.

**Structure:**

- Each node contains a value, a reference to the left child, and a reference to the right child.
- The tree starts with a root node, and each node may have zero, one, or two children.

**Advantages:**

- It has a simple structure.
- It helps to represent the hierarchical relationships between data elements.

A **binary search tree (BST)** is a binary tree in which each node has a value greater than all the values in its left subtree and less than all the values in its right subtree.

**Structure:**

- It’s the same as a binary tree but with the added property of ordered nodes.
- It ensures that for any node, the values in the left subtree are less than the node, and those in the right subtree are greater than the node.

**Advantages:**

- It’s efficient for search, insertion, and deletion operations, provided that the binary search tree is balanced.
- It maintains order, which allows for easy range queries.

Below is the implementation of the binary search tree in Python.

```python
class TreeNode:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


class BST:
    def __init__(self):
        self.root = None

    def insert(self, key):
        if not self.root:
            self.root = TreeNode(key)
        else:
            self._insert_recursive(self.root, key)

    def _insert_recursive(self, node, key):
        # Smaller keys go left; equal or larger keys go right.
        if key < node.val:
            if not node.left:
                node.left = TreeNode(key)
            else:
                self._insert_recursive(node.left, key)
        else:
            if not node.right:
                node.right = TreeNode(key)
            else:
                self._insert_recursive(node.right, key)

    def search(self, key):
        return self._search_recursive(self.root, key)

    def _search_recursive(self, node, key):
        if not node or node.val == key:
            return node
        if key < node.val:
            return self._search_recursive(node.left, key)
        else:
            return self._search_recursive(node.right, key)

    def print_inorder(self):
        # In-order traversal of a BST prints the keys in sorted order.
        self._print_inorder_recursive(self.root)
        print()

    def _print_inorder_recursive(self, node):
        if node:
            self._print_inorder_recursive(node.left)
            print(node.val, end=" ")
            self._print_inorder_recursive(node.right)


# Example usage
def main():
    bst = BST()
    print("Initial BST (in-order traversal):")
    bst.print_inorder()
    for key in [50, 30, 20, 40, 70, 60, 80]:
        print(f"\nInserting {key} into the BST:")
        bst.insert(key)
        bst.print_inorder()
    search_key = 60
    result = bst.search(search_key)
    if result:
        print(f"\nFound node with value {result.val}")
    else:
        print("\nKey not found")


if __name__ == "__main__":
    main()
```

An **AVL tree** is a self-balancing binary search tree where the difference in heights between the left and right subtrees of any node is at most one.

**Structure:**

- It’s the same as a binary search tree but with an additional balance property.
- While some implementations store the balance factor in each node, it can also be computed dynamically as needed. A node’s balance factor is the difference in height between its left and right subtrees.

**Advantages:**

- It ensures balanced height, leading to $O(\log n)$ time complexity for search, insertion, and deletion operations.
- It prevents the worst-case scenarios found in unbalanced BSTs.
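As a small illustration of the balance property described above, here is a sketch that computes subtree heights and balance factors (the `Node` class and function names are ours; a full AVL implementation would also perform rotations, which are omitted here):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height of a subtree; an empty subtree has height -1 by convention."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Left height minus right height; an AVL tree keeps this in {-1, 0, 1}
    at every node, rebalancing with rotations when it drifts outside."""
    return height(node.left) - height(node.right)
```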

A **heap** is a special tree-based data structure that satisfies the heap property. There are two types of heaps: Min-heap and max-heap.

**Structure:**

- It is typically implemented as a binary tree.
- It is often stored in an array for efficient access and manipulation.

**Advantages:**

- It is used in efficient implementations of priority queues.
- It provides quick access to the minimum or maximum element.

**Min-heap:** In a min-heap, the value of each node is greater than or equal to the value of its parent. The minimum value is always located at the root node.

**Max-heap:** In a max-heap, the value of each node is less than or equal to the value of its parent. The maximum value is always located at the root node.

- **Insertion:** $O(\log n)$
- **Deletion (root element):** $O(\log n)$
- **Peek (find the minimum in a min-heap or maximum in a max-heap):** $O(1)$
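These operations map directly onto Python’s standard-library `heapq` module, which maintains a min-heap inside a plain list. A brief sketch (the variable names are illustrative):

```python
import heapq

nums = [5, 1, 8, 3]
heap = []
for n in nums:
    heapq.heappush(heap, n)   # O(log n) per insertion

smallest = heap[0]            # O(1): peek at the minimum (the root)
popped = heapq.heappop(heap)  # O(log n): remove the root and re-heapify
```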

A **trie** is a tree-like data structure used to store a dynamic set of strings, where the keys are usually strings.

**Structure:**

- Each node represents a single character of a string.
- The root represents the empty string, and each path from the root to a leaf represents a string in the set.

**Advantages:**

- It is efficient for prefix-based searches, such as autocomplete and spell checking.
- It provides $O(m)$ time complexity for search and insert operations, where $m$ is the length of the key.

- **Insert (insert a word into the trie):** $O(m)$, where $m$ is the length of the word
- **Search (search for a word in the trie):** $O(m)$
- **Delete (delete a word from the trie):** $O(m)$
- **Prefix search (check whether any word in the trie starts with the given prefix):** $O(m)$
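A minimal trie sketch illustrating the $O(m)$ insert, search, and prefix operations (the class and method names are our own):

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # maps a character to the next TrieNode
        self.is_word = False   # True if a complete word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # O(m): follow or create one node per character.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        # O(m): the path must exist AND end at a word marker.
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        # O(m): only the path needs to exist.
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node
```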

**Graphs** are nonlinear data structures with vertices (nodes) connected by edges. They are used to model relationships, networks, and complex systems. Graphs can be directed (edges have a direction) or undirected (edges have no direction).

Below are the different types of graphs: Directed graphs, undirected graphs, weighted graphs, cyclic graphs, and acyclic graphs.

A **directed graph (digraph)** is one in which the edges have a direction. Each edge goes from one vertex to another (or the same vertex).

**Structure:**

- It consists of vertices (nodes) and directed edges (arcs).
- Each edge is represented as an ordered pair (u, v), indicating a direction from vertex u to vertex v.

**Advantages:**

- It is suitable for representing relationships with direction, such as web links, one-way streets, or precedence relationships.

An **undirected graph** is a graph in which the edges have no direction. Each edge connects two vertices without a specific direction.

**Structure:**

- It consists of vertices and undirected edges.
- Each edge is represented as an unordered pair {u, v}, indicating a bidirectional relationship between u and v.

**Advantages:**

- Traversal algorithms are simpler since edges have no direction.
- It’s suitable for representing mutual relationships, such as friendships in a social network.

A **weighted graph** is a graph in which each edge has a numerical value (weight) associated with it. The weight can represent distance, cost, or any other quantitative measure.

**Structure:**

- It consists of vertices and weighted edges.
- Each edge is represented as a tuple (u, v, w), where w is the weight of the edge between vertices u and v.

**Advantages:**

- It is useful for optimization problems, such as the shortest path or minimum spanning tree.
- It can represent more complex relationships compared to unweighted graphs.

An **acyclic graph** is a graph that does not contain any cycles. In the case of directed graphs, an acyclic graph is specifically called a Directed Acyclic Graph (DAG).

**Structure:**

- It consists of vertices and edges with no subset of edges forming a cycle.

**Advantages:**

- Traversal and pathfinding algorithms are simpler because there are no cycles.
- DAGs are particularly useful for representing processes with dependencies, such as task scheduling or prerequisite relationships.

Graphs are usually represented as either an adjacency list or an adjacency matrix to solve graph-related problems.

- **Adjacency list:** A collection of lists where each list corresponds to a node and contains its neighbors. In the case of weighted graphs, the weight is stored along with the corresponding nodes.
- **Adjacency matrix:** A 2D array where each cell, $matrix[i][j]$, represents the edge between nodes $i$ and $j$. The value of $matrix[i][j]$ equals $1$ (or the weight in the case of a weighted graph) if there’s an edge between nodes $i$ and $j$, and $0$ otherwise.
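As a small illustration, the same undirected three-node graph (edges 0–1 and 1–2) can be written in both representations (the variable names are ours):

```python
# Adjacency list: each node maps to the list of its neighbors.
adjacency_list = {
    0: [1],
    1: [0, 2],
    2: [1],
}

# Adjacency matrix: matrix[u][v] == 1 when there is an edge between u and v.
n = 3
adjacency_matrix = [[0] * n for _ in range(n)]
for u, neighbors in adjacency_list.items():
    for v in neighbors:
        adjacency_matrix[u][v] = 1
```

The list form uses space proportional to the number of edges and suits sparse graphs; the matrix form gives $O(1)$ edge lookups at the cost of $O(n^2)$ space.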

To learn more about data structures, we recommend the course “Data Structures for Coding Interviews.” This course is offered in five languages: Python, C++, Java, C#, and JavaScript.

Once you understand data structures well, learning about algorithms is essential. Algorithms help solve complex problems efficiently and are the basis for creating effective and optimized solutions in computer science.

Today we'll discuss:

Searching algorithms

Sorting algorithms

Graph algorithms

Dynamic programming

A Visual Introduction to Algorithms

Learn introductory computer science algorithms, including searching, sorting, recursion, and graph theory, through a combination of articles, visualizations, quizzes, and coding challenges. Implement challenges in Java, Python, C++, or JavaScript.

Searching algorithms are fundamental in computer science and practical applications. They allow us to efficiently locate specific elements within a data structure. Some key reasons for their importance include:

- **Data retrieval:** Quick data retrieval is crucial in applications such as databases, search engines, and file systems.
- **Optimization:** Efficient searching algorithms can significantly reduce the time complexity of operations, enhancing the performance of software applications.
- **Foundation for other algorithms:** Many advanced algorithms, such as certain dynamic programming solutions and graph algorithms, rely on efficient searching techniques as a core component.
- **Real-world applications:** Searching algorithms are ubiquitous in everyday technology, from finding a contact in your phonebook to searching for a product in an online store.

Linear search iteratively checks each element in a list until the target element is found or the end of the list is reached. It is simple but less efficient for large datasets. The time complexity of linear search is $O(n)$.
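A minimal linear search sketch (the function name is ours):

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if absent.
    O(n) worst case: every element may need to be checked."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1
```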

Binary search is an efficient algorithm for finding a target value within a sorted list. It repeatedly divides the search interval in half by comparing the target value to the middle element. The search is complete if the target is equal to the middle element. If the target is less than the middle element, the search continues on the left subarray. If the target is greater, it continues on the right subarray. This process continues until the target is found or the search interval is empty. Binary search has a worst-case time complexity of $O(\log n)$.
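The halving process described above can be sketched as follows (an iterative version; the function name is ours):

```python
def binary_search(sorted_items, target):
    """Return the index of target in a sorted list, or -1. O(log n):
    each comparison discards half of the remaining interval."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1
```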

Sorting algorithms arrange elements in a specific order (usually ascending or descending), which is essential for many applications. Their importance includes:

- **Efficiency:** Many algorithms, like binary search, require sorted data to function correctly and efficiently.
- **Data analysis:** Sorted data is easier to analyze and visualize, which is important for tasks like statistical analysis and reporting.
- **Simplifying problem-solving:** Some complex problems become easier to solve when data is sorted, such as finding duplicates or detecting patterns.
- **Improving other algorithms:** Sorting is often a preprocessing step for other algorithms, improving their performance and reliability, for example, sorting before searching or merging intervals.
- **Everyday use:** Sorting is commonly used in practical applications, such as organizing emails, files, or playlist tracks.

Bubble sort is a simple sorting algorithm that repeatedly steps through the list, comparing adjacent elements and swapping them if they are in the wrong order. This process is repeated for each element in the list. Each pass through the list ensures the next largest element moves to its correct position. The algorithm continues until no swaps are needed, indicating the list is sorted. Although easy to understand, bubble sort is inefficient for large lists, with a time complexity of $O(n^2)$.
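A sketch of bubble sort, including the early exit when a pass makes no swaps (the function name is ours):

```python
def bubble_sort(arr):
    """Sort arr in place. O(n^2) worst case; stops early once a
    full pass makes no swaps, meaning the list is already sorted."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break
    return arr
```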

Merge sort is an efficient, stable sorting algorithm that uses the divide-and-conquer strategy. It recursively breaks down a list into two equal halves until each sublist contains only one element. Since a single-element list is inherently sorted, the algorithm then begins the process of merging these sublists back together. During the merging phase, it compares the smallest elements of each sublist and combines them in sorted order. This process continues recursively until all sublists are merged into one sorted list. Merge sort is particularly efficient, with a time complexity of $O(n \log n)$.

Below is the implementation of the merge sort algorithm in Python.

```python
def merge_sort(arr):
    if len(arr) > 1:
        # Split the list into two halves and sort each recursively.
        mid = len(arr) // 2
        left_half = arr[:mid]
        right_half = arr[mid:]
        merge_sort(left_half)
        merge_sort(right_half)

        # Merge the two sorted halves back into arr.
        i = j = k = 0
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            k += 1
        # Copy any remaining elements from either half.
        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1
        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1
    return arr


# Example usage
arr = [64, 34, 25, 12, 22, 11, 90]
print("Array before sorting:", arr)
sorted_arr = merge_sort(arr)
print("Sorted array:", sorted_arr)
```

Graph algorithms are essential for solving problems related to networks and relationships. Their significance includes:

- **Network analysis:** They help analyze and optimize networks, such as social networks, computer networks, transportation systems, and utility grids.
- **Connectivity:** Determining connectivity and identifying components within a graph is crucial for understanding the structure and robustness of networks.
- **Optimization:** Graph algorithms are used in optimization problems, such as finding the minimum spanning tree or maximum flow in a network.
- **Modeling real-world problems:** Many real-world problems can be modeled as graphs, making graph algorithms a powerful tool for finding solutions in diverse fields like biology, sociology, linguistics, and more.

Depth-first search (DFS) is a graph traversal algorithm that explores the vertices of a graph as deeply as possible along each branch before backtracking. Here’s a more detailed explanation:

- **Starting point:** DFS begins at a chosen starting vertex and explores as far as possible along each branch before backtracking. It uses a stack data structure, either explicitly or through recursion, to keep track of the vertices to be explored.
- **Exploration process:** From the starting vertex, DFS visits an adjacent unvisited vertex, marks it as visited, and then recursively applies the same process to this new vertex. If no adjacent unvisited vertices are found, it backtracks to the previous vertex to continue the search.
- **Backtracking:** If a vertex has no unvisited adjacent vertices, DFS backtracks to the most recent vertex that still has unexplored neighbors. This process continues until all vertices connected to the starting vertex are visited.

DFS is used for traversing trees, finding connected components in undirected graphs, and solving maze problems by exploring all paths and backtracking when necessary. It has a time complexity of $O(V + E)$, where $V$ is the number of vertices and $E$ is the number of edges.
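The recursive process described above can be sketched over an adjacency-list dictionary as follows (a simplified illustration; a set would make the membership check faster, but a list keeps the visit order explicit):

```python
def dfs(graph, start, visited=None):
    """Recursive depth-first search over an adjacency-list dict.
    Returns the vertices in visit order. O(V + E) with a set-based
    visited structure; this sketch uses a list for clarity."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            # Go as deep as possible before trying the next neighbor.
            dfs(graph, neighbor, visited)
    return visited
```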

Breadth-first search (BFS) is a graph traversal algorithm that explores vertices level by level, starting from the root node or any specified node. Here’s a more detailed explanation:

- **Level-wise exploration:** BFS explores all vertices at the present "depth" (or level) before moving on to vertices at the next level. It uses a queue data structure to maintain the order of exploration, starting with the root node and then exploring its neighbors, followed by their neighbors, and so on.
- **Process:**
  1. Begin with the root node and enqueue it.
  2. Dequeue a vertex, mark it as visited, and enqueue all its adjacent vertices that have not yet been visited.
  3. Continue this process until all vertices reachable from the initial node have been visited.

BFS is ideal for finding the shortest paths in unweighted graphs by exploring level by level, useful for puzzles like mazes. It is also employed in network traversal, such as social networks or web crawlers, to discover all reachable nodes, with a time complexity of $O(V + E)$.

Below is the implementation of the graph breadth-first search algorithm in Python.

```python
from collections import deque


class Graph:
    def __init__(self):
        self.graph = {}  # adjacency list: node -> list of neighbors

    def add_edge(self, u, v):
        if u not in self.graph:
            self.graph[u] = []
        self.graph[u].append(v)

    def print_graph(self):
        for node, neighbors in self.graph.items():
            print(f"{node}: {', '.join(map(str, neighbors))}")

    def bfs(self, start):
        visited = set()
        queue = deque([start])
        visited.add(start)
        while queue:
            v = queue.popleft()
            print(v, end=' ')
            # Enqueue unvisited neighbors so they are explored level by level.
            for neighbor in self.graph.get(v, []):
                if neighbor not in visited:
                    queue.append(neighbor)
                    visited.add(neighbor)


# Example usage
g = Graph()
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(1, 2)
g.add_edge(2, 0)
g.add_edge(2, 3)
g.add_edge(3, 3)
print("Graph representation:")
g.print_graph()
print("\nBFS starting from vertex 2:")
g.bfs(2)
```

Dynamic programming is a powerful optimization technique used to improve the efficiency of recursive algorithms by storing the results of overlapping subproblems and reusing them when needed, rather than recalculating them. This approach is particularly effective for problems that exhibit optimal substructure and overlapping subproblems. Optimal substructure means that the optimal solution to the problem can be constructed from optimal solutions to its subproblems.

To learn more about dynamic programming, read our blog: From recursion to dynamic programming.

The illustration below calculates each Fibonacci number by summing the two preceding numbers. This results in a tree of recursive calls, where many subproblems are solved multiple times.

- $fib(4)$ calls $fib(3)$ and $fib(2)$.
- $fib(3)$ calls $fib(2)$ and $fib(1)$.
- $fib(2)$ calls $fib(1)$ and $fib(0)$.

Note that $fib(2)$ is computed more than once in this tree. Dynamic programming stores such results so that each subproblem is solved only once.

Dynamic programming is widely used in various applications, such as computing the Fibonacci sequence, where each term is the sum of the two preceding ones, and finding the longest common subsequence, the longest sequence that can be derived from both original sequences without changing the order of characters. By breaking down these problems into simpler subproblems and storing their solutions, dynamic programming significantly reduces computation time and avoids redundant calculations.
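The Fibonacci example above can be sketched with memoization, turning the exponential recursion tree into $O(n)$ work (the function name is ours):

```python
def fib(n, memo=None):
    """Fibonacci with memoization: each subproblem is solved once and
    cached, so the total work is O(n) instead of exponential."""
    if memo is None:
        memo = {}
    if n < 2:
        return n  # base cases: fib(0) = 0, fib(1) = 1
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]
```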

To further increase your knowledge and skills in algorithms, we recommend the course “Algorithms for Coding Interviews.” This course is offered in three languages: Python, C++, and Java.

By following a structured roadmap and working on practical projects, you can build a strong foundation with data structures and algorithms.

- **Foundations:** Start with basic data structures like arrays, linked lists, and stacks. Understand their properties, operations, and time complexities.
- **Trees and graphs:** Learn about binary trees, binary search trees, heaps, and graph representations. Practice tree traversal algorithms (inorder, preorder, postorder) and graph algorithms (DFS, BFS).
- **Sorting and searching:** Dive into sorting algorithms (bubble sort, merge sort, quick sort) and searching techniques (linear search, binary search). Compare their efficiencies and understand when to use each.
- **Dynamic programming:** Master dynamic programming concepts by solving classic problems like the 0/1 knapsack problem, longest common subsequence, and optimal matrix chain multiplication.
- **Advanced topics:** Explore advanced data structures (tries, segment trees, hash tables) and algorithmic techniques (divide and conquer, greedy algorithms, backtracking). Solve challenging problems and work on projects requiring algorithmic optimization.
- **Projects:** Apply your knowledge by working on projects that require implementing data structures and algorithms, such as building a search engine, designing a recommendation system, or optimizing a sorting algorithm for large datasets.

Mastering data structures and algorithms is essential to your long-term success as a coder. It's a continuous learning process that requires dedication, practice, and problem-solving skills.

At Educative, we offer a range of hands-on courses that cover these programming building blocks in multiple languages. Check them out below:

Happy coding and exploring the world of data structures and algorithms!
