Design add and search words data structure

Imagine you’re developing a word game application where users can input words, and the application needs to determine if a given string exists within the collection of entered words. Additionally, users may employ wildcard characters (‘.’) to represent any letter, allowing for flexible pattern matching.

Example:

Suppose a player adds words like “apple,” “banana,” and “cherry” to their word dictionary. During gameplay, the player enters the pattern “b..ana”, indicating a desire to find words similar to “banana” but with unspecified letters at certain positions. The word dictionary efficiently identifies “banana” as a match, allowing the player to score points and progress.

Problem statement

You are tasked with creating a specialized data structure, WordDictionary, to manage a collection of words efficiently. This structure should enable the addition of new words and the subsequent matching of input strings against the stored words. Each word may contain letters and dots (.), where dots can represent any single letter.

Your task is to implement the WordDictionary class, which should provide the following functionalities:

  • WordDictionary(): This initializes a new instance of the data structure.

  • void addWord(word): Adds the specified word to the collection within the data structure, enabling future matching against it.

  • bool search(word): Determines whether any word within the data structure matches the input string word. Matching involves comparing each word character against the stored words, where dots in a word can match any letter.

Example usage

word_dict = WordDictionary()
word_dict.addWord("bad")
word_dict.addWord("dad")
word_dict.addWord("mad")
print(word_dict.search("pad")) # Output: False
print(word_dict.search("bad")) # Output: True
print(word_dict.search(".ad")) # Output: True
print(word_dict.search("b..")) # Output: True

Knowledge test

Test your understanding of the concepts above by taking the quiz below!

Choose the correct answer.

1

What type of search algorithm is typically used to find matches in the word dictionary?

A)

Linear search

B)

Breadth-first search

C)

Depth-first search

Question 1 of 30 attempted

Algorithm

To solve this problem we will use the concept of a trie. A trie, also known as a prefix tree, is a tree-like data structure that stores a dynamic set of strings to allow for efficient prefix-based searching. Each node in the trie represents a single character of the string.

A trie which stores the words and, ant, dad, do
A trie which stores the words and, ant, dad, do

In the trie given above:

  • Each level represents a character in the word.

  • The path from the root to a node represents a word.

  • Nodes with isEndOfWord set to True indicate the end of a valid word.

Note: To learn about tries, refer to the What is a prefix tree? Answer.

The WordDictionary class utilizes a trie structure. Each node of the trie consists of:

  • children: This is an array to store references to child nodes, representing all possible characters (‘a’ to ‘z’).

  • isEndOfWord: This is a boolean flag indicating whether the node represents the end of a valid word.

Algorithm steps

  1. Initialization:

    1. Initialize the trie structure in the constructor of the WordDictionary class.

  2. Adding a word:

    1. To add a word to the dictionary:

      1. Iterate through each character of the word.

      2. If the current character’s corresponding child node does not exist, create a new node and link it.

      3. Set the isEndOfWord flag for the last character to mark the end of a word.

  3. Searching for a word:

    1. To search for a word in the dictionary:

      1. Perform a Breadth-first search (BFS) traversal through the trie.

      2. Use a deque to store tuples of (current node, index) pairs.

      3. For each character in the word:

        1. If it’s a wildcard ‘.’, explore all possible child nodes.

        2. If it’s a specific character, follow the corresponding child node.

      4. If the search reaches the end of the word and the isEndOfWord flag is set, return True.

      5. If no valid word is found, return False.

Coding example

Let’s implement our algorithm in Python!

from collections import deque
class WordDictionary(object):
def __init__(self):
self.children = [None] * 26
self.isEndOfWord = False
def addWord(self, word):
curr = self
for char in word:
c = ord(char) - ord('a')
if curr.children[c] is None:
curr.children[c] = WordDictionary()
curr = curr.children[c]
curr.isEndOfWord = True
def search(self, word):
curr = self
q = deque([(curr, 0)])
wsize = len(word)
while q:
curr, idx = q.popleft()
if idx == wsize:
if curr.isEndOfWord:
return True
continue
c = word[idx]
if c == '.':
for child in curr.children:
if child is not None:
q.append((child, idx + 1))
else:
curr = curr.children[ord(c) - ord('a')]
if curr is not None:
q.append((curr, idx + 1))
return False
# Test case
obj = WordDictionary()
obj.addWord("bad")
obj.addWord("dad")
obj.addWord("mad")
print(obj.search("pad")) # False
print(obj.search("bad")) # True
print(obj.search(".ad")) # True
print(obj.search("b..")) # True

Code explanation

The step-by-step explanation is as follows:

  • Lines 4–6: We are defining the initializing function for the WordDictionary object, where our class contains two attributes; self.children which is an array of size 26, initialized with None, to represent child nodes for each character from ‘a’ to ‘z’ and self.isEndOfWord which is a boolean flag initialized as False, indicating whether the current node represents the end of a valid word.

  • Line 8: We are defining a method addWord that takes a parameter word to add that word to the dictionary.

  • Lines 10–11: We iterate through each character (char) in the input word and calculate the index c in the children array based on the ASCII value of the character.

  • Lines 12–14: We check if the child node for the current character doesn’t exist, and if it doesn’t, then we create a new WordDictionary node, link it, and move to the next child node.

  • Line 15: We set the isEndOfWord flag for the last character of the word to mark the end of a valid word.

  • Lines 17: We define a method search with a parameter word to search for that word in the dictionary.

  • Lines 18: We initialize a variable curr to reference the current instance of the WordDictionary class. This will be used to traverse the trie.

  • Line 19: We initialize a deque q with a single tuple (curr, 0). This tuple contains the current node (curr) and the index 0, indicating the starting point of the word.

  • Line 20: We calculate the length of the input word and assign it to the variable wsize. This will determine when the entire word has been processed during the search.

  • Lines 21: We initiate a while loop that continues as long as the deque q is not empty, i.e., as long as there are nodes to explore in the trie.

  • Line 22: We dequeue a tuple (curr, idx) from the left side of the deque q. curr represents the current node being processed and idx represents the index of the character in the word being examined.

  • Lines 23–26: We check if the index idx is equal to the length of the word wsize. If it is, then we check if the current node curr represents the end of a valid word (curr.isEndOfWord). If it does, it returns True. Otherwise, we continue to the next iteration of the while loop to process the next node.

  • Lines 27–31: We retrieve the character at the index idx from the input word word and assign it to the variable c. Next, we check if the character c is a wildcard (‘.’). If it is, then we iterate through all children of the current node curr. For each non-null child, we enqueue a tuple (child, idx + 1) into the deque q, indicating the child node and the following index in the word.

  • Lines 32–36: After checking that c is not a wildcard, we update the current node curr to be the child node corresponding to the character c and enqueue a tuple (curr, idx + 1) into the deque q to explore the next character in the word. Lastly, if no valid word is found, we return False.

Time complexity

The time complexity for adding a word to the WordDictionary is O(M)O(M), where MM is the length of the word being added. This is because we iterate through each character of the word and perform constant-time operations to navigate the trie.

The time complexity for searching a word in the WordDictionary is O(NM)O(N*M), where NN is the number of characters in the word being searched and MM is the maximum depth of the trie. This is because, in the worst case, we might need to traverse down each branch of the trie for each character in the search word.

Space complexity

The space complexity for the WordDictionary class is O(NM)O(N*M), where NN is the number of words inserted and MM is the average length of the words. This is because we store each character of each word in a trie structure, consuming space proportional to the total number of characters in all words.

Copyright ©2024 Educative, Inc. All rights reserved