Design add and search words data structure
Imagine you’re developing a word game application where users can input words, and the application needs to determine if a given string exists within the collection of entered words. Additionally, users may employ wildcard characters (‘.’) to represent any letter, allowing for flexible pattern matching.
Example:
Suppose a player adds words like “apple,” “banana,” and “cherry” to their word dictionary. During gameplay, the player enters the pattern “b..ana”, indicating a desire to find words similar to “banana” but with unspecified letters at certain positions. The word dictionary efficiently identifies “banana” as a match, allowing the player to score points and progress.
Problem statement
You are tasked with creating a specialized data structure, WordDictionary, to manage a collection of words efficiently. This structure should enable the addition of new words and the subsequent matching of input strings against the stored words. Each word may contain letters and dots (.), where dots can represent any single letter.
Your task is to implement the WordDictionary class, which should provide the following functionalities:
WordDictionary(): This initializes a new instance of the data structure.void addWord(word): Adds the specified word to the collection within the data structure, enabling future matching against it.bool search(word): Determines whether any word within the data structure matches the input stringword. Matching involves comparing eachwordcharacter against the stored words, where dots in awordcan match any letter.
Example usage
word_dict = WordDictionary()word_dict.addWord("bad")word_dict.addWord("dad")word_dict.addWord("mad")print(word_dict.search("pad")) # Output: Falseprint(word_dict.search("bad")) # Output: Trueprint(word_dict.search(".ad")) # Output: Trueprint(word_dict.search("b..")) # Output: True
Knowledge test
Test your understanding of the concepts above by taking the quiz below!
Choose the correct answer.
What type of search algorithm is typically used to find matches in the word dictionary?
Linear search
Breadth-first search
Depth-first search
Algorithm
To solve this problem we will use the concept of a trie. A trie, also known as a prefix tree, is a tree-like data structure that stores a dynamic set of strings to allow for efficient prefix-based searching. Each node in the trie represents a single character of the string.
In the trie given above:
Each level represents a character in the word.
The path from the root to a node represents a word.
Nodes with
isEndOfWordset toTrueindicate the end of a valid word.
Note: To learn about tries, refer to the What is a prefix tree? Answer.
The WordDictionary class utilizes a trie structure. Each node of the trie consists of:
children: This is an array to store references to child nodes, representing all possible characters (‘a’ to ‘z’).isEndOfWord: This is a boolean flag indicating whether the node represents the end of a valid word.
Algorithm steps
Initialization:
Initialize the trie structure in the constructor of the
WordDictionaryclass.
Adding a word:
To add a word to the dictionary:
Iterate through each character of the word.
If the current character’s corresponding child node does not exist, create a new node and link it.
Set the
isEndOfWordflag for the last character to mark the end of a word.
Searching for a word:
To search for a word in the dictionary:
Perform a Breadth-first search (BFS) traversal through the trie.
Use a deque to store tuples of
(current node, index)pairs.For each character in the word:
If it’s a wildcard ‘
.’, explore all possible child nodes.If it’s a specific character, follow the corresponding child node.
If the search reaches the end of the word and the
isEndOfWordflag is set, returnTrue.If no valid word is found, return
False.
Coding example
Let’s implement our algorithm in Python!
from collections import dequeclass WordDictionary(object):def __init__(self):self.children = [None] * 26self.isEndOfWord = Falsedef addWord(self, word):curr = selffor char in word:c = ord(char) - ord('a')if curr.children[c] is None:curr.children[c] = WordDictionary()curr = curr.children[c]curr.isEndOfWord = Truedef search(self, word):curr = selfq = deque([(curr, 0)])wsize = len(word)while q:curr, idx = q.popleft()if idx == wsize:if curr.isEndOfWord:return Truecontinuec = word[idx]if c == '.':for child in curr.children:if child is not None:q.append((child, idx + 1))else:curr = curr.children[ord(c) - ord('a')]if curr is not None:q.append((curr, idx + 1))return False# Test caseobj = WordDictionary()obj.addWord("bad")obj.addWord("dad")obj.addWord("mad")print(obj.search("pad")) # Falseprint(obj.search("bad")) # Trueprint(obj.search(".ad")) # Trueprint(obj.search("b..")) # True
Code explanation
The step-by-step explanation is as follows:
Lines 4–6: We are defining the initializing function for the
WordDictionaryobject, where our class contains two attributes;self.childrenwhich is an array of size 26, initialized withNone, to represent child nodes for each character from ‘a’ to ‘z’ andself.isEndOfWordwhich is a boolean flag initialized asFalse, indicating whether the current node represents the end of a valid word.Line 8: We are defining a method
addWordthat takes a parameterwordto add that word to the dictionary.Lines 10–11: We iterate through each character (
char) in the input word and calculate the indexcin thechildrenarray based on the ASCII value of the character.Lines 12–14: We check if the child node for the current character doesn’t exist, and if it doesn’t, then we create a new
WordDictionarynode, link it, and move to the next child node.Line 15: We set the
isEndOfWordflag for the last character of the word to mark the end of a valid word.Lines 17: We define a method
searchwith a parameterwordto search for that word in the dictionary.Lines 18: We initialize a variable
currto reference the current instance of theWordDictionaryclass. This will be used to traverse the trie.Line 19: We initialize a deque
qwith a single tuple(curr, 0). This tuple contains the current node (curr) and the index0, indicating the starting point of the word.Line 20: We calculate the length of the input word and assign it to the variable
wsize. This will determine when the entire word has been processed during the search.Lines 21: We initiate a while loop that continues as long as the deque
qis not empty, i.e., as long as there are nodes to explore in the trie.Line 22: We dequeue a tuple
(curr, idx)from the left side of the dequeq.currrepresents the current node being processed andidxrepresents the index of the character in the word being examined.Lines 23–26: We check if the index
idxis equal to the length of the wordwsize. If it is, then we check if the current nodecurrrepresents the end of a valid word (curr.isEndOfWord). If it does, it returnsTrue. Otherwise, we continue to the next iteration of the while loop to process the next node.Lines 27–31: We retrieve the character at the index
idxfrom the input wordwordand assign it to the variablec. Next, we check if the charactercis a wildcard (‘.’). If it is, then we iterate through all children of the current nodecurr. For each non-null child, we enqueue a tuple(child, idx + 1)into the dequeq, indicating the child node and the following index in the word.Lines 32–36: After checking that
cis not a wildcard, we update the current nodecurrto be the child node corresponding to the charactercand enqueue a tuple(curr, idx + 1)into the dequeqto explore the next character in the word. Lastly, if no valid word is found, we returnFalse.
Time complexity
The time complexity for adding a word to the WordDictionary is
The time complexity for searching a word in the WordDictionary is
Space complexity
The space complexity for the WordDictionary class is
Free Resources