What is fuzzy search?

Overview

In the search systems like search engines, users enter a search query, and the search engine returns some results based on the user's query. The two approaches for finding the web pages are the following:

  • Exact matching
  • Approximate matching

Exact matching: Find and return the webpages where the search query exactly matches the content on the webpage.

An example of exact match based search

Approximate matching: Find and return the webpages where the search query approximately matches the content on the webpage.

An example of approximate match based search

A fuzzy search returns results that are based on an approximate match of the search term with the data we're looking for. It's used in search systems to make searching flexible, and it also acts as a spelling corrector.

How does fuzzy search work?

In fuzzy search, the actual query is modified using different operations like insertion, deletion, substitution, and transposition, as shown in the following illustration:

Query modification operations
  • Insertion: It adds more characters to the actual query to find the possible string matches. For example, a possible string-match "news" is found by adding 's' at the end of the query "new". The word "news" is different in meaning from "new" but there is a possibility that the user was actually searching for news. In response to the query "new", the pages that a search engine returns could be about "news" also.
  • Deletion: It removes some of the characters from the actual query. For example, "Husstle" is changed to a correctly spelled word "Hustle" by deleting an 's' from the query.
  • Substitution: This operation replaces some characters in the actual query. For example, "Skiil" is changed to the correctly spelled word "skill" by replacing 'i' with 'l'.
  • Transposition: This operation modifies the actual query by changing the position of the characters. For example, "Strnig" is converted to an actual word "String" by changing the position of 'i' and 'n'.

Above, we saw one example for each operation, but there would be multiple options (strings) that would pop up by altering a single query. We have to find the closest string. The closeness of a query with any other string is measured by the number of operations performed on the search query to convert it into that specific string. Let's look into some examples shown in the below table:

Edit the distance of possible match from the actual query

If a user enters a random query like "jsavsbnavg", the fuzzy search will first find the possible words that can be made from this query by applying the above operations. Two of the words that are made through deletion operations are "java" and "svg". To convert "jsavsbnavg" into "java", six delete operations are performed.

Different algorithms are used by fuzzy search systems to find the distance of the query from other matching strings. Some of these algorithms are discussed here.

Advantages of fuzzy search

  • The exact search (character by character matching) leads to unwanted results or no results if there are spelling mistakes. In contrast, the fuzzy search helps us correct spelling mistakes and find the relevant results.
  • Besides correcting spelling mistakes and finding the relevant results, fuzzy search is also helpful in autocomplete.
  • Fuzzy search is also used to find the results containing the synonymous words to the query.
  • A fuzzy search helps give suggestions to the users based on the search query. It may produce less relevant search results in certain circumstances, but it may also produce highly relevant search results in others that an exact search algorithm would have screened out.
Copyright ©2024 Educative, Inc. All rights reserved