Solution: Similar String Groups

Statement▼

Two strings x and y are considered similar if they are either exactly the same or can be made identical by swapping at most two different characters in string x.

We define a similarity group as a set of strings where each string is similar to at least one other string in the group. A string doesn't need to be directly similar to every other string in the group — it just needs to be connected to them through a chain of similarities.

Given a list of strings strs, where each string is an anagram of the others, your task is to determine how many such similarity groups exist in the list.

Constraints:

$1 \leq$ strs.length $\leq 300$
$1 \leq$ strs[i].length $\leq 300$
strs[i] consists of lowercase letters only.
All strings in strs have the same length and are anagrams of each other.

This problem can be seen as a graph connectivity challenge. Each string is a node, and an edge exists between two nodes if their corresponding strings are similar. Our goal is to count how many connected groups (components) exist in this graph.

We solve this problem using the Union-Find (Disjoint Set Union) data structure to efficiently group similar strings. Initially, each string is placed in its own group. We then iterate over all possible pairs of strings. For each pair at indexes i and j, we check whether the two strings are similar — that is, either exactly the same or differ at exactly two positions (meaning one swap can make them equal). If they are similar and currently belong to different groups (i.e., their roots in the Union-Find structure are different), we perform a union operation to merge their groups. Repeating this across all string pairs gradually reduces the number of distinct groups. Finally, we count the number of unique roots in the Union-Find structure, which represents the number of similar string groups.

Here’s the step-by-step explanation of the solution:

Initialize n = len(strs).
Create a Union-Find (DSU) structure with n elements, where each element is its own parent.
Define a function areSimilar(s1, s2) that returns TRUE if both strings s1 and s2 are similar according to the given condition:
1. Initialize an empty list diff = [] to track differences.
2. Loop through both strings in parallel using zip.
  1. If characters differ at any position, record the mismatch in diff.
  2. Early exit if more than 2 differences and return FALSE.
3. After the loop is completed, evaluate the result:
  1. len(diff) == 0 means the strings are identical.
  2. len(diff) == 2 and diff[0] == diff[1][::-1] means there are exactly two differences and the character pairs are mirror images of each other.
Loop over all pairs (i, j) such that 0 ≤ i < j < n.
For each pair, use the areSimilar function to check if strs[i] and strs[j] are similar.
If similar, use find(i) and find(j) to get their root parents.
If the roots differ, merge them using union(i, j).
After processing all pairs, iterate over all indexes i from 0 to n - 1 and find their root parents using find(i).
Add each root to a set to track unique groups.
Return the size of the set as the number of similarity groups.

Let’s look at the following illustration to get a better understanding of the solution:

Problem

Submissions

Debug