Collisions in Hash Tables
Explore the concept of collisions in hash tables, understand why they are unavoidable due to the pigeonhole principle, and learn how poor collision handling leads to data loss and performance degradation. This lesson equips you with foundational knowledge to handle collisions effectively in C# hash table implementations.
We'll cover the following...
A hash table computes an index from a key and stores the value at that position in the backing array. This approach fails when two different keys produce the same index. This situation is called a collision. This lesson explains its causes, its inevitability, and the importance of handling it correctly.
What is a collision?
A collision occurs when two different keys produce the same index after the hash function and modulo operation are applied.
For example, suppose your hash table has size 7. The key "listen" produces a hash value that maps to index 4. The key "silent" also produces a hash value that maps to index 4. Both keys are different, but they compete for the same slot. This is a collision.
Notice that "listen" and "silent" are anagrams of each other. If your hash function simply sums the character codes, both strings will always produce the same sum and therefore the same index. This is an example of a weakness in a naive hash function, but even sophisticated hash functions cannot eliminate collisions entirely.
Why collisions are unavoidable
Collisions are a mathematical certainty in any hash table, ...