Bloom Filter
Learn about the Bloom filter, a type of probabilistic data structure.
Probabilistic data structure
Probabilistic data structures provide a reasonably approximate answer rather than a definitive, exact result, and they typically come with a way to estimate how far off that answer can be. They are used in big data and streaming applications, where approximate answers are sufficient to proceed with the workflow.
Some well-known probabilistic data structures are the Bloom filter, the Count-Min Sketch, and HyperLogLog. Probabilistic data structures share these characteristics:
- They are space-efficient: they fit into memory with a much lower footprint than their exact counterparts. 
- They are parallelizable. 
- Their access patterns are predictable and run in constant time. 
Probabilistic data structures can be used to:
- Determine whether an element belongs to a set. 
- Estimate the total number of distinct elements in a given array (a rough estimator is sketched after this list). 
- Estimate the frequency of a particular element in a large data set. 
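To make the idea of an approximate answer concrete, the sketch below estimates the number of distinct elements in a stream using a single hash function, in the spirit of the Flajolet-Martin counting technique that HyperLogLog builds on. It is a minimal illustration under our own assumptions (the function names and the choice of SHA-1 as the hash are ours), not a production implementation.

```python
import hashlib

def _trailing_zeros(n: int) -> int:
    """Number of trailing zero bits in n (0 maps to a large sentinel)."""
    if n == 0:
        return 64
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count

def estimate_distinct(items) -> float:
    """Rough Flajolet-Martin style estimate of the number of distinct items.

    Hashes every item, tracks the maximum number of trailing zero bits seen,
    and returns 2**max_zeros / 0.77351. A single hash function gives a very
    coarse answer; sketches such as HyperLogLog average many such estimates
    to tighten the error.
    """
    max_zeros = 0
    for item in items:
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
        max_zeros = max(max_zeros, _trailing_zeros(h))
    return (2 ** max_zeros) / 0.77351

# Usage: the estimate lands in the right order of magnitude, not exactly on it.
stream = [i % 1000 for i in range(100_000)]   # 1,000 distinct values
print(f"estimated distinct: {estimate_distinct(stream):.0f}")
```

Note how little state the estimator keeps: a single integer, regardless of how many elements the stream contains. That memory-versus-accuracy trade-off is the defining property of this family of data structures.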
Introduction
A Bloom filter is a space-efficient probabilistic data structure devised by Burton Howard Bloom in 1970. It approximates the set membership problem: determining whether an element belongs to a set or not.
The data structure can produce a false positive, but it never produces a false negative: if the filter reports that an element is absent, the element is definitely not in the set.
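The following is a minimal sketch of the idea in Python, assuming a fixed bit array of 1,024 bits and 3 bit positions per element derived by salting SHA-256; the class and method names (such as might_contain) are our own, not a standard API. Adding an element sets k bits; a lookup reports "possibly present" only if all k of its bits are set.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: a bit array plus k derived hash positions."""

    def __init__(self, size: int = 1024, num_hashes: int = 3):
        self.size = size              # number of bits in the filter
        self.num_hashes = num_hashes  # k positions per element
        self.bits = [False] * size

    def _positions(self, item: str):
        # Derive k bit positions by salting one base hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str) -> None:
        # Set the k bits for this item; bits are never cleared.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

# Usage
bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True
print(bf.might_contain("bob"))    # False (or, rarely, a false positive)
```

A false positive happens when every bit position for a never-added element was set by other elements; making the bit array larger or tuning the number of hash functions lowers that probability at the cost of more memory.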