Probabilistic data structure

Probabilistic data structures return a reasonably accurate approximate answer rather than a definitive, exact result. They also offer a way to estimate the error of that approximation. They are used in big data and streaming applications, where an approximate answer is sufficient to proceed with the workflow.

Some well-known probabilistic data structures are the Bloom filter, Count-Min Sketch, and HyperLogLog. Probabilistic data structures share these characteristics:

They are space-efficient. They fit into memory and have a smaller memory footprint than exact alternatives.
They are parallelizable.
Their access patterns are predictable, and operations run in constant time.

Probabilistic data structures can be used to:

Determine whether an element belongs to a set.
Estimate the number of distinct elements in a given array.
Estimate the frequency of a particular element in a large data set.
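
To make the frequency use case concrete, here is a minimal sketch of a Count-Min Sketch in Python. The class name, the `width` and `depth` defaults, and the use of salted SHA-256 as the hash family are assumptions chosen for illustration, not a reference implementation. The estimate can be higher than the true count when elements collide, but it is never lower.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Salt the hash with the row number so each row acts as a
        # different hash function (an assumption made for simplicity).
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    sketch.add(word)
print(sketch.estimate("a"))  # at least 3, the true count
```

Memory use depends only on `width` and `depth`, not on how many elements are added, which is what makes the structure suitable for streams.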

Introduction

A Bloom filter is a space-efficient probabilistic data structure devised by Burton Howard Bloom in 1970. It gives an approximate answer to the set membership problem, that is, whether an element belongs to a set.

The data structure can produce a false positive, but it doesn't produce a ...
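
The behavior described here can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name, the `num_bits` and `num_hashes` defaults, and the salted SHA-256 hashing are assumptions made for the example. Adding an element sets a few bit positions; a lookup reports the element as possibly present only if all of its positions are set. Because other elements may have set those same positions, a lookup can succeed for an element that was never added.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter with `num_bits` bits and `num_hashes` hashes."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter
        # would pack the bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly in the set"; it may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True: its bits were just set
print(bf.might_contain("bob"))    # usually False, but True is possible
```

Larger values of `num_bits` lower the chance of a false positive at the cost of more memory.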
