How to use Bloom filter in the Guava library
Overview
Bloom filter is a highly space-efficient probabilistic data structure designed to check the set membership.
The Guava project is a collection of several of Google’s core libraries for string processing, caching, etc.
Here, we use the Bloom filter implementation provided in the Guava library.
Guava dependency
Guava can be included in the Java Maven or Gradle projects. Include the following in the pom.xml of the Maven project.
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>30.1.1-jre</version>
</dependency>
Refer to Maven Repository for different versions of
Guava.
Funnel class in Guava
A Funnel describes how to decompose a particular object type into primitive field values. This is needed because Bloom filters operate on bytes.
You can create a funnel for custom objects as follows.
Consider a Student class like the one below.
class Student{
final String name;
final int id;
Student(String name, int id) {
this.name = name;
this.id = id;
}
}
We can define a funnel for the Student class as follows.
Funnel<Student> studentFunnel = new Funnel<Student>() {
@Override
public void funnel(Student from, PrimitiveSink into) {
into.putString(from.name, Charsets.UTF_8)
.putInt(from.id);
}
};
In the code above, we define a PrimitiveSink which can receive a stream of primitive values.
Create a Bloom filter in Guava
In order to define a Bloom filter in Guava, we need the following.
- A funnel object that can convert an object to bytes.
- Size of the Bloom filter.
- Expected false positive probability.
We use the static create() method of the BloomFilter class.
The method signature is as follows.
public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp)
Parameters
Funnel funnel- funnel object.int expectedInsertions- the number of expected insertions to the constructed Bloom filter.double fpp– the desired false positive probability. The value must be positive and less than1.0.
Insert elements to bloom filter
We use the .put() method to add the elements to the Bloom filter.
Check for the existence of an element
We use the .mightContain() method to check if an element is already inserted into the Bloom filter. The method returns true if the element might have been put in this Bloom filter, and it returns false if this is definitely not the case.
Example
In the following code, we define a Bloom filter with a string funnel, with 100 as the capacity of the Bloom filter and 0.01 as the expected false positive probability.
Next, certain elements are inserted into the filter.
Then, we check for the membership of the inserted elements as well as for new elements.
import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class BFilter {
public static void main(String[] args) {
BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_16), 100, 0.01);
bloomFilter.put("hello");
bloomFilter.put("hi");
String testString = "india";
boolean checkStatus = bloomFilter.mightContain(testString);
System.out.println(testString + (checkStatus?" seen before":" not seen before"));
testString = "hello";
checkStatus = bloomFilter.mightContain(testString);
System.out.println(testString + (checkStatus?" seen before":" not seen before"));
}
}