A Bloom Filter is a Probabilistic Data Structure. The Filter responds to exits(str) with 100% certainty if str is not present. A true response from exists(str) only indicates a probability of str being present, and could be a false positive.

Bloom filters are used to efficiently and accurately check the absence of a value. Typical examples can be blacklist of domains, checks for a username availability for registration, or whether to look for data in cache depending on the key, etc.

Since Bloom Filters use arrays of bits, they are space efficient, and the get/set operations do not require any scans or searches, making them compute efficient as well.

Having used them at IGN to quickly respond to is the username available on the registration form, I recently ran into them again for a hobby project.

import java.util.BitSet;

/**
 *
 * Below is a _stupidly_ simple example of a bloom filter. The bloom filters can also be chained,
 * as you can imagine, too many values and/or too small of a bitset can cause
 * collisions.
 */
public class BloomFilter {


    /**
     * The java bitset is just like a List or Set or Map, it is _not_ fixed size.
     * We give it a hint here so that the .size() operation will return this result.
     */
    BitSet filter = new BitSet(1024);

    public static void main(String[] args) {
        BloomFilter filter = new BloomFilter();
        //add some values to the filter
        filter.add("Manish");
        filter.add("Nutella");
        filter.add("Dublin");
        //test the sample values against this filter
        String[] test = new String[]{"Manish","Pandit","pandit", "Dublin", "San Francisco", "Pizza"};
        for(String str: test){
            System.out.println("Does " + str + " exist? " + filter.exists(str));
        }
        // In the real world not only the hash functions will be smarter, more in number,
        // and there'd be a chain of these filters to reduce false positives.
        // However, there are *absolutely* no false negatives, which is the whole point of bloom filters.
    }

    public void add(String string){
        filter.set(hash1(string));
        filter.set(hash2(string));
    }

    public boolean exists(String string){
        return filter.get(hash1(string)) && filter.get(hash2(string));

    }

    private int hash1(String string){
        int sum = 0;
        for(int i=string.length()-1;i>=0;i--){
            sum += string.charAt(i) * Math.pow(10,i);
        }
        return sum%filter.size();
    }

    private int hash2(String string){
        return hash1(string)/2;
    }
}

The output of the above code is :

Does Manish exist? true
Does Pandit exist? false
Does pandit exist? false
Does Dublin exist? true
Does San Francisco exist? false
Does Pizza exist? false

Comments