org.apache.cassandra.utils
Class BloomCalculations

java.lang.Object
  extended by org.apache.cassandra.utils.BloomCalculations

public class BloomCalculations
extends java.lang.Object

The following calculations are taken from: http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html "Bloom Filters - the math" This class's static methods are meant to facilitate the use of the Bloom Filter class by helping to choose correct values of 'bits per element' and 'number of hash functions, k'.


Nested Class Summary
static class BloomCalculations.BloomSpecification
          A wrapper class that holds two key parameters for a Bloom Filter: the number of hash functions used, and the number of buckets per element used.
 
Constructor Summary
BloomCalculations()
           
 
Method Summary
static int computeBestK(int bucketsPerElement)
          Given the number of buckets that can be used per element, return the optimal number of hash functions in order to minimize the false positive rate.
static BloomCalculations.BloomSpecification computeBucketsAndK(double maxFalsePosProb)
          Given a maximum tolerable false positive probability, compute a Bloom specification which will give less than the specified false positive rate, but minimize the number of buckets per element and the number of hash functions used.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BloomCalculations

public BloomCalculations()
Method Detail

computeBestK

public static int computeBestK(int bucketsPerElement)
Given the number of buckets that can be used per element, return the optimal number of hash functions in order to minimize the false positive rate.

Parameters:
bucketsPerElement -
Returns:
The number of hash functions that minimize the false positive rate.

computeBucketsAndK

public static BloomCalculations.BloomSpecification computeBucketsAndK(double maxFalsePosProb)
Given a maximum tolerable false positive probability, compute a Bloom specification which will give less than the specified false positive rate, but minimize the number of buckets per element and the number of hash functions used. Because bandwidth (and therefore total bitvector size) is considered more expensive than computing power, preference is given to minimizing buckets per element rather than number of hash functions.

Parameters:
maxFalsePosProb - The maximum tolerable false positive rate.
Returns:
A Bloom Specification which would result in a false positive rate less than specified by the function call.


Copyright © 2009 The Apache Software Foundation