BLOOMFILTER 2
NAME
Bloomfilter - Bloom filters
SYNOPSIS
.EX
include "sets.m";
include "bloomfilter.m";
bloomfilter := load Bloomfilter Bloomfilter->PATH;
init: fn();
filter: fn(d: array of byte, logm, k: int): Sets->Set;
DESCRIPTION
A Bloom filter is a method of representing a set to support probabilistic
membership queries. It uses independent hash functions of members
of the set to set elements of a bit-vector.
Init should be called first to initialise the module.
Filter returns a Set
s representing the Bloom filter for the single-member
set
{ d }. K independent hash functions are used, each of range
"[0, 2^" logm ), to return a Bloom filter
2^ logm bits wide. It is an error if
logm is less than 3 or greater than 30.
Bloom filters can be combined by set union. The set represented by Bloom filter a is not a subset of another b if there are any members in a that are not in b . Together, logm , k , and n (the number of members in the set) determine the "false positve" rate (the probability that a membership test will not eliminate a member that is not in fact in the set). The probability of a "false positive" is approximately (1-e^(-kn/(2^logm))^k. For a given false positive rate, f , a useful formula to determine appropriate parameters is: k=ceil(-log₂(f)), and logm=ceil(log₂(nk)).
EXAMPLES
Create a 128 bit-wide bloom filter
f representing all the elements
in the string array
elems , with
k =6. .EX
A, B, None: import Sets;
for(i:=0; i<len elems; i++)
f = f.X(A|B, filter(array of byte elems[i], 7, 6));
Test whether the string
s is a member of
f . If there were 12 elements in
elems , the probability of a false positive would be
approximately 0.0063.
.EX
if(filter(array of byte s, 7, 6).X(A&~B, f).eq(None))
sys->print("'%s' might be a member of f\en", s);
SOURCE
/appl/lib/bloomfilter.b SEE ALSO
sets (2)