对于我正在研究的一个现实世界的问题,布卢姆过滤器似乎很有希望。一个流行的java实现似乎是谷歌的番石榴库。
我需要使用布卢姆过滤器的联合操作,一个交叉操作将是好的,但我可以绕过它。
在java文档中,我找不到执行交集的方法,而putAll方法似乎像联合操作一样工作。
所以我的问题是:
如果有人可以推荐另一个流行的、经过良好测试的库,它可以在maven的存储库上使用,并且具有交集和联合,这也解决了我的需求。
发布于 2021-10-20 12:10:54
是的,JavaDoc说:
通过执行基础数据的位或值,将此Bloom筛选器与另一个Bloom筛选器组合。突变发生在这种情况下。呼叫者必须确保布卢姆过滤器有适当的大小,以避免它们饱和。
问题是它会抛出IllegalArgumentException - if isCompatible(that) == false,因此如果要使用此方法,必须确保isCompatible返回true:
确定给定的Bloom筛选器是否与此Bloom筛选器兼容。要使两个Bloom过滤器兼容,它们必须:
尽管如此,让我们尝试回答第二个问题,并找到一个更容易的解决方案。
是的,您可以使用BloomFilter实现Predicate并使用其.and()和.or()方法的事实,记住mightContain行为取决于布卢姆过滤器的参数。
示例:
// bloomFilter1: [0, 100], expected size 200, FPP 1%
BloomFilter<Integer> bloomFilter1 = IntStream.rangeClosed(0, 100)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 200L, 0.01d));
// bloomFilter2: [80, 200], expected size 200, FPP 1%
BloomFilter<Integer> bloomFilter2 = IntStream.rangeClosed(80, 200)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 200L, 0.01d));
// bloomFilter2: [80, 200], expected size 1000, FPP 0.1%
BloomFilter<Integer> bloomFilter3 = IntStream.rangeClosed(80, 200)
.boxed()
.collect(toBloomFilter(Funnels.integerFunnel(), 1000L, 0.001d));
assertThat(bloomFilter1.isCompatible(bloomFilter2)).isTrue(); // same parameters
assertThat(bloomFilter1.isCompatible(bloomFilter3)).isFalse(); // different parameters
BloomFilter<Integer> bloomFilterUnion = bloomFilter1.copy(); // preserves BF parameters
bloomFilterUnion.putAll(bloomFilter2);
// note that bloomFilterUnion.putAll(bloomFilter3) would throw IAE
// using AssertJ Predicate asserts https://joel-costigliola.github.io/assertj/core-8/api/org/assertj/core/api/PredicateAssert.html
// or == union == putAll
assertThat(bloomFilterUnion).accepts(90);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(90);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(90);
assertThat(bloomFilterUnion).accepts(42);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(42);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(42);
assertThat(bloomFilterUnion).accepts(180);
assertThat(bloomFilter1.or(bloomFilter2)).accepts(180);
assertThat(bloomFilter1.or(bloomFilter3)).accepts(180);
assertThat(bloomFilterUnion).rejects(300);
assertThat(bloomFilter1.or(bloomFilter2)).rejects(300);
assertThat(bloomFilter1.or(bloomFilter3)).rejects(300);
// and == intersection
assertThat(bloomFilter1.and(bloomFilter2)).accepts(90);
assertThat(bloomFilter1.and(bloomFilter3)).accepts(90);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(42);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(42);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(180);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(180);
assertThat(bloomFilter1.and(bloomFilter2)).rejects(300);
assertThat(bloomFilter1.and(bloomFilter3)).rejects(300);我个人不想搞乱任何阶级的内部,但在这里,我没有必要这样做。
https://stackoverflow.com/questions/69644359
复制相似问题