文章/答案/技术大牛

发布

社区首页 >问答首页 >Bloom Filter C实现

问Bloom Filter C实现
EN

Code Review用户

提问于 2022-07-27 20:26:35

回答 1查看 76关注 0票数 3

这是我的第一个C程序。它从一个由布卢姆滤波器 char数组和k哈希函数组成的数组创建一个N。

在这一点上，我并不特别关心最大化的性能，因为我只是从C开始。任何关于使我的代码更易懂的建议都会受到高度赞赏。

// Bloomfilter.c
#include 

// Function headers
int modularHash(char string[], int R, int M);
int sh1(char string[]);
int sh2(char string[]);
int sh3(char string[]);

int main()
{
    enum
    {
        N = 3,  // the number of strings
        M = 10, // the size of the bloom filter
        k = 3,  // the number of hash functions
    };

    char *strings[N] = {"abc", "777", "xyz"};

    int bloom_filter[M] = {0}; // array of `n` 0s.

    // array of pointers to hash functions.
    int (*hashers[k])() = {sh1, sh2, sh3};

    // Fill the bloom filter
    for (int j = 0; j < N; j++)
        for (int i = 0; i < k; i++)
        {
            {
                // Compute the hash of string j using the k'th hash function
                // Map the hash to a position in the bloom filter
                // then add this position to the bloom filter.
                int result = hashers[i](strings[j]);
                int position = result % M;
                bloom_filter[position] = 1;
            }
        }

    // Print the bloom filter
    for (int i = 0; i < M; i++)
        printf("%d ", bloom_filter[i]);


    // TODO implement a function for checking a string against the bloom filter.
    return 0;
}

/*
 Compute the modular hash of a char array
    @param string, array of characters to hash
    @param R: a seed value, should be a prime
    @param M: a seed value, should also be a prime
*/
int modularHash(char string[], int R, int M)
{
    int chars = 3; // the number of characters
    int hash = 0;

    char s;
    int n;

    for (int i = 0; i < chars - 1; i++)
        s = string[i];
    n = (int)s;
    hash = (R * hash + n) % M;
    return hash;
}

int sh1(char string[])
{
    int M = 3;
    int R = 31;
    return modularHash(string, M, R);
}

int sh2(char string[])
{
    int M = 47;
    int R = 17;
    return modularHash(string, M, R);
}

int sh3(char string[])
{
    int M = 97;
    int R = 29;
    return modularHash(string, M, R);
}

```#qcStackCode#

algorithm

bloom-filter

回答 1

Code Review用户

回答已采纳

发布于 2022-07-30 12:33:51

Questionable代码

为什么要多次分配s？

for (int i = 0; i < chars - 1; i++)
    s = string[i];

以上均为

s = string[chars - 1 - 1];

而不是在循环(R * hash + n) % M之后执行一个D4操作，而是在循环中进行这个操作。

% 是不是国防部

modularHash()返回(-M ... M)范围中的值，而不是[0 ... M)，因为char可能被签名。

考虑result < 0时引起的UB。

int result = hashers[i](strings[j]);
int position = result % M;
bloom_filter[position] = 1;  // UB

相反，使用未签名的数学。

unsigned modularHash(char string[], unsigned R, unsigned M) {
  unsigned char *ustring = (unsigned char *) string; 
  ...
  unsigned hash = 0;
  ...
    s = ustring[i];

Avoid签名溢出

R * hash + n，使用int数学，可能会溢出导致未定义行为(UB)的大型R。在给定适度常量的情况下，此代码不适用。

使用 const

在不更改引用数据的函数中，使用const。这允许更广泛的功能应用程序，自文档代码的使用，并允许选择优化，否则不允许编译器看到。

// int modularHash(char string[], int R, int M)
int modularHash(const char string[], int R, int M)

Avoid硬编码限制

考虑迭代到字符串的末尾，而不是3(一个未描述的魔术数字)。

unsigned modularHash(const char string[], unsigned char R, unsigned char M) {
  const unsigned char *ustring = (const unsigned char *) string; 
  unsigned hash = 0;

  while (*ustring) {
    hash = (R*hash + *ustring) % M;
  }
  return hash;
}

如果希望限制为3，请使用更多的描述性名称，并考虑字符串可能小于3。

  int limit = 3;
  for (int i = 0; i < limit && *ustring; i++) {
    hash = (R*hash + *ustring) % M;
  }

定义两次的Avoid

下面定义字符串的数量两次，一次用3，一次用初始化器。

    N = 3,  // the number of strings
char *strings[N] = {"abc", "777", "xyz"};

请考虑下面的情况，其中字符串大小仅由初始化器的数量定义。

char *strings[] = {"abc", "777", "xyz"};
size_t N = sizeof strings / sizeof strings[0];

hashers[]也是如此。

<#>Unclear代码

为什么-1在for (int i = 0; i < chars - 1; i++)？

票数 2

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/278392

复制

相似问题

问Bloom Filter C实现
EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Bloom Filter C实现EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Bloom Filter C实现
EN