Q: Hash function that reduces collisions

Stack Overflow user

Asked 2018-10-12 05:14:20

Answers 3, Views 501, Followers 0, Votes 0

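I am using this hash function, but I am getting a lot of collisions. The goal is to add up the ASCII values of the characters and output the sum. Is there any way to optimize this function, or is there another function I could use, to reduce the number of collisions?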

int hash(char* s)
{
    int hash = 0;
    while(*s)
    {
        hash = hash + *s;
        s++;
    }
    return hash;
}

hash

hashtable

3 Answers

Stack Overflow user

Answered 2018-10-12 05:24:33

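A 32-bit int has a range of more than 4 billion values. (If your int is 64 bits, the range is far larger.) But your code simply adds up the value of each character in the string, so it never gets anywhere near the upper end of that range. All of your hash codes will be small numbers, crowded into the low end of the possible values, which increases the chance of collisions.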
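
To make the clustering concrete, here is a minimal sketch. The names additive_hash and additive_hash_max are illustrative and not from the question, and the bound assumes 7-bit ASCII input, so each character contributes at most 127.

```c
#include <stddef.h>

/* The question's hash: sums the character values of the string. */
unsigned int additive_hash(const char *s)
{
    unsigned int hash = 0;
    while (*s) {
        hash += (unsigned char)*s;
        s++;
    }
    return hash;
}

/* Upper bound on additive_hash for an ASCII string of `len` characters
   (assumption: every character value is at most 127). */
unsigned int additive_hash_max(size_t len)
{
    return 127u * (unsigned int)len;
}
```

Under that assumption a 10-character string can never hash above 1270, and every permutation of the same characters ("abc", "cab", "bca") lands on the same value, 294.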

That's why a good algorithm is more complicated than this.

Here's one article that turned up in a quick Google search.
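
As one example of a "more complicated" algorithm, here is a sketch of 32-bit FNV-1a, a widely used non-cryptographic string hash. It is not necessarily what the linked article describes; the function name fnv1a_32 is my own, while the offset basis and prime are the standard FNV constants.

```c
#include <stdint.h>

/* 32-bit FNV-1a: XOR each byte into the hash, then multiply by the
   FNV prime. The multiplication wraps modulo 2^32, so results spread
   over the whole 32-bit range instead of staying near zero. */
uint32_t fnv1a_32(const char *s)
{
    uint32_t hash = 2166136261u;       /* FNV offset basis */
    while (*s) {
        hash ^= (unsigned char)*s;     /* mix in the next byte */
        hash *= 16777619u;             /* FNV prime */
        s++;
    }
    return hash;
}
```

Because each byte is mixed in before a multiplication, the result depends on the order of the characters and not only on their sum.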

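Votes: 3

Stack Overflow user

Answered 2018-10-12 05:58:51

"foo bar" and "bar foo" hash to the same value, right? Implement it so that both the ASCII value of each character and its position in the string are used to compute the hash; I naively imagine this will reduce collisions significantly.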

int hash(char* s)
{
    int hash = 0;
    int pos = 0;
    while(*s)
    {
        pos++;
        hash += (*s * pos);
        s++;
    }
    return hash;
}

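Try this and see if it helps. I don't have much theoretical knowledge behind this answer.

EDIT: As mentioned below, you may want hash to be an unsigned int. I tested this on codechef.com; here are the source and the results: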

#include <stdio.h>

unsigned int hash(char* s);
unsigned int hash2(char* s);

int main(void) {
    unsigned int temp1 = hash("foo bar");
    unsigned int temp2 = hash("bar foo");

    printf("temp1 is %u and temp2 is %u\n", temp1, temp2);

    temp1 = hash2("foo bar");
    temp2 = hash2("bar foo");

    printf("temp1 is %u and temp2 is %u\n", temp1, temp2);

    return 0;
}

unsigned int hash(char* s)
{
    unsigned int hash = 0;
    while(*s)
    {
        hash = hash + *s;
        s++;
    }
    return hash;
}

unsigned int hash2(char* s)
{
    unsigned int hash = 0;
    int pos = 0;
    while(*s)
    {
        pos++;
        hash += (*s * pos);
        s++;
    }
    return hash;
}

With output:

temp1 is 665 and temp2 is 665

temp1 is 2655 and temp2 is 2715
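
A caveat worth checking for yourself: weighting by position separates anagrams like the pair above, but it is still a plain sum, so different strings can land on the same value. The sketch below repeats the position-weighted hash under the hypothetical name position_hash (with a const-correct parameter) so that it can be tried on short inputs.

```c
/* Position-weighted sum, as in hash2 above: each character value is
   multiplied by its 1-based position before being added. */
unsigned int position_hash(const char *s)
{
    unsigned int hash = 0;
    unsigned int pos = 0;
    while (*s) {
        pos++;
        hash += (unsigned char)*s * pos;
        s++;
    }
    return hash;
}
```

For example, "ab" gives 97*1 + 98*2 = 293 and "ca" gives 99*1 + 97*2 = 293, so those two strings still collide. The values also remain small compared with the full range of an unsigned int.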

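Votes: 0

Stack Overflow user

Answered 2018-10-12 07:09:40

Yes, your "hash" function will produce collisions for strings made up of the same letters, such as "rail" and "liar". This is because you use only addition, which is commutative.

You can use something like this, which includes a prime number as a factor.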

unsigned long int hashBetter(const char* s)
{
    unsigned long int hash = 1234567890ul;
    while(*s)
    {
        hash = (*s + hash) * 4294967291ul;
        s++;
    }
    return hash;
}

Or use a CRC, which spreads the input data widely across the available range of possible hash values:

unsigned long int hashGood(const char* s)
{
    unsigned long int hash = 1234567890ul;
    while(*s)
    {
        hash = crc(hash, *s);
        s++;
    }
    return hash;
}
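
The crc() function used above is not defined in the answer. As a sketch of what such a helper could look like, here is a bit-by-bit CRC-32 using the common reflected polynomial 0xEDB88320. The names crc32_update and hash_crc32 are hypothetical, and this version uses the conventional CRC-32 start value and final XOR instead of the 1234567890 seed above, so its results will differ from the answer's.

```c
#include <stdint.h>

/* Feed one byte into a running CRC-32, processing one bit at a time. */
uint32_t crc32_update(uint32_t crc, unsigned char byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 1u)
            crc = (crc >> 1) ^ 0xEDB88320u;  /* reflected CRC-32 polynomial */
        else
            crc >>= 1;
    }
    return crc;
}

/* Hash a NUL-terminated string with the conventional CRC-32 parameters:
   start from all ones and invert the result at the end. */
uint32_t hash_crc32(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (*s) {
        crc = crc32_update(crc, (unsigned char)*s);
        s++;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

With these parameters, hash_crc32("123456789") yields 0xCBF43926, the standard CRC-32 check value. A table-driven variant is the usual choice when speed matters; the bitwise loop is kept here for brevity.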

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/52769024
