Q: MurmurHash2 unsigned int overflow

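Stack Overflow user

Asked on 2017-08-30 14:43:04

Answers: 2 | Views: 649 | Followers: 0 | Votes: 1

I am currently trying to implement a hashtable/trie. When I pass my arguments to MurmurHash2 it does return a number, but I also get runtime errors about unsigned int overflow:

test.c:53:12: runtime error: unsigned integer overflow: 24930 * 1540483477 cannot be represented in type 'unsigned int'

test.c:60:4: runtime error: unsigned integer overflow: 2950274797 * 1540483477 cannot be represented in type 'unsigned int'
6265

I have marked lines 53 and 60 with a row of asterisks.

I am not sure whether I am passing some of the parameters incorrectly. Any help would be greatly appreciated!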

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed );

int main(void)
{
   const char* s= "aa";
   unsigned int number= MurmurHash2 (s, (int)strlen(s), 1) % 10000;
   printf("%u\n", number);
}

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.

const unsigned int m = 0x5bd1e995;
const int r = 24;

// Initialize the hash to a 'random' value

unsigned int h = seed ^ len;

// Mix 4 bytes at a time into the hash

const unsigned char * data = (const unsigned char *)key;

while(len >= 4)
{
    unsigned int k = *(unsigned int *)data;

    k *= m;
    k ^= k >> r;
    k *= m;

    h *= m;
    h ^= k;

    data += 4;
    len -= 4;
}

// Handle the last few bytes of the input array

switch(len)
{
case 3: h ^= data[2] << 16;
case 2: h ^= data[1] << 8;
case 1: h ^= data[0];
        h *= m; /* ************ line 53 of test.c ************ */
};

// Do a few final mixes of the hash to ensure the last few
// bytes are well-incorporated.

h ^= h >> 13;
h *= m;   /* ************ line 60 of test.c ************ */
h ^= h >> 15;

return h;
}

murmurhash

2 Answers

Stack Overflow user

Accepted answer

Posted on 2017-08-30 15:29:49

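It looks like you are building with the UBSan option -fsanitize=unsigned-integer-overflow, or with some other option such as -fsanitize=integer that enables this check. The documentation says:

"Note that unlike signed integer overflow, unsigned integer overflow is not undefined behavior. However, while it has well-defined semantics, it is often unintentional, so UBSan offers to catch it."

In the case of MurmurHash, unsigned integer overflow in the multiplications is fully intentional, so you should disable the check.

If you pass -fsanitize=unsigned-integer-overflow explicitly, remove it.
If it is enabled by another option, pass -fno-sanitize=unsigned-integer-overflow.
Alternatively, annotate the function MurmurHash2 with __attribute__((no_sanitize("unsigned-integer-overflow"))).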

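Another note: your code appears to be copied from the 32-bit reference implementation of MurmurHash2, which assumes that int is 32 bits wide. You should consider using uint32_t instead.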
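
A sketch of what the uint32_t suggestion could look like is below. The function name murmurhash2_u32 is hypothetical. Apart from the type changes, this version differs from the question's code in one labeled way: it loads each 4-byte block with memcpy instead of casting the pointer, which is my own assumption about how to avoid unaligned reads and is not something the answer asked for. It assumes int is at most 32 bits wide, as on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the 32-bit MurmurHash2 with fixed-width types.
   murmurhash2_u32 is a hypothetical name, not the reference API. */
uint32_t murmurhash2_u32(const void *key, int len, uint32_t seed)
{
    const uint32_t m = 0x5bd1e995u;
    const int r = 24;
    const unsigned char *data = (const unsigned char *)key;
    uint32_t h;

    assert(key != NULL && len >= 0);

    /* Initialize the hash from the seed and the length. */
    h = seed ^ (uint32_t)len;

    /* Mix 4 bytes at a time into the hash. */
    while (len >= 4) {
        uint32_t k;
        memcpy(&k, data, sizeof k); /* native-endian load, no alignment assumption */

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    /* Handle the last few bytes of the input. */
    switch (len) {
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
    }

    /* Final mixing. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return h;
}
```

Called as murmurhash2_u32("aa", 2, 1) % 10000, this yields 6265, the same number the question's program printed.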

Votes: 4

Stack Overflow user

Posted on 2017-08-30 15:02:26

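The number of bits in an unsigned int is system dependent.

On most systems it is 32 bits (4 bytes), but some systems may use a different size (for example, 64 bits (8 bytes) on some machines).

The hash word, however, has a specific size: the 64-bit variant requires a 64-bit unsigned type, and the 32-bit variant requires a 32-bit unsigned type.

This inconsistency can be resolved by using the uint64_t or uint32_t types defined in <stdint.h>.

I would add that the suffix UL (unsigned long) should probably be added to any numeric constants you use, i.e. 2950274797UL * 1540483477UL.

As @nwellnhof pointed out, your code appears to use the 32-bit variant of the algorithm.

In that case, overflow in the multiplication is normal: the result needs more bits than are available and is truncated. This loss of data is acceptable as part of the hashing process.

Consider telling the compiler what result you expect with something like:

 h = (uint32_t)(((uint64_t)h * m) & 0xFFFFFFFF);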
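
As a rough illustration of why the two forms agree, the sketch below compares a plain 32-bit multiply with the widened-and-masked version, using the operands from the sanitizer messages. The names mul_wrap32, mul_via64 and check_mul_equivalence are hypothetical. It assumes int is at most 32 bits wide, so that uint32_t operands are not promoted to a signed type.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 32-bit multiply: the result wraps modulo 2^32 by definition. */
uint32_t mul_wrap32(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Widen to 64 bits so the product cannot overflow, then keep the low 32 bits. */
uint32_t mul_via64(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) & 0xFFFFFFFFu);
}

/* Self-check with the operands reported in the question. */
void check_mul_equivalence(void)
{
    const uint32_t m = 0x5bd1e995u; /* 1540483477 */

    assert(mul_wrap32(24930u, m) == mul_via64(24930u, m));
    assert(mul_wrap32(2950274797u, m) == mul_via64(2950274797u, m));
}
```

Both functions return the same values. The practical difference is that the widened form never overflows a 32-bit operation, so an unsigned-overflow check has nothing to report there.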

Good luck!

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/45963412
