Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

Stack Overflow user
Asked 2016-09-24 19:34:22
2 answers, 10K views, 0 followers, 91 votes

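C++17 added std::hardware_destructive_interference_size and std::hardware_constructive_interference_size. At first I thought they were just a portable way to get the L1 cache line size, but that is an oversimplification.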

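Questions:

  • How are these constants related to the L1 cache line size?
  • Is there a good example that demonstrates their use cases?
  • Both are defined as static constexpr. Is that not a problem if you build a binary and run it on other machines with different cache line sizes? How can it protect against false sharing when you are not certain which machine your code will run on?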
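
To make the third question concrete, here is a minimal sketch of how the two constants can be consumed at compile time. The namespace `demo`, its two names, and the 64-byte fallback are assumptions for illustration only; the fallback applies when the standard library does not define the feature-test macro `__cpp_lib_hardware_interference_size`.

```cpp
#include <cstddef>
#include <new>  // std::hardware_*_interference_size, when the library provides them

// Hypothetical wrapper: the namespace, the names, and the 64-byte fallback
// are assumptions for illustration, not part of the standard.
namespace demo {
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t destructive_size =
    std::hardware_destructive_interference_size;
constexpr std::size_t constructive_size =
    std::hardware_constructive_interference_size;
#else
constexpr std::size_t destructive_size = 64;   // assumed fallback
constexpr std::size_t constructive_size = 64;  // assumed fallback
#endif
}  // namespace demo

// Both values are fixed when the translation unit is compiled, so they can
// appear in static_assert and alignas. They are not re-evaluated on the
// machine that later runs the binary.
static_assert(demo::destructive_size >= alignof(std::max_align_t),
              "destructive size is at least the maximum fundamental alignment");
static_assert(demo::constructive_size >= alignof(std::max_align_t),
              "constructive size is at least the maximum fundamental alignment");
```

Because the values are baked in at build time, a binary built for one cache line size keeps that layout on every machine it runs on, which is exactly the concern raised in the last bullet.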

2 Answers

Stack Overflow user

Answered 2018-09-04 04:29:54

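I would almost always expect these two values to be the same.

On the question above, I would like to add a small contribution to the accepted answer. A while ago I came across a very good use case in the folly library where the two should be defined separately. See the caveat about Intel Sandy Bridge processors.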

https://github.com/facebook/folly/blob/3af92dbe6849c4892a1fe1f9366306a2f5cbe6a0/folly/lang/Align.h

//  Memory locations within the same cache line are subject to destructive
//  interference, also known as false sharing, which is when concurrent
//  accesses to these different memory locations from different cores, where at
//  least one of the concurrent accesses is or involves a store operation,
//  induce contention and harm performance.
//
//  Microbenchmarks indicate that pairs of cache lines also see destructive
//  interference under heavy use of atomic operations, as observed for atomic
//  increment on Sandy Bridge.
//
//  We assume a cache line size of 64, so we use a cache line pair size of 128
//  to avoid destructive interference.
//
//  mimic: std::hardware_destructive_interference_size, C++17
constexpr std::size_t hardware_destructive_interference_size =
    kIsArchArm ? 64 : 128;
static_assert(hardware_destructive_interference_size >= max_align_v, "math?");

//  Memory locations within the same cache line are subject to constructive
//  interference, also known as true sharing, which is when accesses to some
//  memory locations induce all memory locations within the same cache line to
//  be cached, benefiting subsequent accesses to different memory locations
//  within the same cache line and helping performance.
//
//  mimic: std::hardware_constructive_interference_size, C++17
constexpr std::size_t hardware_constructive_interference_size = 64;
static_assert(hardware_constructive_interference_size >= max_align_v, "math?");
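
As a rough sketch of how two different values would be used, the example below keeps independently written counters at least the destructive size apart and keeps fields that are read together within the constructive size. The constants `kDestructive` and `kConstructive` and both struct names are hypothetical; the values 128 and 64 are simply taken from the non-ARM case in the folly snippet above.

```cpp
#include <atomic>
#include <cstddef>

// Assumed values, mirroring the non-ARM case of the folly snippet.
constexpr std::size_t kDestructive = 128;
constexpr std::size_t kConstructive = 64;

// Written by different threads: place each counter on its own
// kDestructive-aligned boundary so they never share a cache line pair.
struct separated_counters {
    alignas(kDestructive) std::atomic<int> produced;
    alignas(kDestructive) std::atomic<int> consumed;
};

// Read together by one thread: align the struct and check that it fits
// in kConstructive bytes, so one fetch brings in all of the fields.
struct alignas(kConstructive) hot_fields {
    int key;
    int flags;
    void* payload;
};

static_assert(offsetof(separated_counters, consumed) -
                      offsetof(separated_counters, produced) >=
                  kDestructive,
              "counters must be at least kDestructive bytes apart");
static_assert(sizeof(hot_fields) <= kConstructive,
              "hot fields must fit within kConstructive bytes");
```

With a 128-byte destructive size, `separated_counters` costs 256 bytes for two integers, which is the cache-usage trade-off that motivates keeping the constructive value smaller.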
25 votes

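Stack Overflow user

Answered 2020-02-01 00:00:05

I tested the code above, but I think it contains a small error that gets in the way of understanding the underlying behavior: to prevent false sharing, a single cache line should not be shared between two different atomics. I changed the definitions of those structs.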

struct naive_int
{
    alignas ( sizeof ( int ) ) atomic < int >               value;
};

struct cache_int
{
    alignas ( hardware_constructive_interference_size ) atomic < int >  value;
};

struct bad_pair
{
    // two atomics sharing a single 64 bytes cache line 
    alignas ( hardware_constructive_interference_size ) atomic < int >  first;
    atomic < int >                              second;
};

struct good_pair
{
    // first cache line begins here
    alignas ( hardware_constructive_interference_size ) atomic < int >  
                                                first;
    // That one is still in the first cache line
    atomic < int >                              first_s; 
    // second cache line starts here
    alignas ( hardware_constructive_interference_size ) atomic < int >
                                                second;
    // That one is still in the second cache line
    atomic < int >                              second_s;
};

The results are:

Hardware concurrency := 40
sizeof(naive_int)    := 4
alignof(naive_int)   := 4
sizeof(cache_int)    := 64
alignof(cache_int)   := 64
sizeof(bad_pair)     := 64
alignof(bad_pair)    := 64
sizeof(good_pair)    := 128
alignof(good_pair)   := 64
Running naive_int test.
Average time: 0.060303 seconds, useless result: 8212147
Running cache_int test.
Average time: 0.0109432 seconds, useless result: 8113799
Running bad_pair test.
Average time: 0.162636 seconds, useless result: 16289887
Running good_pair test.
Average time: 0.129472 seconds, useless result: 16420417
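
The sizeof and alignof figures in this output can be cross-checked at compile time. The sketch below restates the two pair structs under the assumption of a 64-byte line (the constant `line_size` and the helper `line_of` are hypothetical names) and asserts which members land on the same line.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines, matching the sizes reported in the output.
constexpr std::size_t line_size = 64;

struct bad_pair {
    alignas(line_size) std::atomic<int> first;
    std::atomic<int> second;  // lands in the same line as first
};

struct good_pair {
    alignas(line_size) std::atomic<int> first;
    std::atomic<int> first_s;  // still in the first line
    alignas(line_size) std::atomic<int> second;
    std::atomic<int> second_s;  // still in the second line
};

// Index of the line an offset falls in, counted from the start of an
// object that is itself aligned to line_size.
constexpr std::size_t line_of(std::size_t offset) { return offset / line_size; }

static_assert(sizeof(bad_pair) == line_size, "bad_pair occupies one line");
static_assert(sizeof(good_pair) == 2 * line_size, "good_pair occupies two lines");
static_assert(line_of(offsetof(bad_pair, first)) ==
                  line_of(offsetof(bad_pair, second)),
              "bad_pair: both atomics share a line");
static_assert(line_of(offsetof(good_pair, first)) !=
                  line_of(offsetof(good_pair, second)),
              "good_pair: first and second are on different lines");
```

These checks only describe layout; they say nothing about timing, which still has to be measured as in the runs above.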

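I saw a lot of variance in the last result, but I never dedicated any cores exclusively to this test. In any case, this ran on 2 Xeon 2690V2 processors, and across various runs using 64 or 128 for hardware_constructive_interference_size, I found that 64 was more than enough and that 128 was a very poor use of the available cache.

I suddenly realized that your question helps me understand what Jeff Preshing was talking about: it is all about the payload!?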

0 votes

Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/39680206
