C++17 added std::hardware_constructive_interference_size. At first I thought it was just a portable way to get the L1 cache line size, but that is an oversimplification.
Questions:
It is static constexpr. If you build a binary and execute it on other machines with different cache line sizes, isn't that a problem? How can you guard against false sharing in that scenario, when you cannot be sure which machine your code will run on? Posted on 2018-09-04 04:29:54
I would almost always expect these values to be the same.
Regarding the question above, I would like to make a small contribution to the accepted answer. A while ago I came across a very good use case where these two values should be defined separately: the folly library. Note the caveat about Intel Sandy Bridge processors.
https://github.com/facebook/folly/blob/3af92dbe6849c4892a1fe1f9366306a2f5cbe6a0/folly/lang/Align.h
// Memory locations within the same cache line are subject to destructive
// interference, also known as false sharing, which is when concurrent
// accesses to these different memory locations from different cores, where at
// least one of the concurrent accesses is or involves a store operation,
// induce contention and harm performance.
//
// Microbenchmarks indicate that pairs of cache lines also see destructive
// interference under heavy use of atomic operations, as observed for atomic
// increment on Sandy Bridge.
//
// We assume a cache line size of 64, so we use a cache line pair size of 128
// to avoid destructive interference.
//
// mimic: std::hardware_destructive_interference_size, C++17
constexpr std::size_t hardware_destructive_interference_size =
kIsArchArm ? 64 : 128;
static_assert(hardware_destructive_interference_size >= max_align_v, "math?");
// Memory locations within the same cache line are subject to constructive
// interference, also known as true sharing, which is when accesses to some
// memory locations induce all memory locations within the same cache line to
// be cached, benefiting subsequent accesses to different memory locations
// within the same cache line and helping performance.
//
// mimic: std::hardware_constructive_interference_size, C++17
constexpr std::size_t hardware_constructive_interference_size = 64;
static_assert(hardware_constructive_interference_size >= max_align_v, "math?");
Posted on 2020-02-01 00:00:05
I have tested the code above, but I think there is a small mistake that gets in the way of understanding the underlying behavior: to prevent false sharing, a single cache line should not be shared between two different atomics. I changed the struct definitions as follows.
struct naive_int
{
    alignas(sizeof(int)) atomic<int> value;
};
struct cache_int
{
    alignas(hardware_constructive_interference_size) atomic<int> value;
};
struct bad_pair
{
    // two atomics sharing a single 64-byte cache line
    alignas(hardware_constructive_interference_size) atomic<int> first;
    atomic<int> second;
};
struct good_pair
{
    // first cache line begins here
    alignas(hardware_constructive_interference_size) atomic<int> first;
    // this one is still in the first cache line
    atomic<int> first_s;
    // second cache line starts here
    alignas(hardware_constructive_interference_size) atomic<int> second;
    // this one is still in the second cache line
    atomic<int> second_s;
};
The results are:
Hardware concurrency := 40
sizeof(naive_int) := 4
alignof(naive_int) := 4
sizeof(cache_int) := 64
alignof(cache_int) := 64
sizeof(bad_pair) := 64
alignof(bad_pair) := 64
sizeof(good_pair) := 128
alignof(good_pair) := 64
Running naive_int test.
Average time: 0.060303 seconds, useless result: 8212147
Running cache_int test.
Average time: 0.0109432 seconds, useless result: 8113799
Running bad_pair test.
Average time: 0.162636 seconds, useless result: 16289887
Running good_pair test.
Average time: 0.129472 seconds, useless result: 16420417
Regarding the last results I experienced a lot of variation, but I never pinned any core precisely to this specific problem. Anyway, this ran on two Xeon 2690 V2; across various runs using either 64 or 128 for hardware_constructive_interference_size, I found 64 more than enough and 128 a very poor use of the available cache.
I just realized that your question helped me understand what Jeff Preshing was talking about: it's all about the payload!
https://stackoverflow.com/questions/39680206