文章/答案/技术大牛

发布

社区首页 >问答首页 >从SIMD向量中提取集字节位置

问从SIMD向量中提取集字节位置
EN

Stack Overflow用户

提问于 2015-02-13 18:43:16

回答 2查看 1.5K关注 0票数 6

我使用SIMD导入运行了一个计算平台。这些指令作为结果返回一个16字节的向量，名为compare，每个字节为0x00或0xff：

             0    1    2    3    4    5    6    7       15   16
compare : 0x00 0x00 0x00 0x00 0xff 0x00 0x00 0x00 ... 0xff 0x00

字节设置为0xff意味着我需要运行函数do_operation(i)，而i是字节的位置。

例如，上面的compare向量意味着，我需要运行以下操作序列：

do_operation(4);
do_operation(15);

到目前为止，我想出的最快解决方案如下：

for(...) {
        //
        // SIMD computations
        //
        __m128i compare = ... // Result of SIMD computations

        // Extract high and low quadwords for compare vector
        std::uint64_t cmp_low = (_mm_cvtsi128_si64(compare));
        std::uint64_t cmp_high = (_mm_extract_epi64(compare, 1));

        //  Process low quadword 
        if (cmp_low) {
            const std::uint64_t low_possible_positions = 0x0706050403020100;
            const std::uint64_t match_positions = _pext_u64(
                    low_possible_positions, cmp_low);
            const int match_count = _popcnt64(cmp_low) / 8;
            const std::uint8_t* match_pos_array =
                    reinterpret_cast<const std::uint8_t*>(&match_positions);

            for (int i = 0; i < match_count; ++i) {
                do_operation(i);
            }
        }

        // Process high quadword (similarly)
        if (cmp_high) { 

            const std::uint64_t high_possible_positions = 0x0f0e0d0c0b0a0908;
            const std::uint64_t match_positions = _pext_u64(
                    high_possible_positions, cmp_high);
            const int match_count = _popcnt64(cmp_high) / 8;
            const std::uint8_t* match_pos_array =
                    reinterpret_cast<const std::uint8_t*>(&match_positions);

            for(int i = 0; i < match_count; ++i) {
                do_operation(i);
            }
        }
}

首先提取128位向量(cmp_low和cmp_high)的第一和第二64位整数。然后使用popcount计算设置为0xff的字节数(位数为1除以8)。最后，我使用pext获得位置，没有零，如下所示：

0x0706050403020100
0x000000ff00ff0000
        |
      PEXT
        |
0x0000000000000402

我希望找到一个更快的解决方案，来提取在compare向量中设置为0xff的字节的位置()。更准确地说，在0xff向量中，通常只有0、1或2个字节设置为，我希望使用这些信息来避免某些分支。

c++

sse

simd

intrinsics

回答 2

Stack Overflow用户

发布于 2015-02-13 19:27:21

下面简要介绍如何减少测试数量：

首先，使用一个函数将128位整数的每个字节的所有lsb或msb投影到一个16位值(例如，在SSE2 cpus：pmovmskb上有一个X86汇编指令，这在英特尔和MS编译器中是_mm_movemask_pi8支持的；
然后将这个值分成4小段；
定义函数来处理每个可能的值(从0到15)，并将它们放在数组中；
最后，调用按每个nibble索引的函数(使用额外的参数来指示它在16位中是哪一个)。

票数 6

Stack Overflow用户

发布于 2017-01-31 13:19:31

由于在您的示例中，compare向量中通常只有0、1或2个字节设置为0 0xff，所以位掩码上的短时间循环可能比基于pext指令的解决方案更有效。在类似的问题上也可以参见我的answer。

/*
gcc -O3 -Wall -m64 -mavx2 -march=broadwell esbsimd.c
*/

#include <stdio.h>
#include <immintrin.h>

int do_operation(int i){           /* some arbitrary do_operation() */
   printf("i = %d\n",i);
   return 0;
}

int main(){

   __m128i compare = _mm_set_epi8(0xFF,0,0,0,  0,0,0,0, 0,0,0,0xFF, 0,0,0,0);   /* Take some randon value for compare */
   int           k = _mm_movemask_epi8(compare);

   while (k){
      int i=_tzcnt_u32(k);                                /* Count the number of trailing zero bits in k.  BMI1 instruction set, Haswell or newer. */
      do_operation(i);
      k=_blsr_u32(k);                                     /* Clear the lowest set bit in k.                                                        */
   }
   return 0;
}

/* 
Output:

i = 4
i = 15

*/

票数 5

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/28506500

复制

相似问题

问从SIMD向量中提取集字节位置
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问从SIMD向量中提取集字节位置EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问从SIMD向量中提取集字节位置
EN