How can I SIMDize the following C code (using SIMD intrinsics, of course)? I'm having trouble understanding SIMD intrinsics, and this would help a lot:
int sum_naive( int n, int *a )
{
    int sum = 0;
    for( int i = 0; i < n; i++ )
        sum += a[i];
    return sum;
}
Posted 2012-08-09 14:23:51
Here is a fairly simple implementation (warning: untested code):
#include <stdint.h>
#include <emmintrin.h> // SSE2 intrinsics

int32_t sum_array(const int32_t a[], const int n)
{
    __m128i vsum = _mm_set1_epi32(0); // initialise vector of four partial 32 bit sums
    int32_t sum;
    int i;
    for (i = 0; i < n; i += 4)
    {
        // load vector of 4 x 32 bit values; _mm_load_si128 takes a const __m128i *
        // and requires the address to be 16-byte aligned
        __m128i v = _mm_load_si128((const __m128i *)&a[i]);
        vsum = _mm_add_epi32(vsum, v); // accumulate to 32 bit partial sum vector
    }
    // horizontal add of four 32 bit partial sums and return result
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    sum = _mm_cvtsi128_si32(vsum);
    return sum;
}
Note that the input array a[] needs to be 16-byte aligned, and n should be a multiple of 4.
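If the alignment and length restrictions are a problem, one common approach is to use unaligned loads (_mm_loadu_si128) for the vector loop and finish the last 0 to 3 elements with a plain scalar loop. The following is a minimal sketch of that variant, assuming an x86 target with SSE2; the function name sum_array_any is hypothetical and not part of the original answer. The horizontal add is unchanged: shifting the register right by 8 bytes and adding folds lanes 2 and 3 onto lanes 0 and 1, and shifting by 4 bytes and adding again leaves the total in lane 0.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical variant: sums n 32-bit ints with SSE2 without requiring
   16-byte alignment or n to be a multiple of 4. */
int32_t sum_array_any(const int32_t a[], int n)
{
    __m128i vsum = _mm_setzero_si128(); /* four partial 32-bit sums */
    int i = 0;

    /* Vector loop: 4 elements per iteration, unaligned loads */
    for (; i + 4 <= n; i += 4)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)&a[i]);
        vsum = _mm_add_epi32(vsum, v);
    }

    /* Horizontal add: [s0,s1,s2,s3] -> lane 0 holds s0+s1+s2+s3 */
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    int32_t sum = _mm_cvtsi128_si32(vsum);

    /* Scalar tail: the remaining 0-3 elements */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

The unaligned load costs little or nothing on most recent x86 CPUs when the data happens to be aligned, so this version trades a small amount of speed on older hardware for not having to guarantee alignment at the call site.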
https://stackoverflow.com/questions/11872952