I have the following loop, which I am compiling with icc:
for (int i = 0; i < arrays_size; ++i) {
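    total = total + C[i];
}
The vectorization report says that this loop was vectorized, but I don't understand how that is possible, since there is an obvious read-after-write dependency.
The report output is as follows: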
LOOP BEGIN at loops.cpp(46,5)
remark #15388: vectorization support: reference C has aligned access [ loops.cpp(47,7) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 8
remark #15309: vectorization support: normalized vectorization overhead 0.475
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15475: --- begin vector loop cost summary ---
remark #15476: scalar loop cost: 5
remark #15477: vector loop cost: 1.250
remark #15478: estimated potential speedup: 3.990
remark #15488: --- end vector loop cost summary ---
remark #25015: Estimate of max trip count of loop=31250
LOOP END
Can someone explain what this means? How can this loop be vectorized?
Posted on 2020-11-20 23:30:16
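Depending on the types of total and C[i], you can exploit the associativity and commutativity of addition and first accumulate 4 or 8 (or more) sub-totals, for example: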
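int subtotal[4] = {0, 0, 0, 0};
// process only complete groups of 4 so that C[i+k] never reads past the end
for (int i = 0; i + 4 <= arrays_size; i += 4) {
    for (int k = 0; k < 4; ++k)
        subtotal[k] += C[i + k];
}
// handle remaining elements of C, if necessary ...
// sum up the sub-totals:
total = (subtotal[0] + subtotal[2]) + (subtotal[1] + subtotal[3]);
This works for any integer type, but ICC by default assumes that floating-point addition is also associative (gcc and clang need some subset of -ffast-math for that).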
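To make the idea concrete, here is a minimal, self-contained C++ sketch of the same sub-total trick that also handles the leftover elements the snippet above only mentions in a comment. The function names (sum_with_subtotals, sum_scalar, reassociation_is_exact) are hypothetical helpers made up for illustration, and the code only mimics what a 4-lane vector accumulator does; it is not what icc actually emits.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums C with four independent partial sums,
// mimicking a vector register with 4 lanes.
long long sum_with_subtotals(const std::vector<int>& C) {
    long long subtotal[4] = {0, 0, 0, 0};
    const std::size_t n = C.size();
    const std::size_t main_end = n - (n % 4);  // largest multiple of 4 <= n

    // Main loop: lane k accumulates elements k, k+4, k+8, ...
    // The four additions per iteration do not depend on each other.
    for (std::size_t i = 0; i < main_end; i += 4) {
        for (std::size_t k = 0; k < 4; ++k) {
            subtotal[k] += C[i + k];
        }
    }

    // Horizontal reduction of the four lanes.
    long long total = (subtotal[0] + subtotal[2]) + (subtotal[1] + subtotal[3]);

    // Remainder loop for the last n % 4 elements.
    for (std::size_t i = main_end; i < n; ++i) {
        total += C[i];
    }
    return total;
}

// Reference: the original one-accumulator loop.
long long sum_scalar(const std::vector<int>& C) {
    long long total = 0;
    for (int x : C) {
        total += x;
    }
    return total;
}

// Hypothetical helper: checks whether regrouping a floating-point sum
// gives exactly the same value. Assumes the compiler is not itself
// reassociating (i.e. no -ffast-math or similar).
bool reassociation_is_exact(double a, double b, double c) {
    return (a + b) + c == a + (b + c);
}
```

For integers the two sum functions always agree (here the accumulators are long long, so overflow is not a concern for typical inputs). For floating point, regrouping can change the rounded result: under strict IEEE semantics reassociation_is_exact(1e100, -1e100, 1.0) is false, because (1e100 + -1e100) + 1.0 is 1.0 while 1e100 + (-1e100 + 1.0) is 0.0. That is why a compiler needs permission to reassociate before it vectorizes a floating-point reduction.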
https://stackoverflow.com/questions/64937979