
How should I interpret the speedup in icc's optimization report?

Stack Overflow user
Asked 2018-11-08 08:33:13
1 answer · 686 views · 0 followers · Score 2

The environment is:

icc version 19.0.0.117 (gcc version 5.4.0 compatibility)

Intel Parallel Studio XE Cluster Edition 2019

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Ubuntu 16.04

The compiler flags are:

-std=gnu11 -Wall -xHost -xCORE-AVX2 -O2 -fma -qopenmp -qopenmp-simd -qopt-report=5

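I use OpenMP simd or Intel pragmas to vectorize my loops and get a speedup. In the optimization report generated by icc, I usually see results like the following: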

LOOP BEGIN at get_forces.c(3668,3)
   remark #15389: vectorization support: reference mon->fricforce[n1][d] has unaligned access   [ get_forces.c(3669,4) ]
   remark #15389: vectorization support: reference mon->vel[n1][d] has unaligned access   [ get_forces.c(3669,36) ]
   remark #15389: vectorization support: reference vel[n1][d] has unaligned access   [ get_forces.c(3669,51) ]
   remark #15389: vectorization support: reference mon->drag[n1][d] has unaligned access   [ get_forces.c(3671,4) ]
   remark #15389: vectorization support: reference mon->vel[n1][d] has unaligned access   [ get_forces.c(3671,40) ]
   remark #15389: vectorization support: reference vel[n1][d] has unaligned access   [ get_forces.c(3671,57) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 2
   remark #15309: vectorization support: normalized vectorization overhead 0.773
   remark #15300: LOOP WAS VECTORIZED
   remark #15450: unmasked unaligned unit stride loads: 3 
   remark #15451: unmasked unaligned unit stride stores: 2 
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 21 
   remark #15477: vector cost: 11.000 
   remark #15478: estimated potential speedup: 1.050 
   remark #15488: --- end vector cost summary ---
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
   remark #25015: Estimate of max trip count of loop=1
LOOP END

My question is: I don't understand how the estimated potential speedup of 1.050 follows from

normalized vectorization overhead 0.773
scalar cost: 21 
vector cost: 11.000 

An even more extreme and confusing case is:

LOOP BEGIN at get_forces.c(2690,8)
<Distributed chunk3>
   remark #15388: vectorization support: reference q12[j] has aligned access   [ get_forces.c(2694,19) ]
   remark #15388: vectorization support: reference q12[j] has aligned access   [ get_forces.c(2694,26) ]
   remark #15335: loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override 
   remark #15305: vectorization support: vector length 2
   remark #15309: vectorization support: normalized vectorization overhead 1.857
   remark #15448: unmasked aligned unit stride loads: 1 
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 7 
   remark #15477: vector cost: 3.500 
   remark #15478: estimated potential speedup: 0.770 
   remark #15488: --- end vector cost summary ---
   remark #25436: completely unrolled by 3  
LOOP END

Here, 3.5 + 1.857 = 5.357 < 7.

So can I still apply simd to this loop and get a speedup, or should I trust the reported speedup of 0.770 and not vectorize it?

How should the speedup in icc's optimization report be understood?


1 answer

Stack Overflow user

Answered 2018-12-10 20:45:56

"Scalar cost" means "the cost of one iteration of the scalar loop".
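The question does not show the source of the loop at get_forces.c(3668,3). As a purely hypothetical sketch of a loop with the same shape (unit-stride accesses to mon->fricforce, mon->drag, mon->vel and vel over a short d dimension), one pass of the body below for a single d is the "one iteration of the scalar loop" that the scalar cost describes. The struct layout, the function name update_forces, the coefficients fric_coef and drag_coef, the arithmetic in the body, and NDIM = 3 are all assumptions, not the asker's code.

```c
#include <assert.h>
#include <stddef.h>

#define NDIM 3 /* assumption: 3-D vectors, so the d-loop runs 3 times */

/* Hypothetical layout: only the field names are taken from the report. */
struct monomers {
    double (*vel)[NDIM];
    double (*fricforce)[NDIM];
    double (*drag)[NDIM];
};

/* Hypothetical body: the real expressions at get_forces.c(3669) and (3671)
   are not shown in the question. Executing the body once for one value of d
   is one scalar iteration. */
void update_forces(struct monomers *mon, double (*vel)[NDIM], int n1,
                   double fric_coef, double drag_coef)
{
    assert(mon != NULL && vel != NULL);
#pragma omp simd
    for (int d = 0; d < NDIM; d++) {
        double dv = mon->vel[n1][d] - vel[n1][d]; /* relative velocity */
        mon->fricforce[n1][d] = -fric_coef * dv;
        mon->drag[n1][d] = -drag_coef * dv;
    }
}
```

With NDIM = 3 and a vector length of 2, a loop like this has one full vector iteration and one leftover element, which could explain the report's "Estimate of max trip count of loop=1".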

"Vector cost" means "the cost of one iteration of the vectorized loop divided by vector_length*unroll_factor", that is, a cost comparable to one scalar iteration.
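The definition above can be written down directly, together with its inverse, which recovers the cost of one whole vector iteration from the reported number. This is a minimal sketch; the function names are hypothetical, and since these reports do not print the unroll factor, the usage below assumes it is 1.

```c
#include <assert.h>

/* "Vector cost" as the answer defines it: the cost of one vector iteration
   divided by vector_length * unroll_factor. */
double vector_cost_per_scalar_iter(double vector_iter_cost,
                                   int vector_length, int unroll_factor)
{
    assert(vector_length > 0 && unroll_factor > 0);
    return vector_iter_cost / (vector_length * unroll_factor);
}

/* Inverse: the cost of one whole vector iteration, given the reported
   "vector cost". */
double vector_iter_cost_from_report(double reported_vector_cost,
                                    int vector_length, int unroll_factor)
{
    assert(vector_length > 0 && unroll_factor > 0);
    return reported_vector_cost * vector_length * unroll_factor;
}
```

Under that assumption, one vector iteration of the first loop costs 11.000 × 2 = 22, compared with 2 × 21 = 42 for the two scalar iterations it replaces; for the second loop it costs 3.500 × 2 = 7.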

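"Vectorization overhead" is the cost of the vector initialization/finalization before and after the loop, normalized by the cost of one vector iteration.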
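Because the overhead is normalized by one vector iteration while the vector cost is normalized by one scalar iteration, the two numbers are in different units and cannot simply be added as in the question's 3.5 + 1.857. A minimal sketch that converts the overhead into the same units as the scalar cost, assuming an unroll factor of 1 (not printed in the reports); absolute_overhead is a hypothetical name:

```c
#include <assert.h>

/* Multiply the normalized overhead back by the cost of one vector
   iteration to get the setup/teardown cost in scalar-cost units. */
double absolute_overhead(double normalized_overhead,
                         double reported_vector_cost,
                         int vector_length, int unroll_factor)
{
    assert(vector_length > 0 && unroll_factor > 0);
    double vector_iter_cost =
        reported_vector_cost * vector_length * unroll_factor;
    return normalized_overhead * vector_iter_cost;
}
```

For the first loop this gives 0.773 × 22 ≈ 17.0, and for the second 1.857 × 7 ≈ 13.0, which is almost two scalar iterations' worth of setup and teardown around a loop whose scalar iteration costs 7.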

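"Estimated potential speedup" is computed for the whole loop execution. It shows the potential gain of executing the vectorized loop, normalized by the scalar iteration cost, and it accounts for the peel, remainder, and main loops for the estimated loop trip count. It cannot be derived explicitly from the scalar and vector costs shown above.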
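To see how the trip count and the overhead enter, here is a back-of-the-envelope model of whole-loop speedup. It is NOT icc's actual formula; it assumes the peel and remainder iterations run as scalar code at the scalar cost each, the unroll factor is 1, and the overhead is paid once per execution of the loop. The function name estimated_speedup is hypothetical.

```c
#include <assert.h>

/* Rough model: total scalar cost divided by total vectorized cost
   (peel + main vector iterations + scalar remainder + overhead). */
double estimated_speedup(double scalar_cost, double reported_vector_cost,
                         double normalized_overhead, int vector_length,
                         int trip_count, int peel_iters)
{
    assert(vector_length > 0 && trip_count >= 0 && peel_iters >= 0);
    assert(peel_iters <= trip_count);

    double vector_iter_cost = reported_vector_cost * vector_length;
    int main_iters = (trip_count - peel_iters) / vector_length;
    int remainder_iters = (trip_count - peel_iters) % vector_length;

    double scalar_total = trip_count * scalar_cost;
    double vector_total = peel_iters * scalar_cost
                        + main_iters * vector_iter_cost
                        + remainder_iters * scalar_cost
                        + normalized_overhead * vector_iter_cost;
    assert(vector_total > 0.0);
    return scalar_total / vector_total;
}
```

Assuming a trip count of 3 (suggested, but not confirmed, by "Estimate of max trip count of loop=1" in the first report and "completely unrolled by 3" in the second) and no peel loop, estimated_speedup(21, 11, 0.773, 2, 3, 0) comes out at about 1.050 and estimated_speedup(7, 3.5, 1.857, 2, 3, 0) at about 0.778, close to but not exactly the reported 0.770. The remaining gap is consistent with the point that icc's figure cannot be reconstructed exactly from the printed costs. The model also shows why such short loops gain little: one vector iteration plus one scalar remainder leaves almost nothing to amortize the overhead against.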

Score 2
Page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/53203989
