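I'm trying to optimize an n-body algorithm. When I add #pragma acc kernels to the loop, the compiler prints some feedback messages (quoted after the code) that I don't understand.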
    #pragma acc kernels
    for (i = 0; i < n; i++)
    {
        real fx, fy, fz;
        fx = fy = fz = 0;
        real iPosx = in[i].x;
        real iPosy = in[i].y;
        real iPosz = in[i].z;
        for (j = 0; j < n; j++)
        {
            real rx, ry, rz;
            rx = in[j].x - iPosx;
            ry = in[j].y - iPosy;
            rz = in[j].z - iPosz;
            real distSqr = rx*rx+ry*ry+rz*rz;
            distSqr += SOFTENING_SQUARED;
            real s = in[j].w / POW(distSqr,1.5);
            real3 ff;
            ff.x = rx * s;
            ff.y = ry * s;
            ff.z = rz * s;
            fx += ff.x;
            fy += ff.y;
            fz += ff.z;
        }
        force[i].x = fx;
        force[i].y = fy;
        force[i].z = fz;
    }

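What do these messages mean?
"Generating implicit reduction(+:fx)"
"Generating implicit reduction(+:fy)"
"Generating implicit reduction(+:fz)"
Thanks.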
Posted on 2020-01-29 12:12:51
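To parallelize the inner "j" loop, the three variables fx, fy, and fz have to be treated as sum reductions. Every iteration of that loop adds into the same three variables, so running iterations in parallel without a reduction would be a race; with a reduction, each thread accumulates a private partial sum and the partial sums are combined at the end of the loop. The compiler detected this automatically and implicitly added the reduction for you. It is the same as if you had declared it explicitly, for example: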
    #pragma acc loop reduction(+:fx,fy,fz)
    for (j = 0; j < n; j++)
    {
        real rx, ry, rz;
https://stackoverflow.com/questions/59955557