首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >使用辛普森规则加速集成2D gpuArray矩阵的代码

使用辛普森规则加速集成2D gpuArray矩阵的代码
EN

Stack Overflow用户
提问于 2021-02-10 23:52:48
回答 2查看 79关注 0票数 1

我有一个(真正的) 2D gpuArray,我正在使用它作为一个更大的代码的一部分,现在我正在尝试在我的主循环中使用Composite Simpson Rule来集成这个数组(至少有几次10000次迭代)。MWE如下所示:

代码语言:javascript
复制
%%%%%%%%%%%%%%%%%% MAIN CODE %%%%%%%%%%%%%%%%%%
Ny = 501;    % Dimensions of matrix M
Nx = 503;    %
dx = 0.1;    % Grid spacings
dy = 0.2;    %

M = rand(Ny, Nx, 'gpuArray'); % Initialise a  matrix

for k = 1:10000
    % M = function1(M)  % Apply some other functions to M
    % ... etc ...
    I = simpsons_integration_2D(M, dx, dy, Nx, Ny); % Now integrate M
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%% Integrator %%%%%%%%%%%%%%%%%
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.

% Integrate along x direction (vertically) --> IX is a vector afterwards
sX = sum( F(:,1:2:Nx-2) + 4*F(:,2:2:(Nx-1)) + F(:,3:2:Nx) , 2);
IX = dx/3 * sX;

% Integrate along y direction --> I is a scalar afterwards
sY = sum( IX(1:2:Ny-2) + 4*IX(2:2:(Ny-1)) + IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

执行积分的操作大约是850µs,这是我当前代码的重要部分。这是使用

代码语言:javascript
复制
f = @() simpsons_integration_2D(M, dx, dy, Nx, Ny);
t = gputimeit(f)

有没有一种方法可以减少集成gpuArray矩阵的执行时间?(显卡为Nvidia Quadro P4000)

非常感谢

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2021-02-11 01:58:42

假设矩阵的维数是奇数,下面是优化函数的一种方法:

代码语言:javascript
复制
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
    sX = 2 * sum(F,2) + 2 * sum (F(:,2:2:(Nx-1)),2) -  F(:,1) - F(:,Nx);
    sY = dx/3 * (2 * sum(sX) + 2 * sum (sX(2:2:(Ny-1))) - sX(1) - sX(Ny));
    I = dy/3 * sY;
end

编辑

使用矩阵乘法的更优化的解决方案:

代码语言:javascript
复制
function I = simpsons_integration_2D2(F, dx, dy, Nx, Ny)
  mx = repmat (2,  Nx, 1);
  mx(2:2:(Nx-1)) = 4;
  mx(1) = 1;
  mx(Nx) = 1;
  my = repmat (2,  1, Ny);
  my(2:2:(Ny-1)) = 4;
  my(1) = 1;
  my(Ny) = 1;
  I = (dx*dy/9) * (my * (F * mx));
end

如果NxNy相同,则只需计算mxmy中的一个

代码语言:javascript
复制
function I = simpsons_integration_2D2(F, dx, dy, Nx, Ny)
  mx = repmat (2,  Nx, 1);
  mx(2:2:(Nx-1)) = 4;
  mx(1) = 1;
  mx(Nx) = 1;
  I = (dx*dy/9) * (mx.' * (F * mx));
end

如果NxNy是常量,则可以在函数外部预先计算mx并将其作为函数参数传递:

代码语言:javascript
复制
function I = simpsons_integration_2D2(F, dx, dy, mx)
  I = (dx*dy/9) * (mx.' * (F * mx));
end

编辑:

如果mxmy都可以预先计算,那么问题就简化为一个点积:

代码语言:javascript
复制
m = reshape (my.' .* mx.', 1, []);

function I = simpsons_integration_2D3(F, dx, dy, m)
  I = (dx*dy/9) * (m * F(:));
end
票数 1
EN

Stack Overflow用户

发布于 2021-02-11 02:07:55

好吧,我不能为你测试这个,但有几件事可能会有帮助。

首先轴1,然后轴2可能会在修改项的局部性方面产生一些差异(我不知道是更好还是更差)。

代码语言:javascript
复制
function I = variation1(F, dx, dy, Nx, Ny)
% Sum each term separately, prevents the creation of a big intermediate matrix
% Multiply outside the summation does only Ny multiplications by 4 instead of Ny*Nx/2
sX = sum(F(:,1:2:Nx-2), 2) + 4*sum(F(:,2:2:(Nx-1)), 2) + sum(F(:,3:2:Nx), 2);
IX = dx/3 * sX;
sY = sum(IX(1:2:Ny-2), 1) + 4*sum(IX(2:2:(Ny-1)), 1) + sum(IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
代码语言:javascript
复制
function I = variation2(F, dx, dy, Nx, Ny)
% a.
% Sum each term separately, prevents the creation of a big intermediate matrix
% Multiply outside the summation does only Ny multiplications by 4 instead of Ny*Nx/2
% b.
% Notice that the terms 2:3:NX-2 appear in two summations
% Saves Nx*Ny/2 additions at the expense of Ny multiplications by 2
sX = 2*sum(F(:,3:2:Nx-2), 2) + 4*sum(F(:,2:2:(Nx-1)), 2) + F(:,1) + F(:,Nx);
% saves Ny multiplications by moving the constant factor after the next sum
sY = 2*sum(sX(3:2:Ny-2), 1) + 4*sum(sX(2:2:(Ny-1)), 1) + sX(1) + sX(Ny);
I = (dy*dy/9) * sY;
end
代码语言:javascript
复制
function I = alternate_simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.

% Notice that sum(F(:,1:2:Nx-2) + F(:,3:2:Nx)) have all but the end poitns repeated.
IX = 4*sum(F(:,2:2:Nx-1), 2) + 2 * sum(F(:,3:2:Nx-2) , 2) + F(:,1) + F(:,Nx);
disp(size(IX))
% Integrate along y direction --> I is a scalar afterwards
sY = 4*sum(IX(2:2:Ny-1)) + 2*sum(IX(3:2:Ny-2)) + IX(1) + IX(Ny);
I = dy*dy/9 * sY;
end

如果你认为单次求和更好,那么你可以使用公式2*(sum(2*F(2:2:end-1) + F(1:2:end-2)) + F(end) - F(1),它给出了相同的结果,但在第一次积分时Nx*Ny/2加法更少。但这些选项必须在您的环境中进行测试。

转置实现

代码语言:javascript
复制
function I = transposed_simpsons_integration_2D(F, dx, dy, Nx, Ny)
sY = 2*sum(2*F(2:2:end-1, :) + F(1:2:end-2, :), 1) + F(end, :) - F(1, :);
sX = 2*sum(2*sY(2:2:end-1) + sY(1:2:end-2)) + sY(end) - sY(1);
I = dy*dy/9 * sX;
end

使用倍频程(通常比matlab慢),每次迭代的运行时间约为400us。这不是在GPU上运行的有趣的工作负载类型。作为比较,randn大约比这个函数慢10倍。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/66140278

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档