
OpenCL matrix multiplication - getting wrong answer

Stack Overflow user

Asked on 2012-10-22 00:57:20

Answers: 1, Views: 1.9K, Followers: 0, Votes: 5

Here is a simple OpenCL matrix multiplication kernel that is driving me crazy:

By the way, I am using pyopencl.

__kernel void matrixMul(  __global int* C,
                          __global int* A,
                          __global int* B,
                          int wA, int wB){

                int row = get_global_id(1); // row index: global ID in dimension 1 (y)
                int col = get_global_id(0); // column index: global ID in dimension 0 (x)

                //Perform dot-product accumulated into value
                int value = 0;
                for ( int k = 0; k < wA; k++ ){
                    value += A[row*wA + k] * B[k*wB+col];
                }
                C[row*wB+col] = value; // C has wB columns, so its row stride is wB
            }

Where (input):

A = [72 45
     75 61]
B = [26 53 
     46 76]
wA = wB = 2

The output I get:

Sometimes I get:

C = [3942 0
     0 5472]

Other times I get:

C = [3942 7236
     3312 5472]

But the output should be:

C = [3942 7236
     4756 8611]
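As a host-side sanity check, the expected C can be computed in plain Python with the same flat, row-major layout the kernel assumes (element (r, c) of a matrix with width w lives at index r*w + c). This is an illustrative sketch, not part of the original question; the function name matmul_flat is hypothetical.

```python
def matmul_flat(A, B, wA, wB):
    """Multiply row-major flat matrices A (hA x wA) and B (wA x wB)."""
    hA = len(A) // wA  # number of rows of A (and of C)
    C = [0] * (hA * wB)
    for row in range(hA):
        for col in range(wB):
            value = 0
            for k in range(wA):
                value += A[row * wA + k] * B[k * wB + col]
            C[row * wB + col] = value  # C has wB columns, so its row stride is wB
    return C


A = [72, 45, 75, 61]
B = [26, 53, 46, 76]
print(matmul_flat(A, B, 2, 2))  # [3942, 7236, 4756, 8611]
```

For example, the bottom-left entry is 75*26 + 61*46 = 1950 + 2806 = 4756.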

I don't know what mistake I'm making. I've spent a whole day on this with no luck.

Please help me with this.

Here is the complete Python code:

import pyopencl as cl
import numpy as np

ORDER = 2
LEN = ORDER*ORDER
ctx = cl.create_some_context()

commandQueue = cl.CommandQueue( ctx )

A = np.array((72, 45, 75, 61), dtype = np.int32)
B = np.array((26, 53, 46, 76), dtype = np.int32)
C = np.empty_like(A)

in_buf1 = cl.Buffer( ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
                 hostbuf = A )
in_buf2 = cl.Buffer( ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
                 hostbuf = B )
out_buf = cl.Buffer( ctx, cl.mem_flags.WRITE_ONLY, C.nbytes )

kernelSrc1 = """__kernel void
            matrixMul(  /*const int Mdim,
                        const int Ndim,
                        const int Pdim,*/
                        __global int* C,
                        __global int* A,
                        __global int* B,
                        int wA, int wB)
           {
                int row = get_global_id(1); // row index: global ID in dimension 1 (y)
                int col = get_global_id(0); // column index: global ID in dimension 0 (x)

                //Perform dot-product accumulated into value
                int value = 0;
                for ( int k = 0; k < wA; k++ ){
                    value += A[row*wA + k] * B[k*wB+col];
                }
                C[row*wB+col] = value; // C has wB columns, so its row stride is wB
            }"""

program1 = cl.Program(ctx, kernelSrc1 ).build()
event1 = program1.matrixMul( commandQueue, (LEN, ), None,
                     out_buf, in_buf1, in_buf2, np.int32(ORDER), np.int32(ORDER))
event1.wait()

cl.enqueue_copy(commandQueue, C, out_buf)
print(C)

I am using Python 2.7.x, pyopencl 2012.1, and the AMD APP SDK.

Tags: python, opencl, pyopencl

1 answer

Stack Overflow user

Accepted answer

Answered on 2012-10-22 03:02:34

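You are setting the global size argument incorrectly. Your kernel uses two dimensions of the global size, so you need to set the global size to (ORDER, ORDER), i.e. pass (ORDER, ORDER) instead of (LEN, ) to program1.matrixMul. With the 1-D size (LEN, ), get_global_id(1) returns 0 for every work-item, so all four work-items compute row 0, and the two with col = 2 and col = 3 read past the end of B. When you change it to (ORDER, ORDER), you get: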

[3942 7236
 4756 8611]
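To see why the 1-D launch produces the wrong second row, the kernel's indexing can be emulated on the host. This is a sketch under stated assumptions: get_global_id and emulate_launch are hypothetical helpers; get_global_id returns 0 for any dimension beyond the launch's work dimension, as OpenCL specifies; and reads past the end of B are modelled as returning 0. That last point is only an assumption about what the device happened to read (on real hardware such reads are undefined behaviour), but it reproduces the second output in the question, since 72*46 = 3312 and 72*76 = 5472.

```python
import itertools


def get_global_id(gid, dim):
    """Mimic OpenCL get_global_id: dimensions beyond the launch's work dimension give 0."""
    return gid[dim] if dim < len(gid) else 0


def emulate_launch(global_size, A, B, wA, wB, oob_value=0):
    """Sequentially run the matrixMul kernel body for every work-item in global_size.

    Assumption: reads past the end of B return oob_value; on a real device
    such reads are undefined behaviour.
    """
    C = [0] * len(A)  # square case: C has as many elements as A
    for gid in itertools.product(*(range(n) for n in global_size)):
        row = get_global_id(gid, 1)
        col = get_global_id(gid, 0)
        value = 0
        for k in range(wA):
            idx = k * wB + col
            b = B[idx] if idx < len(B) else oob_value  # simulated out-of-bounds read
            value += A[row * wA + k] * b
        C[row * wB + col] = value  # wA == wB here, so this matches the original kernel
    return C


A = [72, 45, 75, 61]
B = [26, 53, 46, 76]
print(emulate_launch((4,), A, B, 2, 2))    # [3942, 7236, 3312, 5472]
print(emulate_launch((2, 2), A, B, 2, 2))  # [3942, 7236, 4756, 8611]
```

The (4,) launch only ever computes row 0, while the (2, 2) launch from the accepted answer covers every (row, col) pair. Running the work-items one after another is a fair model here only because each work-item writes a different element of C.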

Votes: 7

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/13000010
