
Q: PyOpenCL indexing problem

Stack Overflow user

Asked 2018-01-16 15:43:38

Answers: 1 · Views: 53 · Followers: 0 · Votes: 1

I am experimenting with OpenCL in Python, but I cannot work out what I am doing wrong in this simple matrix-copy code.

My input matrix is:

[ 1  2  3  4,
  5  6  7  8,
  9 10 11 12,
 13 14 15 16,
 17 18 19 20]

And this is the output I get:

[ 1  2  3  4,
  5  6  7  8,
  9 10  0  0,
  0  0  0  0,
  0  0  0  0]

Why is only part of my matrix copied? What am I doing wrong?

Here is my code:

import pyopencl as cl
import numpy as np

kernel = """
__kernel void
copy( __global const float *g_data, const int h, const int w, __global float *g_out )
{
    // Get global position
    size_t row = get_global_id(0);
    const int s = row * w;
    __global const float *in = &g_data[ s ];
    __global float *out = &g_out[ s ];
    for(int i=0; i<w; ++i)
    {
        out[i] = in[i];
    }
}
"""

class test:
    def __init__(self):
        # Create opencl context
        platform = cl.get_platforms()[2]
        self.__ctx__ = cl.Context( [platform.get_devices()[0]] )
        # Create opencl queue
        self.__queue__ = cl.CommandQueue(self.__ctx__)
        # Build opencl kernel
        self.__kernel__ = cl.Program(self.__ctx__, kernel).build()

    def __del__(self):
        del self.__queue__
        del self.__kernel__
        del self.__ctx__

    def __call__(self, data):
        # Get matrix dimensions
        h, w = data.shape
        mf = cl.mem_flags
        # Set input buffer
        g_data = cl.Buffer(self.__ctx__, (mf.READ_ONLY | mf.COPY_HOST_PTR), hostbuf=data)
        # Set output buffer
        self.__out__ = np.zeros( data.shape, dtype=np.float )
        g_out = cl.Buffer(self.__ctx__, mf.WRITE_ONLY, self.__out__.nbytes)
        # Run kernel
        kernel_event = self.__kernel__.copy(
            self.__queue__,
            (h,),
            None,
            g_data,
            np.int32(h),
            np.int32(w),
            g_out,
            wait_for=None
        )
        # Copy data
        out_event = cl.enqueue_copy(self.__queue__, self.__out__, g_out, wait_for=[kernel_event])
        out_event.wait()
        # Free memory
        g_out.release()
        print( self.__out__ )

Tags: python, opencl, gpgpu, pyopencl

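1 Answer

Stack Overflow user

Answered 2018-01-16 20:14:08

I figured out what I was doing wrong: my matrix was declared in Python (on a 64-bit machine) as 64-bit floats, while my OpenCL code used float pointers, and that mismatch appears to be what caused the problem. Changing float to double in the OpenCL code fixes it :)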
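The size mismatch described in this answer can be reproduced without a GPU. What follows is a minimal NumPy-only sketch, not the original poster's code: it assumes the kernel treats both buffers as 4-byte `float` and copies `h * w` of them, and the helper name `simulate_float_kernel_copy` is hypothetical. It models only the number of bytes the kernel touches, which is enough to explain why exactly the first 10 of the 20 values arrive.

```python
import numpy as np


def simulate_float_kernel_copy(data):
    """Mimic a kernel that copies h*w 4-byte floats, whatever dtype the host used.

    Hypothetical helper for illustration only: it models the byte traffic of
    the kernel in the question, not PyOpenCL itself.
    """
    h, w = data.shape
    src = np.ascontiguousarray(data)
    out = np.zeros_like(src)
    # Number of bytes a float-typed kernel reads and writes in total.
    n_bytes = h * w * np.dtype(np.float32).itemsize
    # Reinterpret both arrays as raw bytes and copy only what the kernel would.
    out.reshape(-1).view(np.uint8)[:n_bytes] = src.reshape(-1).view(np.uint8)[:n_bytes]
    return out


# 5x4 matrix of 64-bit floats: 160 bytes on the host, but only 80 get copied.
data64 = np.arange(1, 21, dtype=np.float64).reshape(5, 4)
partial = simulate_float_kernel_copy(data64)

# Host-side alternative to editing the kernel: match the kernel's float type.
data32 = data64.astype(np.float32)
full = simulate_float_kernel_copy(data32)

print(partial)
print(full)
```

In the PyOpenCL code from the question, the same idea would mean either keeping the kernel as `float` and passing `data.astype(np.float32)` with an output array of `dtype=np.float32`, or switching the kernel pointers to `double` as the answer does. Note that `double` support is optional on some OpenCL devices, so the host-side cast may be the more portable choice.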

Votes: 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/48276227
