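I am getting unexpected results from get_global_id(). I want to convolve a 3D voxel volume of dimensions {35,35,35} with a 3D kernel of dimensions {5,5,5}, so I call clEnqueueNDRangeKernel with global size = {35,35,35} and local size = {5,5,5}: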
std::vector<size_t> local_nd = { 5, 5, 5 };
std::vector<size_t> global_nd = { 35, 35, 35 };
err = clEnqueueNDRangeKernel( queue, hello_kernel, work_dim, NULL, global_nd.data(), local_nd.data(), 0, NULL, NULL);
When I call get_global_id() in the kernel, I expect get_global_id(0), get_global_id(1), and get_global_id(2) each to range from 0 to 34.
However, only get_global_id(0) and get_global_id(1) appear to be correct. get_global_id(2) ranges from 30 to 34 instead of the 0 to 34 I expected.
const int ic0 = get_global_id(0); // icol
const int ic1 = get_global_id(1); // irow
const int ic2 = get_global_id(2); // idep
printf(" %d %d %d\n", ic0, ic1, ic2 );
// value of ic0 = [0 -> 34] correct!
// value of ic1 = [0 -> 34] correct!
// value of ic2 = [30 -> 34] ( SHOULD IT BE [0->34] )?
My GPU reports maximum work-item sizes (ND) of { 1024, 1024, 64 }.
Posted 2018-10-17 00:09:20
I found the problem by following pmdj's suggestion:
"printf in kernels isn't always reliable - there's often a fixed-size buffer, and if you output too much, some messages may be dropped."
So I modified the OpenCL code to print only under a condition, for example:
if( ic2< 10 )
printf("ic2: %d ", ic2 );
As I expected, the output then covered the range 0 --> 34.
https://stackoverflow.com/questions/52832491