CUDA error 4: unspecified launch failure

Stack Overflow user
Asked 2016-05-04 19:45:18
Answers: 1 · Views: 10.2K · Followers: 0 · Votes: 0

Here is the code snippet for the GPU kernel:

__global__ void POCKernel(int *a)
{
int i = threadIdx.x;
a[i] = a[i] + 1;

if (i < 1024 * 1024)
{
    double dblNewMemoryVarA[15];
    double dblNewMemoryVarB[15];
    double dblNewMemoryVarC[15];

    //double* dblNewMemoryVarA = (double*)malloc(15 * sizeof(double));
    ////memset(dblNewMemoryVarA, 0, 15 * sizeof(double));
    //double* dblNewMemoryVarB = (double*)malloc(15 * sizeof(double));
    ////memset(dblNewMemoryVarB, 0, 15 * sizeof(double));
    //double* dblNewMemoryVarC = (double*)malloc(15 * sizeof(double));
    ////memset(dblNewMemoryVarC, 0, 15 * sizeof(double));
    for (int j = 0; j < 15; j++)
    {
        dblNewMemoryVarA[j] = 0;
        dblNewMemoryVarB[j] = 0;
        dblNewMemoryVarC[j] = 0;
    }
    dblNewMemoryVarC[i] = dblNewMemoryVarA[i] + dblNewMemoryVarB[i];
    dblNewMemoryVarC[i] = dblNewMemoryVarA[i] * dblNewMemoryVarB[i];
    dblNewMemoryVarC[i] = dblNewMemoryVarA[i] - dblNewMemoryVarB[i];
    /*free(dblNewMemoryVarA);
    free(dblNewMemoryVarB);
    free(dblNewMemoryVarC);*/


}

}

The function that calls this kernel is:

int main()
{



const int arraySize = 1024 * 1024;
int* a = new int[arraySize];
int *dev_a = 0;

for (int i = 0; i < arraySize; i++)
{
    a[i] = 5;
}


cudaError_t cudaStatus;

// Choose which GPU to run on, change this on a multi-GPU system.

cudaStatus = cudaSetDevice(0);
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "CUDA failed!");
    return 1;
}

// Allocate GPU buffers for three vectors (two input, one output)    .
cudaStatus = cudaMalloc((void**)&dev_a, arraySize * sizeof(int));
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "cudaMalloc failed!");
    goto Error;
}

// Copy input vectors from host memory to GPU buffers.
cudaStatus = cudaMemcpy(dev_a, a, arraySize * sizeof(int), cudaMemcpyHostToDevice);
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "cudaMemcpy failed!");
    goto Error;
}
// Launch a kernel on the GPU with one thread for each element.
POCKernel << <4096, 256 >> >(dev_a);

// Check for any errors launching the kernel
cudaStatus = cudaGetLastError();
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "addKernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
    goto Error;
}

// cudaDeviceSynchronize waits for the kernel to finish, and returns
// any errors encountered during the launch.
cudaStatus = cudaDeviceSynchronize();
if (cudaStatus != cudaSuccess) {
    fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching addKernel!\n", cudaStatus);
    goto Error;


    // cudaDeviceReset must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.
    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed!");
        return 1;
    }

Error:
    cudaFree(dev_a);



    return 0;
    }
}

At cudaDeviceSynchronize I get error code 4, unspecified launch failure. Can anyone tell me why I am running into this problem?

1 Answer

Stack Overflow user

Accepted answer

Posted 2016-05-05 08:07:39

This code is odd in many ways, but let's get straight to the point. There is a definite problem in these lines of kernel code:

int i = threadIdx.x;
...

if (i < 1024 * 1024)
{
    double dblNewMemoryVarA[15];
    double dblNewMemoryVarB[15];
    double dblNewMemoryVarC[15];
    ...
    dblNewMemoryVarC[i] = dblNewMemoryVarA[i] + dblNewMemoryVarB[i];

You are launching threadblocks of 256 threads each:

POCKernel << <4096, 256 >> >(dev_a);
                    ^^^

This means that, across the threads of any one block, the threadIdx.x variable ranges from 0 to 255:

int i = threadIdx.x;

In your local variables, you allocate space for 15 elements per array:

    double dblNewMemoryVarA[15];

But you then index into those arrays with i, which, as established above, ranges up to 255:

    dblNewMemoryVarC[i] = dblNewMemoryVarA[i] + dblNewMemoryVarB[i];

So every thread with i of 15 or more indexes out of bounds, which is quite likely to cause the kernel to fail.

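It's impossible to be certain, because you haven't provided complete code, nor said how you compiled it or what environment you are running in. But from a code-correctness standpoint, indexing the 15-element arrays with i in this way is definitely illegal.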
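For illustration, here is a minimal sketch of one way to remove the out-of-bounds access. It assumes the intent was one thread per element of a (as the comment in your main suggests) and that the three local arrays are per-thread scratch space. The names POCKernelFixed, kLocalSize, and runFixedKernel are hypothetical, not from your code, and the extra n parameter is my addition.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kLocalSize = 15;  // hypothetical name for the local array length

// Sketch of a corrected kernel; n is the number of elements in a (assumed parameter).
__global__ void POCKernelFixed(int *a, int n)
{
    // Global index: unique across the grid, unlike threadIdx.x, which repeats in every block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    a[i] = a[i] + 1;

    double varA[kLocalSize] = {0.0};
    double varB[kLocalSize] = {0.0};
    double varC[kLocalSize] = {0.0};

    // Index the 15-element arrays only with values in [0, kLocalSize).
    for (int j = 0; j < kLocalSize; j++) {
        varC[j] = varA[j] + varB[j];
    }

    // Fold a scratch value back into global memory so the work is not dead code.
    // All scratch values are zero here, so this adds 0.
    a[i] += static_cast<int>(varC[i % kLocalSize]);
}

// Hypothetical host helper: returns 0 if every element ends up as 6,
// 1 if a value is wrong, and -1 on any CUDA error (including no usable device).
int runFixedKernel()
{
    const int n = 1024 * 1024;
    std::vector<int> host(n, 5);
    int *dev = nullptr;

    if (cudaMalloc((void**)&dev, n * sizeof(int)) != cudaSuccess) return -1;

    cudaError_t st = cudaMemcpy(dev, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (st == cudaSuccess) {
        POCKernelFixed<<<4096, 256>>>(dev, n);  // 4096 * 256 == n
        st = cudaGetLastError();
    }
    if (st == cudaSuccess) st = cudaDeviceSynchronize();
    if (st == cudaSuccess)
        st = cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    if (st != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(st));
        return -1;
    }
    for (int v : host) {
        if (v != 6) return 1;
    }
    return 0;
}
```

Calling runFixedKernel() from main and checking its return value is enough to exercise the sketch. Whether i % kLocalSize, a loop over j, or something else is the right index depends on what the local arrays are really for, which the question does not say.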

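My guess is that you are compiling in debug mode (-G). If not, I would expect the compiler to optimize away everything after the if test, since none of that code affects any global state.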

As pointed out in the comments, if you are running this on Windows, you may also be hitting a Windows WDDM timeout.
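If you want to check whether a run-time limit applies to your GPU at all, the device properties expose a flag for it. The following is a small sketch, assuming device 0 is the one you run on; kernelTimeoutEnabled is a hypothetical helper name, while cudaGetDeviceProperties and the kernelExecTimeoutEnabled field are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if the device reports a run-time limit on kernels,
// 0 if it does not, and -1 if the property query itself fails.
int kernelTimeoutEnabled(int device)
{
    cudaDeviceProp prop;
    cudaError_t st = cudaGetDeviceProperties(&prop, device);
    if (st != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(st));
        return -1;
    }
    printf("%s: kernel execution timeout %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}
```

A result of 1 only tells you that a timeout could apply; it does not show that your kernel actually ran long enough to trigger it.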

Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/37026858
