
CUDA: my shared memory code is not working, what am I missing?

Stack Overflow user
Asked 2019-07-02 17:35:26
Answers: 1, Views: 82, Followers: 0, Votes: 1

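I am trying to implement dynamic shared memory, but it does not work. Please check the code and tell me what I am missing; the problem seems to be related to gpu_configuration().

Below is a basic dynamic shared memory program. I have compared it with https://github.com/NVIDIA-developer-blog/code-samples/blob/master/series/cuda-cpp/shared-memory/shared-memory.cu but cannot find the missing piece.

If I remove the call to gpu_configuration(), it works fine, but with the call in place I get an illegal memory access. I use this function as part of another piece of code, where everything works.

I am using a Quadro 2000 card on Kubuntu 14.4 with CUDA 7.0. The card details are printed by gpu_configuration() and listed below.

By the way, the shared-memory.cu sample runs fine on my machine, so this is not a problem with the card or with shared memory itself.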

#include <stdio.h>
inline void gpuAssert (cudaError_t code, const char *file, const char *func, int line);
typedef struct gpu_config {
    int     n_threads;  // execution of kernel with given index
    int     n_blocks;   // bundle of threads - also warp size
    int     n_grid;     // bundle of blocks
    int     dev_count;  // number of cuda devices
    size_t  shmem;      // sh_mem per block
    size_t  free_mem;   // free memory on the card
    size_t  tot_mem;    // total memory on the card
    struct  cudaDeviceProp  dev_prop;   // device properties
} gpu_config;

#define CUDA_GLOBAL_CHECK {gpuErrChk (cudaPeekAtLastError ()); gpuErrChk (cudaDeviceSynchronize ());}
#define gpuErrChk(ans)      {gpuAssert((ans), \
__FILE__, __func__, __LINE__);}
#define CudaDbgPrn(M, ...)  {printf ("DevDEBUG:%s:%s:%d: " M "\n", \
    __FILE__, __func__, (int) __LINE__, ##__VA_ARGS__);}
#define Dbg(M, ...)     fprintf(stderr, "DEBUG %s:%s:%d: " M "\n", __FILE__, \
    __func__, __LINE__, ##__VA_ARGS__)
inline void gpuAssert (cudaError_t code, const char *file, const char *func, int line)
{
    if (code != cudaSuccess) {
        fprintf(stderr,"CUDA call from file:%s func:%s %d: %s:%s failed\n", file, func, line, cudaGetErrorName(code), cudaGetErrorString(code));
        exit (code);
    }
}

static void gpu_configuration (gpu_config *gc);
static void gpu_configuration (gpu_config *gc)
{
    int i = 0;
    gpuErrChk (cudaDeviceReset ());     // reset device

    gpuErrChk (cudaGetDeviceCount (&gc -> dev_count));
    Dbg("Device count %d", gc -> dev_count);

    gpuErrChk (cudaGetDeviceProperties (&(gc -> dev_prop), i));
    gc -> n_threads = gc -> dev_prop.maxThreadsPerBlock;
    gc -> n_blocks = gc -> dev_prop.warpSize;
    dim3 block (gc -> n_blocks);
    gc -> n_grid = (gc -> n_blocks + block.x - 1) / block.x;

    gc -> shmem = gc -> dev_prop.sharedMemPerBlock;
    gpuErrChk (cudaMemGetInfo (&(gc -> free_mem), &(gc -> tot_mem)));

    Dbg ("Dev prop name: %s, tot_mem: %u sharedMemPerBlock %u\nwarpSize %d maxThreadsPerBlock %d\nmaxthreads per mprocessor %d",
    gc -> dev_prop.name, (unsigned) gc -> dev_prop.totalGlobalMem,
    (unsigned) gc -> dev_prop.sharedMemPerBlock,
    gc -> dev_prop.warpSize, gc -> dev_prop.maxThreadsPerBlock,
    gc -> dev_prop.maxThreadsPerMultiProcessor);
}

#define         MAX_SIZE        4000
#define         NUM             2
// #define          NUM         32

__global__ void kernel(int *d_data)
{
    extern __shared__ int sdata[];

    sdata[threadIdx.x] = threadIdx.x;
    __syncthreads ();

    CudaDbgPrn ("sdata [%u]=%u", (unsigned) threadIdx.x, (unsigned) sdata[threadIdx.x]);
    CudaDbgPrn ("d_data [%u]=%u", (unsigned) threadIdx.x, (unsigned) d_data[threadIdx.x]);
    d_data[threadIdx.x] = sdata[threadIdx.x];

    CudaDbgPrn ("sdata [%u]=%u d_data [%u]=%u", (unsigned) threadIdx.x, (unsigned) sdata[threadIdx.x], (unsigned) threadIdx.x, (unsigned) d_data[threadIdx.x]);
}

int main()
{
    int *d_data;
    gpuErrChk (cudaMalloc ((void**)&d_data, sizeof(int) * MAX_SIZE));
    gpuErrChk (cudaMemset (d_data, '\0', sizeof(int) * MAX_SIZE));

    gpu_config gc;
    gpu_configuration (&gc);

    kernel<<<1, NUM, (NUM * sizeof (int))>>> (d_data);
    CUDA_GLOBAL_CHECK;

    cudaFree(d_data);
    return 0;
}

Here is what I get on the command line:

rinka@Desktop:~/Documents/dev/code$ nvcc -Xptxas -v shmem_test.cu -o shmem
ptxas info    : 139 bytes gmem, 40 bytes cmem[14]
ptxas info    : Compiling entry function '_Z6kernelPi' for 'sm_20'
ptxas info    : Function properties for _Z6kernelPi
    40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 21 registers, 40 bytes cmem[0]

rinka@Desktop:~/Documents/dev/code$ ./shmem
DEBUG shmem_test.cu:gpu_configuration:36: Device count 1
DEBUG shmem_test.cu:gpu_configuration:51: Dev prop name: Quadro 2000, tot_mem: 1073414144 sharedMemPerBlock 49152
warpSize 32 maxThreadsPerBlock 1024
maxthreads per mprocessor 1536
DevDEBUG:shmem_test.cu:kernel:65: sdata [0]=0
DevDEBUG:shmem_test.cu:kernel:65: sdata [1]=1
CUDA call from file:shmem_test.cu func:main 82: cudaErrorIllegalAddress:an illegal memory access was encountered failed

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-07-02 17:42:01

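When you call gpu_configuration(&gc), the cudaDeviceReset() call inside it frees all memory previously allocated on the current device. In your main(), cudaMalloc and cudaMemset run before gpu_configuration(), so d_data no longer points to a valid allocation when the kernel launches. The shared memory write to sdata succeeds (which is why the first debug line prints), but the first access to d_data fails with an illegal memory access.

You can remove the cudaDeviceReset() call to fix this. Alternatively, make gpu_configuration() the first function call in the program, so that memory allocated afterwards stays valid.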
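
As a rough illustration of the second option, here is a minimal sketch that performs the reset before any allocation. It is not the asker's full program: the CHECK macro and the run_reset_first() helper are hypothetical names introduced for this example, gpu_configuration() is reduced to the bare cudaDeviceReset() call, and the buffer is sized to NUM elements rather than MAX_SIZE.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message if a CUDA call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define NUM 2   // threads per block, as in the question

__global__ void kernel(int *d_data)
{
    extern __shared__ int sdata[];   // sized by the third launch parameter
    sdata[threadIdx.x] = threadIdx.x;
    __syncthreads();
    d_data[threadIdx.x] = sdata[threadIdx.x];
}

// Hypothetical helper: resets the device first, then allocates and launches.
// Returns 0 if the values copied back match the thread indices, 1 otherwise.
int run_reset_first(void)
{
    // 1. Reset before anything is allocated (stands in for gpu_configuration).
    CHECK(cudaDeviceReset());

    // 2. Allocate after the reset, so the pointer is still valid at launch.
    int *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, NUM * sizeof(int)));
    CHECK(cudaMemset(d_data, 0, NUM * sizeof(int)));

    // 3. Launch with NUM ints of dynamic shared memory, then check for errors.
    kernel<<<1, NUM, NUM * sizeof(int)>>>(d_data);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // 4. Copy the result back and verify it on the host.
    int h_data[NUM];
    CHECK(cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_data));

    for (int i = 0; i < NUM; ++i)
        if (h_data[i] != i)
            return 1;
    return 0;
}

int main(void)
{
    int rc = run_reset_first();
    printf("%s\n", rc == 0 ? "ok" : "mismatch");
    return rc;
}
```

Moving the cudaMalloc call above cudaDeviceReset() in this sketch should bring back the failure from the question, since the reset would then release the buffer before the kernel uses it.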

Votes: 3
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/56849434
