How to quickly compact a sparse array with CUDA C?

Stack Overflow user

Asked 2013-01-10 12:41:41

Answers: 1 · Views: 3.3K · Followers: 0 · Votes: 4

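Summary

I have an array [A - B - - - C] in device memory, but I want [A B C]. What is the quickest way to get it with CUDA C?

Context

I have an array of integers A in device (GPU) memory. At each iteration, I randomly choose a few elements that are greater than 0 and subtract 1 from each of them. I maintain a sorted lookup array L of the indices of those elements that are equal to 0: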

Array A:
       @ iteration i: [0 1 0 3 3 2 0 1 2 3]
   @ iteration i + 1: [0 0 0 3 2 2 0 1 2 3]

Lookup for 0-elements L:
       @ iteration i: [0 - 2 - - - 6 - - -]  ->  want compacted form: [0 2 6]
   @ iteration i + 1: [0 1 2 - - - 6 - - -]  ->  want compacted form: [0 1 2 6]

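(Here, I randomly chose elements 1 and 4 to subtract 1 from. In my CUDA C implementation, each thread maps onto one element of A, so the lookup array is kept sparse to prevent data races and to maintain a sorted ordering, e.g. [0 1 2 6] rather than [0 2 6 1].)

Later, I will perform some operation only on the elements that are equal to 0. Hence I need to compact the sparse lookup array L, so that I can map threads onto the 0-elements.

So, what is the most efficient way to compact a sparse array in device memory with CUDA C?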
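To pin down what the compacted form should contain, the following host-only C++ sketch computes it sequentially: the ascending list of indices i with A[i] == 0. The helper name `zero_indices` is hypothetical and not part of the question; it is a CPU reference for checking results, not a device implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: returns the indices i with A[i] == 0,
// in ascending order. This is the "compacted form" of the lookup array L.
std::vector<int> zero_indices(const std::vector<int>& A) {
    std::vector<int> L;
    L.reserve(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        if (A[i] == 0) {
            L.push_back(static_cast<int>(i));
        }
    }
    // Indices are visited in increasing order, so L is sorted by construction.
    assert(std::is_sorted(L.begin(), L.end()));
    return L;
}
```

For A at iteration i above this yields [0 2 6], and at iteration i + 1 it yields [0 1 2 6]. One possible device-side counterpart, stated here as an assumption rather than something from the question, is the stencil overload of thrust::copy_if with a thrust::counting_iterator supplying the indices and A acting as the stencil.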

Many thanks.

cuda

gpgpu

sparse-array

1 Answer

Stack Overflow user

Accepted answer

Answered 2013-01-10 20:46:29

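Suppose I have:

int V[] = {1, 2, 0, 0, 5};

and the result I want is:

int R[] = {1, 2, 5};

In effect, we are removing the elements that are zero, or equivalently copying elements only if they are non-zero.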
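The "copy only if non-zero" operation has a direct host-side counterpart in the C++ standard library, std::copy_if, whose interface thrust::copy_if mirrors. The sketch below is host-only and the helper name `keep_nonzero` is hypothetical; it is meant only to show the semantics (order is preserved, and the returned iterator marks the end of the output).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side helper: returns the non-zero elements of v,
// in their original order.
std::vector<int> keep_nonzero(const std::vector<int>& v) {
    std::vector<int> r(v.size());  // worst case: every element is kept
    // copy_if returns an iterator one past the last element written.
    auto r_end = std::copy_if(v.begin(), v.end(), r.begin(),
                              [](int x) { return x != 0; });
    r.erase(r_end, r.end());       // trim the unused tail
    assert(r.size() <= v.size());
    return r;
}
```

Called on {1, 2, 0, 0, 5}, this returns {1, 2, 5}. The Thrust program that follows performs the same operation on device memory.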

#include <thrust/device_ptr.h>
#include <thrust/copy.h>
#include <stdio.h>
#include <stdlib.h>   // exit() used by cudaCheckErrors
#define SIZE 5

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

  // Predicate functor: returns true for elements that should be kept (non-zero).
  struct is_not_zero
  {
    __host__ __device__
    bool operator()(const int x) const
    {
      return (x != 0);
    }
  };



int main(){

  int V[] = {1, 2, 0, 0, 5};
  int R[] = {0, 0, 0, 0, 0};
  int *d_V, *d_R;

  cudaMalloc((void **)&d_V, SIZE*sizeof(int));
  cudaCheckErrors("cudaMalloc1 fail");
  cudaMalloc((void **)&d_R, SIZE*sizeof(int));
  cudaCheckErrors("cudaMalloc2 fail");

  cudaMemcpy(d_V, V, SIZE*sizeof(int), cudaMemcpyHostToDevice);
  cudaCheckErrors("cudaMemcpy1 fail");

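  // Wrap the raw device pointers so Thrust treats them as device memory.
  thrust::device_ptr<int> dp_V(d_V);
  thrust::device_ptr<int> dp_R(d_R);
  // copy_if returns an iterator one past the last element written.
  thrust::device_ptr<int> dp_R_end = thrust::copy_if(dp_V, dp_V + SIZE, dp_R, is_not_zero());
  int count = (int)(dp_R_end - dp_R);

  cudaMemcpy(R, d_R, count*sizeof(int), cudaMemcpyDeviceToHost);
  cudaCheckErrors("cudaMemcpy2 fail");

  // Print only the compacted elements instead of a hard-coded count.
  for (int i = 0; i<count; i++)
    printf("R[%d]: %d\n", i, R[i]);

  cudaFree(d_V);
  cudaFree(d_R);
  return 0;
}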

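The struct definition provides us with a functor that tests whether an element is non-zero. Note that with Thrust there are no kernels, and we are not writing device code directly; all of that happens behind the scenes. I would definitely suggest familiarizing yourself with the Thrust quick start guide, so as not to turn this question into a tutorial on Thrust.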
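For intuition about what such a compaction involves, a common formulation of stream compaction is: flag the elements to keep, take an exclusive prefix sum of the flags to get each kept element's output position, then scatter. The answer does not say how Thrust implements copy_if internally, so the following is only a sequential, host-only sketch of that general technique; the helper name `compact_nonzero` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flag / exclusive-scan / scatter compaction.
// Writes the non-zero elements of `in` to the front of `out` and
// returns how many were written.
std::size_t compact_nonzero(const std::vector<int>& in, std::vector<int>& out) {
    assert(out.size() >= in.size());
    const std::size_t n = in.size();
    std::vector<std::size_t> flag(n), pos(n);

    // Step 1: flag[i] is 1 if element i is kept, 0 otherwise.
    for (std::size_t i = 0; i < n; ++i) {
        flag[i] = (in[i] != 0) ? 1 : 0;
    }

    // Step 2: exclusive prefix sum; pos[i] counts kept elements before i.
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] = running;
        running += flag[i];
    }

    // Step 3: scatter each kept element to its output position.
    for (std::size_t i = 0; i < n; ++i) {
        if (flag[i]) {
            out[pos[i]] = in[i];
        }
    }
    return running;
}
```

Steps 1 and 3 are independent per element, and step 2 is a scan, which is why this formulation maps well onto one-thread-per-element GPU code. Because positions come from the scan rather than from an atomic counter, the output keeps the input order.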

After reviewing the comments, I think this modified version of the code will work around the CUDA 4.0 issues:

#include <thrust/device_ptr.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <stdio.h>
#define SIZE 5

  // Predicate functor: returns true for elements that should be kept (non-zero).
  struct is_not_zero
  {
    __host__ __device__
    bool operator()(const int x) const
    {
      return (x != 0);
    }
  };



int main(){

  int V[] = {1, 2, 0, 0, 5};
  int R[] = {0, 0, 0, 0, 0};

  thrust::host_vector<int> h_V(V, V+SIZE);
  thrust::device_vector<int> d_V = h_V;
  thrust::device_vector<int> d_R(SIZE, 0);

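  // copy_if returns an iterator one past the last element written.
  thrust::device_vector<int>::iterator d_R_end =
      thrust::copy_if(d_V.begin(), d_V.end(), d_R.begin(), is_not_zero());
  int count = (int)(d_R_end - d_R.begin());
  thrust::host_vector<int> h_R = d_R;

  thrust::copy(h_R.begin(), h_R.end(), R);

  // Print only the compacted elements instead of a hard-coded count.
  for (int i = 0; i<count; i++)
    printf("R[%d]: %d\n", i, R[i]);

  return 0;
}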

Votes: 3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/14258210
