Q: Copy an un-managed System.IntPtr byte vector into a GPU row of a 2D device byte array

Stack Overflow user

Asked 2014-12-25 17:09:04

Answers 3 | Views 1K | Following 0 | Votes 4

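I am using C# and CUDAfy.NET (yes, this problem would be easier to solve in straight C with pointers, but given the larger system I have reasons for taking this approach).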

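I have a video frame grabber card that collects 1024 x 1024 byte image data at 30 FPS. Every 33.3 ms it fills a slot in a circular buffer and returns a System.IntPtr that points to that un-managed 1D vector of bytes (byte*); the circular buffer has 15 slots.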

On the GPU device (Tesla K40) I want a global array organized as a dense 2D array. That is, I want something like a circular queue, but laid out on the GPU as a dense 2D array.

byte[15, 1024*1024] rawdata; 
// if CUDAfy.NET supported jagged arrays I could use byte[15][1024*1024], but it does not

How do I fill in a different row every 33 ms? Would I use something like this?

gpu.CopyToDevice<byte>(inputPtr, 0, rawdata, offset, length) // length = 1024*1024
// offset is computed as rowID*(1024*1024), where rowID wraps to 0 via modulo 15.
// inputPtr is the System.IntPtr that points to the (un-managed) buffer in the circular queue
// rawdata is a device buffer allocated with gpu.Allocate<byte>(1024*1024);
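
The offset arithmetic described in the comments above can be sketched in plain C#, independent of CUDAfy. This is only an illustration under stated assumptions: the class FrameSlots and the helpers RowOffset and FitsInBuffer are hypothetical names that do not appear in the original post, and the buffer is assumed to hold all 15 frames contiguously in row-major order.

```csharp
using System;

public static class FrameSlots
{
    public const int SlotCount = 15;           // slots in the circular buffer (from the question)
    public const int FrameSize = 1024 * 1024;  // bytes per frame (from the question)

    // Hypothetical helper: maps a running frame counter to the element offset
    // of its row inside a flattened [SlotCount, FrameSize] buffer.
    public static int RowOffset(long frameIndex)
    {
        if (frameIndex < 0) throw new ArgumentOutOfRangeException(nameof(frameIndex));
        int rowId = (int)(frameIndex % SlotCount);  // wraps to 0 after the last slot
        return rowId * FrameSize;
    }

    // Hypothetical helper: true when copying `count` elements starting at
    // `offset` stays inside the flattened buffer.
    public static bool FitsInBuffer(int offset, int count)
    {
        return offset >= 0 && count >= 0
            && (long)offset + count <= (long)SlotCount * FrameSize;
    }
}
```

Under these assumptions, a copy of FrameSize elements fits at every slot, whereas a count of SlotCount * FrameSize fits only at offset 0, so the count passed to a per-frame copy is probably meant to be one frame.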

My kernel signature is:

[Cudafy]
public static void filter(GThread thread, byte[,] rawdata, int frameSize, byte[] result)

I did try something along those lines. However, CUDAfy has no API overload of this form:

GPGPU.CopyToDevice(T) Method (IntPtr, Int32, T[,], Int32, Int32, Int32)

So I used the gpu.Cast function to reinterpret the 2D device array as a 1D array.

I tried the code below, but I get a CUDA.net exception.

FYI: when I try the CUDA emulator, it aborts on CopyToDevice, claiming that the data is not host allocated.

public static byte[] process(System.IntPtr data, int slot)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    byte[] output = new byte[FrameSize];
    int offset = slot*FrameSize;
    gpu.Lock();
    byte[] rawdata = gpu.Cast<byte>(grawdata, FrameSize); // What is the size supposed to be? Documentation lacking
    gpu.CopyToDevice<byte>(data, 0, rawdata, offset, FrameSize * frameCount);
    byte[] goutput = gpu.Allocate<byte>(output);
    gpu.Launch(height, width).filter(rawdata, FrameSize, goutput);
    runTime = watch.Elapsed.ToString();
    gpu.CopyFromDevice(goutput, output);
    gpu.Free(goutput);
    gpu.Synchronize();
    gpu.Unlock();
    watch.Stop();
    totalRunTime = watch.Elapsed.ToString();
    return output;
}

cuda

cudafy.net

3 Answers

Stack Overflow user

Accepted answer

Posted 2015-03-07 08:08:30

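You should consider using the built-in GPGPU async functionality, which is a very efficient way to move data between host and device, together with gpuKern.LaunchAsync(...).

See http://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU for an effective way of using this approach. Another good example is in the CudafyExamples project: look for PinnedAsyncIO.cs. It does everything you describe.

The following is in CudaGPU.cs in the Cudafy.Host project, and it matches the method you are looking for (except that it is async):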

public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, DevicePtrEx devArray,
                                  int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[, ,] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[,] devArray,
                                  int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[] devArray,
                                  int devOffset, int count, int streamId = 0) where T : struct;

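Votes 0

Stack Overflow user

Posted 2015-01-07 15:51:13

For now I propose this "solution", either: 1. Run the program only in native mode (not in emulation mode). Or 2. Do not handle the pinned memory allocation yourself.

There seems to be an open issue about this at the moment, but it only occurs in emulation mode.

See: https://cudafy.codeplex.com/workitem/636

Votes 1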

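Stack Overflow user

Posted 2015-01-07 16:12:23

If I understand your question correctly, I think you want to convert the byte* you get from the circular buffer into a multidimensional byte array to send to the graphics card API.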

            int slots = 15;
            int rows = 1024;
            int columns = 1024;

            // Try this: fill the 2D array element by element
            // (Marshal lives in System.Runtime.InteropServices)
            for (int currentSlot = 0; currentSlot < slots; currentSlot++)
            {
                IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);
                // use Marshal.Copy ?
                byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory);

                byte[,] rawForGpu = new byte[rows, columns];
                int offset = 0;
                for (int m = 0; m < rows; m++)
                    for (int n = 0; n < columns; n++)
                    {
                        // read one byte at the running offset,
                        // then send rawForGpu to your GPU method
                        rawForGpu[m, n] = Marshal.ReadByte(intPtrToUnManagedMemory,
                                                           offset++);
                    }
            }

//or try this
            for (int currentSlot = 0; currentSlot < slots; currentSlot++)
            {
                IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);

                // use Marshal.Copy ?
                byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory); 

                byte[,] rawForGpu = ConvertTo2DArray(byteData, rows, columns);
            }
        }

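        private static byte[,] ConvertTo2DArray(byte[] byteArr, int rows, int columns)
        {
            byte[,] data = new byte[rows, columns];
            int totalElements = rows * columns;
            // Convert 1D to 2D (rows, columns). A byte[,] is stored row-major,
            // so a single block copy fills it in order.
            Buffer.BlockCopy(byteArr, 0, data, 0, totalElements);
            return data;
        }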

        private static IntPtr CopyContextFrom(int slotNumber)
        {
            // code that returns the byte* from the circular buffer goes here
            return IntPtr.Zero;
        }
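
The CopyIntPtrToByteArray helper used in the loops above is not defined in this answer. A minimal sketch of what it might look like, using Marshal.Copy as the inline comment suggests, follows. The class name UnmanagedFrames, the explicit length parameter, and the RoundTrip self-check are assumptions added for illustration; an IntPtr carries no size, so the caller has to supply the frame length.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedFrames
{
    // Hypothetical stand-in for the CopyIntPtrToByteArray helper used above.
    // The length is passed explicitly because an IntPtr carries no size.
    public static byte[] CopyIntPtrToByteArray(IntPtr source, int length)
    {
        if (source == IntPtr.Zero) throw new ArgumentNullException(nameof(source));
        byte[] managed = new byte[length];
        Marshal.Copy(source, managed, 0, length);  // unmanaged -> managed copy
        return managed;
    }

    // Self-check that fakes a frame in unmanaged memory and copies it back.
    public static byte[] RoundTrip(byte[] frame)
    {
        IntPtr unmanaged = Marshal.AllocHGlobal(frame.Length);
        try
        {
            Marshal.Copy(frame, 0, unmanaged, frame.Length);  // managed -> unmanaged
            return CopyIntPtrToByteArray(unmanaged, frame.Length);
        }
        finally
        {
            Marshal.FreeHGlobal(unmanaged);  // always release the unmanaged block
        }
    }
}
```

Note that this route adds an extra host-side copy per frame, which may matter at 30 FPS; it is shown only to make the answer's loops complete.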

Votes 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/27649230
