Q: Same code, different behavior with C#, AleaGPU and device memory

Stack Overflow user

Asked 2017-10-23 18:17:55

Answers: 1 · Views: 308 · Followers: 0 · Votes: 1

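I'm using the AleaGPU library to perform matrix multiplications and similar operations, and I can't figure out why my code isn't working as expected.

By "not working as expected" I mean that the resulting matrix has the correct values in its first row (or first few rows), while all the remaining rows are filled with zeros, even though the code is the same as in the other samples below.

Function #1 (not working): this function doesn't work for some reason, and it shows the behavior described above. It sounds like I've mixed up an index somewhere, but I can't see any difference from the code in the other examples below, and I don't get any kind of error (AleaGPU usually throws an exception when code tries to access an invalid array position).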

public static double[,] Multiply([NotNull] this double[,] m1, [NotNull] double[,] m2)
{
    // Checks
    if (m1.GetLength(1) != m2.GetLength(0)) throw new ArgumentOutOfRangeException("Invalid matrices sizes");

    // Initialize the parameters and the result matrix
    int h = m1.GetLength(0);
    int w = m2.GetLength(1);
    int l = m1.GetLength(1);

    // Execute the multiplication in parallel
    using (DeviceMemory2D<double> m1_device = Gpu.Default.AllocateDevice(m1))
    using (DeviceMemory2D<double> m2_device = Gpu.Default.AllocateDevice(m2))
    using (DeviceMemory2D<double> mresult_device = Gpu.Default.AllocateDevice<double>(h, w))
    {
        // Pointers setup
        deviceptr<double>
            pm1 = m1_device.Ptr,
            pm2 = m2_device.Ptr,
            pmresult = mresult_device.Ptr;

        // Local wrapper function
        void Kernel(int ki)
        {
            // Calculate the current indexes
            int
                i = ki / w,
                j = ki % w;

            // Perform the multiplication
            double sum = 0;
            int im1 = i * l;
            for (int k = 0; k < l; k++)
            {
                // m1[i, k] * m2[k, j]
                sum += pm1[im1 + k] * pm2[k * w + j];
            }
            pmresult[i * w + j] = sum; // result[i, j]
        }

        // Run the kernel once for each element of the result matrix
        Gpu.Default.For(0, h * w, Kernel);

        // Return the result
        return Gpu.Copy2DToHost(mresult_device);
    }
}

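I've been staring at this code for hours, checking every line, and I really can't see what's wrong with it.

This one works fine, but I can't see how it differs from the first one: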

public static double[,] MultiplyGpuManaged([NotNull] this double[,] m1, [NotNull] double[,] m2)
{
    // Checks
    if (m1.GetLength(1) != m2.GetLength(0)) throw new ArgumentOutOfRangeException("Invalid matrices sizes");

    // Initialize the parameters and the result matrix
    int h = m1.GetLength(0);
    int w = m2.GetLength(1);
    int l = m1.GetLength(1);
    double[,]
        m1_gpu = Gpu.Default.Allocate(m1),
        m2_gpu = Gpu.Default.Allocate(m2),
        mresult_gpu = Gpu.Default.Allocate<double>(h, w);

    // Execute the multiplication in parallel
    Gpu.Default.For(0, h * w, index =>
    {
        // Calculate the current indexes
        int
            i = index / w,
            j = index % w;

        // Perform the multiplication
        double sum = 0;
        for (int k = 0; k < l; k++)
        {
            sum += m1_gpu[i, k] * m2_gpu[k, j];
        }
        mresult_gpu[i, j] = sum;
    });

    // Free memory and copy the result back
    Gpu.Free(m1_gpu);
    Gpu.Free(m2_gpu);
    double[,] result = Gpu.CopyToHost(mresult_gpu);
    Gpu.Free(mresult_gpu);
    return result;
}

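This one also works fine. I wrote it as an extra test to check whether I had messed up the indexes in the first function (apparently they are fine):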

public static double[,] MultiplyOnCPU([NotNull] this double[,] m1, [NotNull] double[,] m2)
{
    // Checks
    if (m1.GetLength(1) != m2.GetLength(0)) throw new ArgumentOutOfRangeException("Invalid matrices sizes");

    // Initialize the parameters and the result matrix
    int h = m1.GetLength(0);
    int w = m2.GetLength(1);
    int l = m1.GetLength(1);
    double[,] result = new double[h, w];
    Parallel.For(0, h * w, index =>
    {
        unsafe
        {
            fixed (double* presult = result, pm1 = m1, pm2 = m2)
            {
                // Calculate the current indexes
                int
                    i = index / w,
                    j = index % w;

                // Perform the multiplication
                double sum = 0;
                int im1 = i * l;
                for (int k = 0; k < l; k++)
                {
                    sum += pm1[im1 + k] * pm2[k * w + j];
                }
                presult[i * w + j] = sum;
            }
        }
    });
    return result;
}

I really don't understand what I'm missing in the first method, or why it doesn't work.

Thanks in advance for your help!

Tags: aleagpu, .net, wpf, visual-studio

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-10-27 10:01:09

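It turned out that the problem was caused by the way the GPU allocates 2D arrays: instead of using a single contiguous block of memory like a standard .NET array, it adds some padding at the end of each row for performance reasons.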

The correct way to address a 2D GPU array is to use the pitch, which indicates the effective width of each row (columns + padding).
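To make the effect of the padding concrete, here is a minimal CPU-only sketch that does not use AleaGPU at all. It copies a small matrix into a flat buffer with padded rows and then reads the same element twice, once with the matrix width as the row stride and once with the pitch. The names `PitchDemo`, `ToPitched` and `Read`, and the pitch of 4 elements, are made up for illustration; the real pitch is chosen by the GPU runtime and is usually larger.

```csharp
using System;

public static class PitchDemo
{
    // Copies a matrix into a flat buffer whose rows are `pitch` elements wide.
    // The trailing (pitch - width) elements of each row are padding, left at 0.
    public static double[] ToPitched(double[,] m, int pitch)
    {
        int h = m.GetLength(0), w = m.GetLength(1);
        if (pitch < w) throw new ArgumentException("The pitch cannot be smaller than the row width");
        double[] buffer = new double[h * pitch];
        for (int i = 0; i < h; i++)
            for (int j = 0; j < w; j++)
                buffer[i * pitch + j] = m[i, j];
        return buffer;
    }

    // Reads element [i, j] from a flat buffer using the given row stride.
    public static double Read(double[] buffer, int stride, int i, int j) => buffer[i * stride + j];

    public static void Main()
    {
        double[,] m = { { 1, 2, 3 }, { 4, 5, 6 } };
        const int pitch = 4; // hypothetical: 3 columns + 1 padding element per row
        double[] buffer = ToPitched(m, pitch); // layout: 1 2 3 0 4 5 6 0

        // Stride = width: the index for m[1, 0] lands on the padding of row 0
        Console.WriteLine(Read(buffer, 3, 1, 0)); // prints 0

        // Stride = pitch: the index points at the actual start of row 1
        Console.WriteLine(Read(buffer, pitch, 1, 0)); // prints 4
    }
}
```

Indexing with the width instead of the pitch never leaves the allocated buffer, which would be consistent with no out-of-range exception being thrown in the question.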

Here is a working code sample that just fills a 2D GPU array and copies it back to the host:

const int size = 10;
double[,] matrix_gpu;
using (DeviceMemory2D<double> m_gpu = Gpu.Default.AllocateDevice<double>(size, size))
{
    deviceptr<double> ptr = m_gpu.Ptr;
    int pitch = m_gpu.PitchInElements.ToInt32();
    Gpu.Default.For(0, size, i =>
    {
        for (int j = 0; j < size; j++)
        {
            ptr[i * pitch + j] = i * size + j;
        }
    });
    matrix_gpu = Gpu.Copy2DToHost(m_gpu);
}
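Applying the same idea to the multiplication from the question could look like the sketch below. It has not been run here, since it needs AleaGPU and a CUDA-capable device. It uses only the AleaGPU calls that already appear in this thread (`AllocateDevice`, `Ptr`, `PitchInElements`, `Gpu.Default.For`, `Gpu.Copy2DToHost`); the `using Alea` and `using Alea.Parallel` directives, the class name `MatrixExtensions` and the method name `MultiplyPitched` are assumptions. Each of the three buffers is indexed with its own pitch, because the pitches are not guaranteed to be equal.

```csharp
using System;
using Alea;          // assumed namespace for Gpu, DeviceMemory2D<T>, deviceptr<T>
using Alea.Parallel; // assumed namespace for the Gpu.For extension

public static class MatrixExtensions
{
    // Hypothetical pitch-aware variant of the Multiply method from the question
    public static double[,] MultiplyPitched(this double[,] m1, double[,] m2)
    {
        if (m1.GetLength(1) != m2.GetLength(0))
            throw new ArgumentOutOfRangeException(nameof(m2), "Invalid matrices sizes");

        int h = m1.GetLength(0);
        int w = m2.GetLength(1);
        int l = m1.GetLength(1);

        using (DeviceMemory2D<double> m1_device = Gpu.Default.AllocateDevice(m1))
        using (DeviceMemory2D<double> m2_device = Gpu.Default.AllocateDevice(m2))
        using (DeviceMemory2D<double> mresult_device = Gpu.Default.AllocateDevice<double>(h, w))
        {
            deviceptr<double>
                pm1 = m1_device.Ptr,
                pm2 = m2_device.Ptr,
                pmresult = mresult_device.Ptr;

            // Row strides in elements (columns + padding), one per buffer
            int
                pitch1 = m1_device.PitchInElements.ToInt32(),
                pitch2 = m2_device.PitchInElements.ToInt32(),
                pitchResult = mresult_device.PitchInElements.ToInt32();

            Gpu.Default.For(0, h * w, index =>
            {
                // The logical coordinates still come from the matrix width
                int
                    i = index / w,
                    j = index % w;

                // The memory offsets use the pitch of each buffer instead
                double sum = 0;
                int im1 = i * pitch1;
                for (int k = 0; k < l; k++)
                {
                    // m1[i, k] * m2[k, j]
                    sum += pm1[im1 + k] * pm2[k * pitch2 + j];
                }
                pmresult[i * pitchResult + j] = sum; // result[i, j]
            });

            return Gpu.Copy2DToHost(mresult_device);
        }
    }
}
```

A simple way to check it is to compare its output with `MultiplyOnCPU` from the question on the same inputs.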

Votes: 2

The original content of this page is provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/46895963
