JCuda: reusing an already-used pointer

Stack Overflow user

Asked on 2012-05-13 21:23:52

1 answer, 591 views, 0 followers, 0 votes

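I am having trouble with JCuda. My task is to compute a 1D FFT with the CUFFT library and then multiply the result by 2, so I decided to run the 1D FFT with the CUFFT_R2C type. The class responsible for this is shown next: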

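import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import jcuda.jcufft.JCufft;
import jcuda.jcufft.cufftHandle;
import jcuda.jcufft.cufftType;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaMemcpyKind;

public class FFTTransformer {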

    private Pointer inputDataPointer;

    private Pointer outputDataPointer;

    private int fftType;

    private float[] inputData;

    private float[] outputData;

    private int batchSize = 1;

    public FFTTransformer (int type, float[] inputData) {
        this.fftType = type;
        this.inputData = inputData;
        inputDataPointer = new CUdeviceptr();

        JCuda.cudaMalloc(inputDataPointer, inputData.length * Sizeof.FLOAT);
        JCuda.cudaMemcpy(inputDataPointer, Pointer.to(inputData),
                inputData.length * Sizeof.FLOAT, cudaMemcpyKind.cudaMemcpyHostToDevice);

        outputDataPointer = new CUdeviceptr();
        JCuda.cudaMalloc(outputDataPointer, (inputData.length + 2) * Sizeof.FLOAT);

    }

    public Pointer getInputDataPointer() {
        return inputDataPointer;
    }

    public Pointer getOutputDataPointer() {
        return outputDataPointer;
    }

    public int getFftType() {
        return fftType;
    }

    public void setFftType(int fftType) {
        this.fftType = fftType;
    }

    public float[] getInputData() {
        return inputData;
    }

    public int getBatchSize() {
        return batchSize;
    }

    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    public float[] getOutputData() {
        return outputData;
    }

    private void R2CTransform() {

        cufftHandle plan = new cufftHandle();

        JCufft.cufftPlan1d(plan, inputData.length, cufftType.CUFFT_R2C, batchSize);

        JCufft.cufftExecR2C(plan, inputDataPointer, outputDataPointer);

        JCufft.cufftDestroy(plan);
    }

    private void C2CTransform(){

        cufftHandle plan = new cufftHandle();

        JCufft.cufftPlan1d(plan, inputData.length, cufftType.CUFFT_C2C, batchSize);

        JCufft.cufftExecC2C(plan, inputDataPointer, outputDataPointer, fftType);

        JCufft.cufftDestroy(plan);
    }

    public void transform(){
        if (fftType == JCufft.CUFFT_FORWARD) {
            R2CTransform();
        } else {
            C2CTransform();
        }
    }

    public float[] getFFTResult() {
        outputData = new float[inputData.length + 2];
        JCuda.cudaMemcpy(Pointer.to(outputData), outputDataPointer,
                outputData.length * Sizeof.FLOAT, cudaMemcpyKind.cudaMemcpyDeviceToHost);
        return outputData;
    }

    public void releaseGPUResources(){
        JCuda.cudaFree(inputDataPointer);
        JCuda.cudaFree(outputDataPointer);
    }

    public static void main(String... args) {
        float[] inputData = new float[65536];
        for(int i = 0; i < inputData.length; i++) {
            inputData[i] = (float) Math.sin(i);
        }
        FFTTransformer transformer = new FFTTransformer(JCufft.CUFFT_FORWARD, inputData);
        transformer.transform();
        float[] result = transformer.getFFTResult();

        HilbertSpectrumTicksKernelInvoker.multiplyOn2(transformer.getOutputDataPointer(), inputData.length+2);

        transformer.releaseGPUResources();
    }
}
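The `inputData.length + 2` used for the output buffer follows from the R2C layout: for n real samples, a real-to-complex transform returns n/2 + 1 complex values (integer division), each stored as two interleaved floats. The following is a minimal sketch of that arithmetic only. The class `R2CSizes` and its method names are hypothetical helpers for illustration and are not part of JCuda.

```java
public final class R2CSizes {
    private R2CSizes() {
    }

    /** Number of complex outputs of a 1D R2C transform of n real samples. */
    public static int complexCount(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        return n / 2 + 1;
    }

    /** Floats needed to store those outputs as interleaved (re, im) pairs. */
    public static int outputFloats(int n) {
        return 2 * complexCount(n);
    }

    public static void main(String[] args) {
        int n = 65536; // same input length as in the question
        System.out.println(complexCount(n) + " complex values, "
                + outputFloats(n) + " floats");
    }
}
```

For an even n the float count equals n + 2, which matches the allocation in the class above. For an odd n it is n + 1, so `length + 2` over-allocates by one float in that case.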

The method responsible for the multiplication uses a CUDA kernel. The Java method:

public static void multiplyOn2(Pointer inputDataPointer, int dataSize){

        // Enable exceptions and omit all subsequent error checks
        JCudaDriver.setExceptionsEnabled(true);

        // Create the PTX file by calling the NVCC
        String ptxFileName = null;
        try {
            ptxFileName = FileService.preparePtxFile("resources\\HilbertSpectrumTicksKernel.cu");
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        // Initialize the driver and create a context for the first device.
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);

        // Load the ptx file.
        CUmodule module = new CUmodule();
        cuModuleLoad(module, ptxFileName);

        // Obtain a function pointer to the "calcSpectrumSamples" kernel.
        CUfunction function = new CUfunction();
        cuModuleGetFunction(function, module, "calcSpectrumSamples");

        // Set up the kernel parameters: A pointer to an array
        // of pointers which point to the actual values.
        int N = (dataSize + 1) / 2 + 1;
        int pair = (dataSize + 1) % 2 > 0 ? 1 : -1;

        Pointer kernelParameters = Pointer.to(Pointer.to(inputDataPointer),
                Pointer.to(new int[] { dataSize }),
                Pointer.to(new int[] { N }), Pointer.to(new int[] { pair }));

        // Call the kernel function.
        int blockSizeX = 128;
        int gridSizeX = (int) Math.ceil((double) dataSize / blockSizeX);
        cuLaunchKernel(function, gridSizeX, 1, 1, // Grid dimension
                blockSizeX, 1, 1, // Block dimension
                0, null, // Shared memory size and stream
                kernelParameters, null // Kernel- and extra parameters
        );
        cuCtxSynchronize();

        // Allocate host output memory and copy the device output
        // to the host.
        float freq[] = new float[dataSize];
        cuMemcpyDtoH(Pointer.to(freq), (CUdeviceptr)inputDataPointer, dataSize
                * Sizeof.FLOAT);

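The grid size in the launch above is the element count divided by the block size, rounded up, so that a final partial block is still covered. A small sketch of the same computation in integer arithmetic, avoiding the detour through `double`. `LaunchConfig` and `gridSize` are hypothetical names chosen for illustration.

```java
public final class LaunchConfig {
    private LaunchConfig() {
    }

    /** Smallest number of blocks whose threads cover elementCount items. */
    public static int gridSize(int elementCount, int blockSize) {
        if (elementCount < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid launch dimensions");
        }
        // Widen to long so the rounding term cannot overflow int.
        return (int) (((long) elementCount + blockSize - 1) / blockSize);
    }

    public static void main(String[] args) {
        int dataSize = 65536 + 2; // value passed to multiplyOn2 in the question
        int blockSize = 128;
        int blocks = gridSize(dataSize, blockSize);
        System.out.println(blocks + " blocks, " + (blocks * blockSize) + " threads");
    }
}
```

Rounding up usually launches more threads than there are elements, which is why a kernel launched this way needs a bounds check on its index.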
The kernel function follows:

extern "C"

__global__ void calcSpectrumSamples(float* complexData, int dataSize, int N, int pair) {

    int i = threadIdx.x + blockIdx.x * blockDim.x;

    if(i >= dataSize) return;

    complexData[i] = complexData[i] * 2;
}
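Once the device buffer has been copied back, the kernel's result can be checked against a host-side reference that performs the same element-wise operation. The sketch below is plain Java and assumes nothing about JCuda. `ScaleReference`, `multiplyByTwo` and `matches` are hypothetical names.

```java
public final class ScaleReference {
    private ScaleReference() {
    }

    /** Returns a copy of data with the first dataSize elements doubled. */
    public static float[] multiplyByTwo(float[] data, int dataSize) {
        float[] result = data.clone();
        // Clamp so the reference never reads past the array.
        int limit = Math.min(dataSize, result.length);
        for (int i = 0; i < limit; i++) {
            result[i] = result[i] * 2;
        }
        return result;
    }

    /** True if both arrays have equal length and agree within tolerance. */
    public static boolean matches(float[] expected, float[] actual, float tolerance) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```

A possible use is to run `multiplyByTwo` on the FFT result fetched before the kernel launch and compare it with the array fetched afterwards via `matches`.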

But when I pass the pointer to the FFT result (which is in device memory) to the multiplyOn2 method, an exception is thrown at the cuCtxSynchronize() call. The exception:

Exception in thread "main" jcuda.CudaException: CUDA_ERROR_UNKNOWN
    at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:263)
    at jcuda.driver.JCudaDriver.cuCtxSynchronize(JCudaDriver.java:1709)
    at com.ifntung.cufft.HilbertSpectrumTicksKernelInvoker.multiplyOn2(HilbertSpectrumTicksKernelInvoker.java:73)
    at com.ifntung.cufft.FFTTransformer.main(FFTTransformer.java:123)

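I also tried doing the same thing in Visual Studio C++, and it worked without any problems. Can you help me?

P.S. I can work around this problem, but only by copying the data from device memory to host memory and creating new pointers before every call to a new CUDA function, which slows the program down.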

cuda

jcuda

1 answer

Stack Overflow user

Answered on 2012-05-22 20:49:40

On which line exactly does the error occur?

A CUDA error can also be a leftover from an earlier call.
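One way to find out which call actually failed is to check the status of every call as soon as it returns, instead of only at the final synchronize. The sketch below shows the pattern with simulated calls. `StatusCheck`, `check` and the status values in `main` are hypothetical stand-ins, not JCuda API. The only convention borrowed from CUDA is that a status of 0 means success.

```java
import java.util.function.IntSupplier;

public final class StatusCheck {
    /** CUDA-style convention: 0 means the call succeeded. */
    public static final int SUCCESS = 0;

    private StatusCheck() {
    }

    /** Runs the call and fails immediately if it reports a non-zero status. */
    public static void check(String callName, IntSupplier call) {
        int status = call.getAsInt();
        if (status != SUCCESS) {
            throw new IllegalStateException(
                    callName + " failed with status " + status);
        }
    }

    public static void main(String[] args) {
        check("allocate", () -> 0); // simulated successful call
        try {
            check("transform", () -> 5);   // simulated failing call
            check("synchronize", () -> 0); // not reached
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The question's code enables exceptions only through `JCudaDriver.setExceptionsEnabled(true)`. The `JCuda` runtime and `JCufft` classes have their own `setExceptionsEnabled` switches, so enabling those as well may reveal an earlier failure without hand-written checks.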

Why are you using Pointer.to(inputDataPointer)? You already have that device pointer. Are you now passing a pointer to the device pointer to the device?

Pointer kernelParameters = Pointer.to(Pointer.to(inputDataPointer),

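I also recommend using the "this" qualifier, or some other marking, to make instance variables recognizable. I hate and refuse to look through code, especially code as nested and long as your example, when I am trying to debug it by reading and cannot see which scope a variable in a method belongs to.

I don't want to keep asking myself where the hell a variable comes from.

If complex code in an SO question is not formatted properly, I don't read it.

Votes: 1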

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/10572105
