How to: CUDA IFFT

Stack Overflow user
Asked 2017-10-04 10:25:13
Answers 1 · Views 495 · Followers 0 · Votes 2

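In MATLAB, when I pass a 1-D array of complex numbers to ifft(array), I get back an array of real numbers of the same size and dimensions. I am trying to reproduce this in CUDA C, but the output is different. Can you help?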

My arrayOfComplexNmbers:

[4.6500 + 0.0000i   0.5964 - 1.4325i   0.4905 - 0.5637i   0.4286 - 0.2976i   0.4345 - 0.1512i   0.4500 + 0.0000i   0.4345 + 0.1512i  0.4286 + 0.2976i   0.4905 + 0.5637i   0.5964 + 1.4325i]

My arrayOfRealNumbers:

[ 0.9000    0.8000    0.7000    0.6000    0.5000    0.4000    0.3000    0.2000    0.1500    0.1000]

When I enter ifft(arrayOfComplexNmbers) in MATLAB, the output is arrayOfRealNumbers. Thanks! Here is my CUDA code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include "device_launch_parameters.h"
#include "device_functions.h"

#define NX 256
#define NY 128
#define NRANK 2
#define BATCH 1
#define SIGNAL_SIZE 10

typedef float2 Complex;
__global__ void printCUDAVariables_1(cufftComplex *cudaSignal){
int index = threadIdx.x + blockIdx.x*blockDim.x;    
printf("COMPLEX CUDA %d %f %f \n", index, cudaSignal[index].x, cudaSignal[index].y);
}

__global__ void printCUDAVariables_2(cufftReal *cudaSignal){
int index = threadIdx.x + blockIdx.x*blockDim.x;
printf("REAL CUDA %d %f \n", index, cudaSignal);
}


int main() {
cufftHandle plan;
//int n[NRANK] = { NX, NY };
Complex *h_signal = (Complex *)malloc(sizeof(Complex)* SIGNAL_SIZE);
float *r_signal = 0;
if (r_signal != 0){
    r_signal = (float*)realloc(r_signal, SIGNAL_SIZE * sizeof(float));
}
else{
    r_signal = (float*)malloc(SIGNAL_SIZE * sizeof(float));
}
int mem_size = sizeof(Complex)* SIGNAL_SIZE * 2;

h_signal[0].x = (float)4.65;
h_signal[0].y = (float)0;

h_signal[1].x = (float)0.5964;
h_signal[1].y = (float)0;

h_signal[2].x = (float)4.65;
h_signal[2].y = (float)-1.4325;

h_signal[3].x = (float)0.4905;
h_signal[3].y = (float)0.5637;

h_signal[4].x = (float)0.4286;
h_signal[4].y = (float)-0.2976;

h_signal[5].x = (float)0.4345;
h_signal[5].y = (float)-0.1512;

h_signal[6].x = (float)0.45;
h_signal[6].y = (float)0;

h_signal[7].x = (float)0.4345;
h_signal[7].y = (float)-0.1512;

h_signal[8].x = (float)0.4286;
h_signal[8].y = (float)0.2976;

h_signal[9].x = (float)0.4905;
h_signal[9].y = (float)-0.5637;

h_signal[10].x = (float)0.5964;
h_signal[10].y = (float)1.4325;
//for (int i = 0; i < SIGNAL_SIZE; i++){
//  printf("RAW %f %f\n", h_signal[i].x, h_signal[i].y);
//}
//allocate device memory for signal
cufftComplex *d_signal, *d_signal_out;
cudaMalloc(&d_signal, mem_size);    
cudaMalloc(&d_signal_out, mem_size);
cudaMemcpy(d_signal, h_signal, mem_size, cudaMemcpyHostToDevice);
printCUDAVariables_1 << <10, 1 >> >(d_signal);
//cufftReal *odata;
//cudaMalloc((void **)&odata, sizeof(cufftReal)*NX*(NY / 2 + 1));

//cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2R, BATCH);    
cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);
cufftExecC2C(plan, d_signal, d_signal_out, CUFFT_INVERSE);
//cufftExecC2R(plan, d_signal, odata);
cudaDeviceSynchronize();
printCUDAVariables_1 << <10, 1 >> >(d_signal_out);
//printCUDAVariables_2 << <10, 1 >> >(odata);
//cudaMemcpy(h_signal, d_signal_out, SIGNAL_SIZE*2*sizeof(float), cudaMemcpyDeviceToHost);

cufftDestroy(plan);
cudaFree(d_signal);
cudaFree(d_signal_out);

return 0;
}

1 Answer

Stack Overflow user

Accepted answer

Posted 2017-10-04 12:12:04

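When computing ifft in MATLAB, the default behavior is as follows:

  • The input signal is not zero-padded.
  • The output signal needs no further scaling (MATLAB's ifft already includes the 1/N normalization).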

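The flow of your CUFFT code is correct, but a few parameters differ from what MATLAB does, and they cause the output you are seeing.

  • The NX constant makes CUFFT perform a 256-point transform, as if the input signal were zero-padded to length 256 (the device buffers only hold 20 elements, so the transform also runs past their ends). To get MATLAB's behavior, keep NX and SIGNAL_SIZE equal.
  • CUFFT does not normalize the inverse transform, so the output values come out multiplied by the length of the input signal. You have to divide the output values by SIGNAL_SIZE to get the actual values.
  • Another important issue is an out-of-bounds access while initializing the input signal. The signal length is 10, but you are writing a value at index 10, which is out of range. I assume the confusion comes from MATLAB's 1-based indexing. The input signal must be initialized at indices 0 through SIGNAL_SIZE-1.
  • Using a CUDA kernel to print the signal is not recommended, because the printed lines may appear out of order. Copy the results back to the host and print them sequentially instead.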

Here is the fixed code, which produces the same output as MATLAB.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include "device_launch_parameters.h"
#include "device_functions.h"

#define NX 10
#define NY 1
#define NRANK 1
#define BATCH 1
#define SIGNAL_SIZE 10

typedef float2 Complex;  

int main() 
{
cufftHandle plan;
//int n[NRANK] = { NX, NY };
Complex *h_signal = (Complex *)malloc(sizeof(Complex)* SIGNAL_SIZE);
float *r_signal = 0;
if (r_signal != 0)
{
    r_signal = (float*)realloc(r_signal, SIGNAL_SIZE * sizeof(float));
}
else
{
    r_signal = (float*)malloc(SIGNAL_SIZE * sizeof(float));
}
int mem_size = sizeof(Complex)* SIGNAL_SIZE;

h_signal[0].x = (float)4.65;
h_signal[0].y = (float)0;

h_signal[1].x = (float)0.5964;
h_signal[1].y = (float)-1.4325;

h_signal[2].x = (float)0.4905;
h_signal[2].y = (float)-0.5637;

h_signal[3].x = (float)0.4286;
h_signal[3].y = (float)-0.2976;

h_signal[4].x = (float)0.4345;
h_signal[4].y = (float)-0.1512;

h_signal[5].x = (float)0.45;
h_signal[5].y = (float)0.0;

h_signal[6].x = (float)0.4345;
h_signal[6].y = (float)0.1512;

h_signal[7].x = (float)0.4286;
h_signal[7].y = (float)0.2976;

h_signal[8].x = (float)0.4905;
h_signal[8].y = (float)0.5637;

h_signal[9].x = (float)0.5964;
h_signal[9].y = (float)1.4325;

printf("\nInput:\n");
for(int i=0; i<SIGNAL_SIZE; i++)
{
    char op = h_signal[i].y < 0 ? '-' : '+';
    printf("%f %c %fi\n", h_signal[i].x/SIGNAL_SIZE, op, fabsf(h_signal[i].y/SIGNAL_SIZE ) );
}

//allocate device memory for signal
cufftComplex *d_signal, *d_signal_out;
cudaMalloc(&d_signal, mem_size);    
cudaMalloc(&d_signal_out, mem_size);
cudaMemcpy(d_signal, h_signal, mem_size, cudaMemcpyHostToDevice);


//cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2R, BATCH);    
cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);

cufftExecC2C(plan, d_signal, d_signal_out, CUFFT_INVERSE);

cudaDeviceSynchronize();

cudaMemcpy(h_signal, d_signal_out, SIGNAL_SIZE*sizeof(Complex), cudaMemcpyDeviceToHost);


printf("\n\n-------------------------------\n\n");
printf("Output:\n");
for(int i=0; i<SIGNAL_SIZE; i++)
{
    char op = h_signal[i].y < 0 ? '-' : '+';
    printf("%f %c %fi\n", h_signal[i].x/SIGNAL_SIZE, op, fabsf(h_signal[i].y/SIGNAL_SIZE ) );
}

cufftDestroy(plan);
cudaFree(d_signal);
cudaFree(d_signal_out);
// release the host buffers as well
free(h_signal);
free(r_signal);

return 0;
}

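The output is still in complex form, but the imaginary components are close to zero. The small differences in the real parts arise because MATLAB uses double precision by default, whereas this code works with single-precision values; the input values being entered to only four decimal places likely contributes as well.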
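
As a side note on why the imaginary parts vanish: an inverse DFT yields a real signal when the input is conjugate-symmetric, that is X[k] equals the complex conjugate of X[(N-k) mod N]. The sketch below is a plain C check of that property, not something from the original answer; `is_conjugate_symmetric`, `cplx`, `kAnswerInput` and `kQuestionInput` are hypothetical names. `kAnswerInput` holds the values used in the fixed code, and `kQuestionInput` holds the first ten values as they were typed into the question's code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper type: one complex sample in double precision. */
typedef struct { double re, im; } cplx;

/* Input as initialized in the fixed code (indices 0..9). */
static const cplx kAnswerInput[10] = {
    {4.6500,  0.0000}, {0.5964, -1.4325}, {0.4905, -0.5637},
    {0.4286, -0.2976}, {0.4345, -0.1512}, {0.4500,  0.0000},
    {0.4345,  0.1512}, {0.4286,  0.2976}, {0.4905,  0.5637},
    {0.5964,  1.4325}
};

/* First ten values as initialized in the question's code (indices 0..9). */
static const cplx kQuestionInput[10] = {
    {4.6500,  0.0000}, {0.5964,  0.0000}, {4.6500, -1.4325},
    {0.4905,  0.5637}, {0.4286, -0.2976}, {0.4345, -0.1512},
    {0.4500,  0.0000}, {0.4345, -0.1512}, {0.4286,  0.2976},
    {0.4905, -0.5637}
};

/* Returns 1 if x[k] == conj(x[(n - k) % n]) for every k, within `tol`;
 * returns 0 otherwise. For k == 0 (and k == n/2 when n is even) this
 * also requires the imaginary part to be ~0. */
static int is_conjugate_symmetric(const cplx *x, size_t n, double tol)
{
    assert(x != NULL && n > 0);
    for (size_t k = 0; k < n; ++k) {
        const cplx a = x[k];
        const cplx b = x[(n - k) % n];
        if (fabs(a.re - b.re) > tol || fabs(a.im + b.im) > tol) {
            return 0;
        }
    }
    return 1;
}
```

Running the check on `kAnswerInput` should report symmetry, while `kQuestionInput` fails it, which is one more sign that the values in the question's code were entered in the wrong slots.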

Compiled and tested on Ubuntu 14.04 with CUDA 8.0, using the following command:

nvcc -o ifft ifft.cu -arch=sm_61 -lcufft

The output was compared against MATLAB 2017a.

Program output:

Input:
0.465000 + 0.000000i
0.059640 - 0.143250i
0.049050 - 0.056370i
0.042860 - 0.029760i
0.043450 - 0.015120i
0.045000 + 0.000000i
0.043450 + 0.015120i
0.042860 + 0.029760i
0.049050 + 0.056370i
0.059640 + 0.143250i


-------------------------------

Output:
0.900000 - 0.000000i
0.800026 - 0.000000i
0.699999 - 0.000000i
0.599964 - 0.000000i
0.500011 + 0.000000i
0.400000 + 0.000000i
0.299990 + 0.000000i
0.199993 + 0.000000i
0.150000 + 0.000000i
0.100018 - 0.000000i
Votes 5
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/46562575
