So I'm designing a CNN in Java, and I'd really like to parallelize the convolution and the pooling. This is my approach (rows, columns, inputLayer, convLayer, poolLayer and features are already initialized in the constructor):
int padding = 3;
int filterSize = 2 * padding + 1;
int[] input = new int[rows * columns];
for(int r = 0; r < rows; r++)
    System.arraycopy(inputLayer[r], 0, input, r * columns, columns);
int[] filters = new int[4 * filterSize * filterSize];
for(int fl = 0; fl < 4; fl++)
    for(int fr = 0; fr < filterSize; fr++)
        System.arraycopy(features[fl][fr], 0, filters, fl * filterSize * filterSize + fr * filterSize, filterSize);
float[] conv = new float[4 * rows * columns];
float[] pool = new float[rows * columns];
Range convRange = Range.create3D(columns, rows, 4, 2, 2, 2);
Kernel convKernel = new Kernel(){
    int h = rows;
    int w = columns;
    int p = padding;
    int fs = filterSize;
    public void run(){
        int val = 0;
        int c = getGlobalId(0);
        int r = getGlobalId(1);
        int l = getGlobalId(2);
        int upper = max(0, p - r);
        int lower = min(fs, h + p - r);
        int left = max(0, p - c);
        int right = min(fs, w + p - c);
        for (int i = upper; i < lower; i++)
            for (int j = left; j < right; j++)
                val += input[(r + i - p) * w + c + j - p] * filters[l * fs * fs + i * fs + j];
        conv[l * h * w + r * w + c] = Math.round(100.00f * val / fs) / 100.00f;
    }
};
convKernel.setExplicit(true);
convKernel.put(input);
convKernel.put(conv);
convKernel.put(filters);
convKernel.execute(convRange);
convKernel.get(conv);
for(int convL = 0; convL < 4; convL++)
    for(int convR = 0; convR < rows; convR++)
        System.arraycopy(conv, convL * rows * columns + convR * columns, convLayer[convL][convR], 0, columns);
Range poolRange = Range.create3D(columns / 2, rows / 2, 4, 2, 2, 2);
Kernel poolKernel = new Kernel(){
    public void run(){
        int wt = columns;
        int ht = rows;
        float val = 0.00f;
        int c = getGlobalId(0);
        int r = getGlobalId(1);
        int l = getGlobalId(2);
        for(int i = 0; i < 2; i++)
            for(int j = 0; j < 2; j++)
                val = max(val, leakyReLU(conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j]));
        pool[(l * ht * wt / 4) + (r * wt / 2) + c] = Math.round(100.00f * val) / 100.00f;
    }
};
poolKernel.setExplicit(true);
poolKernel.put(conv);
poolKernel.put(pool);
poolKernel.execute(poolRange);
poolKernel.get(pool);
for(int poolL = 0; poolL < 4; poolL++)
    for(int poolR = 0; poolR < rows / 2; poolR++)
        System.arraycopy(pool, (poolL * rows * columns / 4) + (poolR * columns / 2), poolLayer[poolL][poolR], 0, columns / 2);

It's not the prettiest code, but I haven't used Java in a long time, let alone Aparapi.
Initially, I used the raw 2D arrays directly, but the API printed a message saying it doesn't support them and fell back to native (CPU) mode. Converting everything to 1D arrays should have done the trick, but now I get messages like this:
Aug 08, 2022 9:03:02 PM com.aparapi.internal.model.MethodModel init
WARNING: Method max(FF)F does not contain a LocalVariableTable entry (source not compiled with -g) aparapi will attempt to create a synthetic table. This is experimental!!
Aug 08, 2022 9:03:02 PM com.aparapi.internal.kernel.KernelRunner fallBackToNextDevice
WARNING: Device failed for NeuralNetwork$2, devices={NVIDIA|Java Alternative Algorithm|Java Thread Pool}: null
So it looks like poolKernel can't resolve the max function, and the whole thing falls back to the CPU.
While debugging I could confirm that it only used 12 threads -- the number my Intel i7 supports. The GPU is an NVIDIA GeForce GTX 1650 with 896 cores, so that's the kind of parallelism I was expecting to see.
Also, at the very end, it says:
WARNING: Aparapi is running on an untested OpenCL platform version: OpenCL 3.0 CUDA 11.3.123
WARNING: Aparapi is running on an untested OpenCL platform version: OpenCL 3.0
What am I missing? P.S.: As you can imagine, I'm new to both conv nets and GPGPU. I know there is a library (cuDNN) that provides all the CNN functionality one needs, but I want to implement it myself to really understand how it works.
Posted on 2022-08-09 23:42:16
Well... sometimes, apparently, one just needs to write a problem down in order to answer it. I did some reworking, and now all the errors seem to be gone:
int padding = 3;
int filterSize = 2 * padding + 1;
int[] params = {rows, columns, padding, filterSize};
int[] input = new int[rows * columns];
for(int r = 0; r < rows; r++)
    System.arraycopy(inputLayer[r], 0, input, r * columns, columns);
int[] filters = new int[4 * filterSize * filterSize];
for(int fl = 0; fl < 4; fl++)
    for(int fr = 0; fr < filterSize; fr++)
        System.arraycopy(features[fl][fr], 0, filters, fl * filterSize * filterSize + fr * filterSize, filterSize);
float[] conv = new float[4 * rows * columns];
float[] pool = new float[rows * columns];
Range convRange = Range.create3D(columns, rows, 4);
Kernel convKernel = new Kernel(){
    final int h = params[0];
    final int w = params[1];
    final int p = params[2];
    final int fs = params[3];
    public void run(){
        int val = 0;
        final int c = getGlobalId(0);
        final int r = getGlobalId(1);
        final int l = getGlobalId(2);
        final int upper = max(0, p - r);
        final int lower = min(fs, h + p - r);
        final int left = max(0, p - c);
        final int right = min(fs, w + p - c);
        for (int i = upper; i < lower; i++)
            for (int j = left; j < right; j++)
                val += input[(r + i - p) * w + c + j - p] * filters[l * fs * fs + i * fs + j];
        conv[l * h * w + r * w + c] = Math.round(100.00f * val / fs) / 100.00f;
    }
};
convKernel.setExplicit(true);
convKernel.put(params);
convKernel.put(input);
convKernel.put(conv);
convKernel.put(filters);
convKernel.execute(convRange);
convKernel.get(conv);
for(int convL = 0; convL < 4; convL++)
    for(int convR = 0; convR < rows; convR++)
        System.arraycopy(conv, convL * rows * columns + convR * columns, convLayer[convL][convR], 0, columns);
Range poolRange = Range.create3D(columns / 2, rows / 2, 4);
Kernel poolKernel = new Kernel(){
    final int ht = params[0];
    final int wt = params[1];
    public void run(){
        //final float coef = coefficient;
        float val = 0.00f;
        final int c = getGlobalId(0);
        final int r = getGlobalId(1);
        final int l = getGlobalId(2);
        for(int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                float tmp = NeuralNetwork.ReLU(conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j]);
                if(val < tmp) val = tmp;
            }
        pool[(l * ht * wt / 4) + (r * wt / 2) + c] = Math.round(100.00f * val) / 100.00f;
    }
};
poolKernel.setExplicit(true);
poolKernel.put(params);
poolKernel.put(conv);
poolKernel.put(pool);
poolKernel.execute(poolRange);
poolKernel.get(pool);
for(int poolL = 0; poolL < 4; poolL++)
    for(int poolR = 0; poolR < rows / 2; poolR++)
        System.arraycopy(pool, (poolL * rows * columns / 4) + (poolR * columns / 2), poolLayer[poolL][poolR], 0, columns / 2);

Also, I've come to the conclusion that I don't need leaky ReLU -- regular ReLU works just fine! With that, I consider this topic more or less closed. I hope someone can learn something from my bumpy ride.
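For anyone following along: a single-threaded, plain-Java version of the convolution body is handy for validating the clamped loop bounds on tiny inputs before dispatching to the GPU. The class and method names below are illustrative, not part of the code above:

```java
// Hypothetical CPU reference computing one convolution output element,
// mirroring the clamped loop bounds in convKernel.run().
class ConvReference {
    static float convAt(int[] input, int[] filters, int h, int w,
                        int p, int fs, int l, int r, int c) {
        int val = 0;
        int upper = Math.max(0, p - r);      // skip filter rows above the image
        int lower = Math.min(fs, h + p - r); // ...and below it
        int left  = Math.max(0, p - c);      // skip filter columns left of the image
        int right = Math.min(fs, w + p - c); // ...and right of it
        for (int i = upper; i < lower; i++)
            for (int j = left; j < right; j++)
                val += input[(r + i - p) * w + c + j - p]
                     * filters[l * fs * fs + i * fs + j];
        // Same two-decimal rounding as the kernel.
        return Math.round(100.00f * val / fs) / 100.00f;
    }
}
```

Comparing this reference against the GPU output on a small image is a quick way to catch indexing mistakes that would otherwise only show up as garbage feature maps.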
Posted on 2022-08-09 22:32:05
Got it -- leakyReLU also used a float max, which I had completely forgotten about. I replaced both with if statements. Now the only error message I get is that, according to the API, some objects are being passed to the kernel (which is not supported). But I don't see any objects... If anyone can help, please chime in.
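In isolation, the branch-based replacements look like this. The class name and leak coefficient are illustrative, and `max2` just names the if-statement pattern substituted for the `max(FF)F` call that Aparapi failed to translate:

```java
// Branch-based substitutes for the float max calls, written without
// Math.max so they translate cleanly to OpenCL.
class Activations {
    // Leaky ReLU via a branch; coef is a hypothetical leak coefficient.
    static float leakyReLU(float x, float coef) {
        if (x < 0) return x * coef;
        return x;
    }

    // Branch-based maximum of two floats.
    static float max2(float a, float b) {
        if (a < b) return b;
        return a;
    }
}
```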
int padding = 3;
int filterSize = 2 * padding + 1;
int[] input = new int[rows * columns];
for(int r = 0; r < rows; r++)
    System.arraycopy(inputLayer[r], 0, input, r * columns, columns);
int[] filters = new int[4 * filterSize * filterSize];
for(int fl = 0; fl < 4; fl++)
    for(int fr = 0; fr < filterSize; fr++)
        System.arraycopy(features[fl][fr], 0, filters, fl * filterSize * filterSize + fr * filterSize, filterSize);
float[] conv = new float[4 * rows * columns];
float[] pool = new float[rows * columns];
Range convRange = Range.create3D(columns, rows, 4);
Kernel convKernel = new Kernel(){
    int h = rows;
    int w = columns;
    int p = padding;
    int fs = filterSize;
    public void run(){
        int val = 0;
        int c = getGlobalId(0);
        int r = getGlobalId(1);
        int l = getGlobalId(2);
        int upper = max(0, p - r);
        int lower = min(fs, h + p - r);
        int left = max(0, p - c);
        int right = min(fs, w + p - c);
        for (int i = upper; i < lower; i++)
            for (int j = left; j < right; j++)
                val += input[(r + i - p) * w + c + j - p] * filters[l * fs * fs + i * fs + j];
        conv[l * h * w + r * w + c] = Math.round(100.00f * val / fs) / 100.00f;
    }
};
convKernel.setExplicit(true);
convKernel.put(input);
convKernel.put(conv);
convKernel.put(filters);
convKernel.execute(convRange);
convKernel.get(conv);
for(int convL = 0; convL < 4; convL++)
    for(int convR = 0; convR < rows; convR++)
        System.arraycopy(conv, convL * rows * columns + convR * columns, convLayer[convL][convR], 0, columns);
Range poolRange = Range.create3D(columns / 2, rows / 2, 4);
Kernel poolKernel = new Kernel(){
    public void run(){
        int wt = columns;
        int ht = rows;
        float coef = coefficient;
        float val = 0.00f;
        int c = getGlobalId(0);
        int r = getGlobalId(1);
        int l = getGlobalId(2);
        for(int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                float tmp = conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j];
                if(tmp < 0) tmp = tmp * coef;
                if (val < tmp) val = tmp;
            }
        pool[(l * ht * wt / 4) + (r * wt / 2) + c] = Math.round(100.00f * val) / 100.00f;
    }
};
poolKernel.setExplicit(true);
poolKernel.put(conv);
poolKernel.put(pool);
poolKernel.execute(poolRange);
poolKernel.get(pool);
for(int poolL = 0; poolL < 4; poolL++)
    for(int poolR = 0; poolR < rows / 2; poolR++)
        System.arraycopy(pool, (poolL * rows * columns / 4) + (poolR * columns / 2), poolLayer[poolL][poolR], 0, columns / 2);

https://stackoverflow.com/questions/73296421
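For completeness, the pooling kernel's index arithmetic can also be sanity-checked on the CPU. This sketch computes one pooled element over the flat conv layout, matching the indexing in poolKernel.run(); the class and method names are illustrative:

```java
// Hypothetical CPU reference for one 2x2 max-pool output element over the
// flat conv layout: layer l, pooled row r, pooled column c.
class PoolReference {
    static float poolAt(float[] conv, int ht, int wt, int l, int r, int c) {
        float val = 0.00f;
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                // The 2x2 window covers conv rows 2r..2r+1 and columns 2c..2c+1.
                float tmp = conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j];
                if (val < tmp) val = tmp;
            }
        // Same two-decimal rounding as the kernel.
        return Math.round(100.00f * val) / 100.00f;
    }
}
```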