
Aparapi cannot resolve max and falls back to the CPU

Stack Overflow user
Asked 2022-08-09 18:15:42
2 answers, 173 views, 0 votes

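So I'm designing a CNN in Java, and I really want to parallelize the convolution and the pooling. Here is my approach (rows, columns, inputLayer, convLayer, poolLayer and features are already initialized in the constructor):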

    int padding = 3;
    int filterSize = 2 * padding + 1;
    int[] input = new int[rows * columns];
    for(int r = 0; r < rows; r++)
        System.arraycopy(inputLayer[r], 0, input, r * columns, columns);
    int[] filters = new int[4 * filterSize * filterSize];
    for(int fl = 0; fl < 4; fl++)
        for(int fr = 0; fr < filterSize; fr++)
            System.arraycopy(features[fl][fr], 0, filters, fl * filterSize * filterSize + fr * filterSize, filterSize);
    float[] conv = new float[4 * rows * columns];
    float[] pool = new float[rows * columns];

    Range convRange = Range.create3D(columns, rows, 4, 2, 2, 2);
    Kernel convKernel = new Kernel(){
        int h = rows;
        int w = columns;
        int p = padding;
        int fs = filterSize;
        public void run(){
            int val = 0;
            int c = getGlobalId(0);
            int r = getGlobalId(1);
            int l = getGlobalId(2);
            int upper = max(0, p - r);
            int lower = min(fs, h + p - r);
            int left = max(0, p - c);
            int right = min(fs, w + p - c);
            for (int i = upper; i < lower; i++)
                for (int j = left; j < right; j++)
                    val += input[(r + i - p) * w + c + j - p] * filters[l * fs * fs + i * fs + j];
            conv[l * h * w + r * w + c] = Math.round(100.00f * val / fs) / 100.00f;
        }
    };
    convKernel.setExplicit(true);
    convKernel.put(input);
    convKernel.put(conv);
    convKernel.put(filters);
    convKernel.execute(convRange);
    convKernel.get(conv);
    for(int convL = 0; convL < 4; convL++)
        for(int convR = 0; convR < rows; convR++)
            System.arraycopy(conv, convL * rows * columns + convR * columns, convLayer[convL][convR], 0, columns);

    Range poolRange = Range.create3D(columns / 2, rows / 2, 4, 2, 2, 2);
    Kernel poolKernel = new Kernel(){
        public void run(){
            int wt = columns;
            int ht = rows;
            float val = 0.00f;
            int c = getGlobalId(0);
            int r = getGlobalId(1);
            int l = getGlobalId(2);
            for(int i = 0; i < 2; i++)
                for(int j = 0; j < 2; j++)
                    val = max(val, leakyReLU(conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j]));
            pool[(l * ht * wt / 4) + (r * wt / 2) + c] = Math.round(100.00f * val) / 100.00f;
        }
    };
    poolKernel.setExplicit(true);
    poolKernel.put(conv);
    poolKernel.put(pool);
    poolKernel.execute(poolRange);
    poolKernel.get(pool);
    for(int poolL = 0; poolL < 4; poolL++)
        for(int poolR = 0; poolR < rows / 2; poolR++)
            System.arraycopy(pool, (poolL * rows * columns / 4) + (poolR * columns / 2), poolLayer[poolL][poolR], 0, columns / 2);

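It's not the prettiest code, but I haven't used Java in a long time, let alone Aparapi.

Initially I used the original arrays directly, but the API printed a message saying it doesn't support them and switched to native mode. Converting everything to one-dimensional arrays should have worked, but now I get this message: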

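VIII, 2022 9:03:02 PM com.aparapi.internal.model.MethodModel init WARNING: Method max(FF)F does not contain a LocalVariableTable entry (source not compiled with -g) codegen will attempt to create a synthetic table based on bytecode. This is experimental!!
8, 2022 9:03:02 PM com.aparapi.internal.kernel.KernelRunner fallBackToNextDevice WARNING: Device failed for NeuralNetwork$2, devices={NVIDIA Alternative Algorithm Java Thread Pool}: null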

So it looks like poolKernel cannot resolve the max function, and the whole thing falls back to the CPU.

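While debugging, I could confirm that it uses only 12 threads, which is how many my Intel i7 supports. The GPU is an NVIDIA GeForce GTX 1650 with 896 cores, so that is what I was expecting to see in use.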

Also, at the very end, it says:

WARNING: Aparapi is running on an untested OpenCL platform version: OpenCL 3.0 CUDA 11.3.123
WARNING: Aparapi is running on an untested OpenCL platform version: OpenCL 3.0

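What am I missing? P.S.: As you can imagine, I'm new to both conv nets and GPGPU. I know there is a library with all the CNN functionality I need (cuDNN), but I want to implement it myself to really understand how it works.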
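
The flattening step the question relies on (copying a 2D array row by row into one row-major 1D array, then addressing element (r, c) as r * columns + c) can be sketched in plain Java as below. This is only an illustration: the class name FlattenSketch and the helper names flatten and at are hypothetical and do not appear in the original post.

```java
import java.util.Arrays;

class FlattenSketch {
    // Copies a rows x columns matrix into one row-major int[],
    // the same way the post fills `input` from `inputLayer`.
    static int[] flatten(int[][] matrix, int rows, int columns) {
        int[] flat = new int[rows * columns];
        for (int r = 0; r < rows; r++)
            System.arraycopy(matrix[r], 0, flat, r * columns, columns);
        return flat;
    }

    // Element (r, c) of the original matrix lives at index r * columns + c.
    static int at(int[] flat, int columns, int r, int c) {
        return flat[r * columns + c];
    }

    public static void main(String[] args) {
        int[][] inputLayer = {{1, 2, 3}, {4, 5, 6}};
        int[] input = flatten(inputLayer, 2, 3);
        System.out.println(Arrays.toString(input));
        System.out.println(at(input, 3, 1, 2));
    }
}
```

The same offset arithmetic extends to the stacked feature maps in the post, where layer l starts at l * rows * columns.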


2 Answers

Stack Overflow user

Accepted answer

Posted 2022-08-09 23:42:16

Well... Sometimes, apparently, you need to write your question down in order to answer it. I did some rework, and now all the errors seem to be gone:

    int padding = 3;
    int filterSize = 2 * padding + 1;
    int[] params = {rows, columns, padding, filterSize};
    int[] input = new int[rows * columns];
    for(int r = 0; r < rows; r++)
        System.arraycopy(inputLayer[r], 0, input, r * columns, columns);
    int[] filters = new int[4 * filterSize * filterSize];
    for(int fl = 0; fl < 4; fl++)
        for(int fr = 0; fr < filterSize; fr++)
            System.arraycopy(features[fl][fr], 0, filters, fl * filterSize * filterSize + fr * filterSize, filterSize);
    float[] conv = new float[4 * rows * columns];
    float[] pool = new float[rows * columns];

    Range convRange = Range.create3D(columns, rows, 4);
    Kernel convKernel = new Kernel(){
        final int h = params[0];
        final int w = params[1];
        final int p = params[2];
        final int fs = params[3];
        public void run(){
            int val = 0;
            final int c = getGlobalId(0);
            final int r = getGlobalId(1);
            final int l = getGlobalId(2);
            final int upper = max(0, p - r);
            final int lower = min(fs, h + p - r);
            final int left = max(0, p - c);
            final int right = min(fs, w + p - c);
            for (int i = upper; i < lower; i++)
                for (int j = left; j < right; j++)
                    val += input[(r + i - p) * w + c + j - p] * filters[l * fs * fs + i * fs + j];
            conv[l * h * w + r * w + c] = Math.round(100.00f * val / fs) / 100.00f;
        }
    };
    convKernel.setExplicit(true);
    convKernel.put(params);
    convKernel.put(input);
    convKernel.put(conv);
    convKernel.put(filters);
    convKernel.execute(convRange);
    convKernel.get(conv);
    for(int convL = 0; convL < 4; convL++)
        for(int convR = 0; convR < rows; convR++)
            System.arraycopy(conv, convL * rows * columns + convR * columns, convLayer[convL][convR], 0, columns);

    Range poolRange = Range.create3D(columns / 2, rows / 2, 4);
    Kernel poolKernel = new Kernel(){
        final int ht = params[0];
        final int wt = params[1];
        public void run(){
            //final float coef = coefficient;
            float val = 0.00f;
            final int c = getGlobalId(0);
            final int r = getGlobalId(1);
            final int l = getGlobalId(2);
            for(int i = 0; i < 2; i++)
                for (int j = 0; j < 2; j++) {
                    float tmp = NeuralNetwork.ReLU(conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j]);
                    if(val < tmp) val = tmp;
                }
            pool[(l * ht * wt / 4) + (r * wt / 2) + c] = Math.round(100.00f * val) / 100.00f;
        }
    };
    poolKernel.setExplicit(true);
    poolKernel.put(params);
    poolKernel.put(conv);
    poolKernel.put(pool);
    poolKernel.execute(poolRange);
    poolKernel.get(pool);
    for(int poolL = 0; poolL < 4; poolL++)
        for(int poolR = 0; poolR < rows / 2; poolR++)
            System.arraycopy(pool, (poolL * rows * columns / 4) + (poolR * columns / 2), poolLayer[poolL][poolR], 0, columns / 2);

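Also, I came to the conclusion that I don't need LeakyReLU; regular ReLU is perfectly fine! Anyway, I think this topic is more or less closed. I hope someone can learn from my bumpy road.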
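
One way to check what the GPU kernel returns is to run the same index arithmetic sequentially on the CPU and compare the two result arrays. The following is a minimal sketch under that assumption; it uses no Aparapi calls, and the class name ConvReference and the method convolve are hypothetical names introduced here for illustration.

```java
import java.util.Arrays;

class ConvReference {
    // Sequential version of the convKernel body: one output value per (layer, row, column).
    static float[] convolve(int[] input, int[] filters, int layers, int h, int w, int p) {
        final int fs = 2 * p + 1; // filter size, as in the post
        float[] conv = new float[layers * h * w];
        for (int l = 0; l < layers; l++)
            for (int r = 0; r < h; r++)
                for (int c = 0; c < w; c++) {
                    int val = 0;
                    // Clip the filter window so it never reads outside the input.
                    final int upper = Math.max(0, p - r);
                    final int lower = Math.min(fs, h + p - r);
                    final int left = Math.max(0, p - c);
                    final int right = Math.min(fs, w + p - c);
                    for (int i = upper; i < lower; i++)
                        for (int j = left; j < right; j++)
                            val += input[(r + i - p) * w + c + j - p]
                                 * filters[l * fs * fs + i * fs + j];
                    // Same scaling and two-decimal rounding as the kernel.
                    conv[l * h * w + r * w + c] = Math.round(100.00f * val / fs) / 100.00f;
                }
        return conv;
    }

    public static void main(String[] args) {
        // 3x3 input of ones and a single 3x3 filter of ones (padding 1).
        int[] input = {1, 1, 1, 1, 1, 1, 1, 1, 1};
        int[] filters = {1, 1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(Arrays.toString(convolve(input, filters, 1, 3, 3, 1)));
    }
}
```

Comparing this array with the one read back by convKernel.get(conv) on a small input is a cheap sanity check before scaling up.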

Votes: 0

Stack Overflow user

Posted 2022-08-09 22:32:05

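Got it: leakyReLU also used a float max, which I had completely forgotten about. I replaced both calls with if statements. Now the only error message I get says that, according to the API, some object is being passed to the kernel (which is not supported). But I don't see any objects... If anyone can help, please chime in.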

    int padding = 3;
    int filterSize = 2 * padding + 1;
    int[] input = new int[rows * columns];
    for(int r = 0; r < rows; r++)
        System.arraycopy(inputLayer[r], 0, input, r * columns, columns);
    int[] filters = new int[4 * filterSize * filterSize];
    for(int fl = 0; fl < 4; fl++)
        for(int fr = 0; fr < filterSize; fr++)
            System.arraycopy(features[fl][fr], 0, filters, fl * filterSize * filterSize + fr * filterSize, filterSize);
    float[] conv = new float[4 * rows * columns];
    float[] pool = new float[rows * columns];

    Range convRange = Range.create3D(columns, rows, 4);
    Kernel convKernel = new Kernel(){
        int h = rows;
        int w = columns;
        int p = padding;
        int fs = filterSize;
        public void run(){
            int val = 0;
            int c = getGlobalId(0);
            int r = getGlobalId(1);
            int l = getGlobalId(2);
            int upper = max(0, p - r);
            int lower = min(fs, h + p - r);
            int left = max(0, p - c);
            int right = min(fs, w + p - c);
            for (int i = upper; i < lower; i++)
                for (int j = left; j < right; j++)
                    val += input[(r + i - p) * w + c + j - p] * filters[l * fs * fs + i * fs + j];
            conv[l * h * w + r * w + c] = Math.round(100.00f * val / fs) / 100.00f;
        }
    };
    convKernel.setExplicit(true);
    convKernel.put(input);
    convKernel.put(conv);
    convKernel.put(filters);
    convKernel.execute(convRange);
    convKernel.get(conv);
    for(int convL = 0; convL < 4; convL++)
        for(int convR = 0; convR < rows; convR++)
            System.arraycopy(conv, convL * rows * columns + convR * columns, convLayer[convL][convR], 0, columns);

    Range poolRange = Range.create3D(columns / 2, rows / 2, 4);
    Kernel poolKernel = new Kernel(){
        public void run(){
            int wt = columns;
            int ht = rows;
            float coef = coefficient;
            float val = 0.00f;
            int c = getGlobalId(0);
            int r = getGlobalId(1);
            int l = getGlobalId(2);
            for(int i = 0; i < 2; i++)
                for (int j = 0; j < 2; j++) {
                    float tmp = conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j];
                    if(tmp < 0) tmp = tmp * coef;
                    if (val < tmp) val = tmp;
                }
            pool[(l * ht * wt / 4) + (r * wt / 2) + c] = Math.round(100.00f * val) / 100.00f;
        }
    };
    poolKernel.setExplicit(true);
    poolKernel.put(conv);
    poolKernel.put(pool);
    poolKernel.execute(poolRange);
    poolKernel.get(pool);
    for(int poolL = 0; poolL < 4; poolL++)
        for(int poolR = 0; poolR < rows / 2; poolR++)
            System.arraycopy(pool, (poolL * rows * columns / 4) + (poolR * columns / 2), poolLayer[poolL][poolR], 0, columns / 2);
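
The branch-based replacement for max described above can be tried out without a GPU. The sketch below is plain Java with no Aparapi calls, assuming even rows and columns; the class name PoolReference and its method signatures are hypothetical and only mirror the pooling loop from the post.

```java
import java.util.Arrays;

class PoolReference {
    // Leaky ReLU written with a branch instead of max(float, float).
    static float leakyReLU(float x, float coef) {
        if (x < 0) return x * coef;
        return x;
    }

    // 2x2 max pooling over `layers` feature maps of size ht x wt (both assumed even).
    static float[] pool(float[] conv, int layers, int ht, int wt, float coef) {
        float[] pool = new float[layers * ht * wt / 4];
        for (int l = 0; l < layers; l++)
            for (int r = 0; r < ht / 2; r++)
                for (int c = 0; c < wt / 2; c++) {
                    float val = 0.00f; // starting value taken from the post
                    for (int i = 0; i < 2; i++)
                        for (int j = 0; j < 2; j++) {
                            float tmp = leakyReLU(conv[l * ht * wt + (2 * r + i) * wt + 2 * c + j], coef);
                            if (val < tmp) val = tmp; // branch instead of max
                        }
                    pool[(l * ht * wt / 4) + (r * wt / 2) + c] = Math.round(100.00f * val) / 100.00f;
                }
        return pool;
    }

    public static void main(String[] args) {
        float[] conv = {1.5f, -2.0f, 0.25f, 3.14159f};
        System.out.println(Arrays.toString(pool(conv, 1, 2, 2, 0.1f)));
    }
}
```

Because val starts at 0, a window whose four activations are all negative pools to 0, so the leaky slope never reaches the output in that case. That may be part of why plain ReLU turned out to be sufficient, though this is only an observation about the loop as written.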
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/73296421
