I am working through Nvidia's "Fundamentals of Accelerated Computing with CUDA Python" course and have completed the task of refactoring a simple version of some code that does the work needed to create a hidden layer in a neural network:
import numpy as np
from numba import cuda, vectorize
n = 1000000
greyscales = np.floor(np.random.uniform(0, 255, n).astype(np.float32))
weights = np.random.normal(.5, .1, n).astype(np.float32)
from numpy import exp
def normalize(grayscales):
    return grayscales / 255

def weigh(values, weights):
    return values * weights

def activate(values):
    return ( exp(values) - exp(-values) ) / ( exp(values) + exp(-values) )

def create_hidden_layer(n, greyscales, weights, exp, normalize, weigh, activate):
    normalized = normalize(greyscales)
    weighted = weigh(normalized, weights)
    activated = activate(weighted)
    return activated
arguments = {"n": n,
             "greyscales": greyscales,
             "weights": weights,
             "exp": exp,
             "normalize": normalize,
             "weigh": weigh,
             "activate": activate}
a = create_hidden_layer(**arguments)
print(a)

I applied some transformations to the code; after the changes it looks like this:
from math import exp
@vectorize(['float32(float32)'], target='cuda')
def normalize(grayscales):
    return grayscales / 255

@vectorize(['float32(float32,float32)'], target='cuda')
def weigh(values, weights):
    return values * weights

@vectorize(['float32(float32)'], target='cuda')
def activate(values):
    return ( exp(values) - exp(-values) ) / ( exp(values) + exp(-values) )

def create_hidden_layer(n, greyscales, weights, exp, normalize, weigh, activate):
    normalized = normalize(greyscales)
    weighted = weigh(normalized, weights)
    activated = activate(weighted)
    return activated
greyscales = cuda.to_device(greyscales)
weights = cuda.to_device(weights)
normalized = cuda.device_array(shape=(n,), dtype=np.float32)
weighted = cuda.device_array(shape=(n,), dtype=np.float32)
activated = cuda.device_array(shape=(n,), dtype=np.float32)
activated = activated.copy_to_host()
arguments = {"n": n,
             "greyscales": greyscales,
             "weights": weights,
             "exp": exp,
             "normalize": normalize,
             "weigh": weigh,
             "activate": activate}
a = create_hidden_layer(**arguments)
print(a)

After all these transformations the code seems to work fine, but it still isn't fast enough. The assignment requires it to run in under 1 s, and mine takes 1.23 s.
Maybe someone can see how I could refactor my code further, or spot a silly mistake I made in it? I would really appreciate your help!
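One observation worth checking (not part of the course material): the activation `( exp(values) - exp(-values) ) / ( exp(values) + exp(-values) )` is exactly tanh(x), so the four transcendental calls per element can be replaced by a single `math.tanh` call, which numba's CUDA target also supports. A quick CPU check of the equivalence with plain NumPy:

```python
import numpy as np

# activate() as written in the question: four exp calls per element
def activate(values):
    return (np.exp(values) - np.exp(-values)) / (np.exp(values) + np.exp(-values))

x = np.random.normal(0.0, 1.0, 1000).astype(np.float32)

# tanh is mathematically identical and needs a single call per element
assert np.allclose(activate(x), np.tanh(x), atol=1e-6)
```

Inside the `@vectorize(..., target='cuda')` function the same change would be `return math.tanh(values)`.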
Posted on 2022-09-19 23:36:30
Here are a few things you can try to speed up the code:
- Compile a custom kernel with @cuda.jit instead of three separate @vectorize ufuncs. Inside the kernel, compute a 1D index with cuda.grid(1) (or from cuda.threadIdx.x, cuda.blockIdx.x and cuda.blockDim.x) so that each thread handles one element, and fuse normalize, weigh and activate into a single pass over the data instead of three kernel launches.
- If the threads of a block need to cooperate, stage data in a shared-memory array created with cuda.shared.array() and synchronize the block with cuda.syncthreads() before reading what other threads wrote. For a purely element-wise workload like this one, shared memory is not required.
- Keep the data on the GPU between steps: copy the inputs once with cuda.to_device(), allocate outputs on the device with cuda.device_array() or cuda.device_array_like(), and copy only the final result back with copy_to_host(). Note that in your code activated.copy_to_host() runs before create_hidden_layer() is ever called, so the preallocated device arrays are never actually used.
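The preallocated buffers from the question can be wired in because, as far as I know, numba's CUDA ufuncs accept an out= argument just like NumPy ufuncs, so each stage writes into an existing device array instead of allocating a new one. The pattern, sketched here with plain NumPy so it runs without a GPU (the normalized/weighted buffers stand in for the cuda.device_array allocations):

```python
import numpy as np

n = 1_000_000
greyscales = np.floor(np.random.uniform(0, 255, n)).astype(np.float32)
weights = np.random.normal(.5, .1, n).astype(np.float32)

# Preallocate once; with numba these would be cuda.device_array(...) buffers
normalized = np.empty(n, dtype=np.float32)
weighted = np.empty(n, dtype=np.float32)

# Writing into out= reuses the buffers instead of allocating per call;
# with @vectorize(target='cuda') ufuncs the calls look the same
np.divide(greyscales, 255, out=normalized)
np.multiply(normalized, weights, out=weighted)
activated = np.tanh(weighted)  # on the GPU, call copy_to_host() once here
```

With device arrays this keeps every intermediate on the GPU, so the only host/device transfers are the initial to_device() copies and the single copy_to_host() at the end.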
https://stackoverflow.com/questions/73778660