Numba code for CUDA GPU fails to compile with OSError: exception: access violation reading

Stack Overflow user

Asked 2020-05-31 18:59:19

Answers: 3, Views: 308, Followers: 0, Votes: 0

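I have a problem writing code for a CUDA GPU in Python/Numba, and I simply do not know what I did wrong or which component is faulty. In 95% of cases the program does not compile: when I restart the Python kernel and try to run the CUDA kernel I wrote, I get the following error.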

OSError Traceback (most recent call last)
in
6 data_isgomoku = np.load("data/isgomoku_moreThan5Good.npy")
7
----> 8 kernel_test_02[blocks, threads_per_block](envs_in_gpu, envs_out_gpu, random_states, data_isgomoku, proximities)
9 print ("Run OKAY")

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\compiler.py in __call__(self, *args)
757 '''
758 #print ("_call *args=",*args)
--> 759 kernel = self.specialize(*args)
760 cfg = kernel[self.griddim, self.blockdim, self.stream, self.sharedmem]
761 cfg(*args)

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\compiler.py in specialize(self, *args)
768 argtypes = tuple(
769 [self.typingctx.resolve_argument_type(a) for a in args])
--> 770 kernel = self.compile(argtypes)
771 return kernel
772

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\compiler.py in compile(self, sig)
787 self.definitions[(cc, argtypes)] = kernel
788 if self.bind:
--> 789 kernel.bind()
790 return kernel
791

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\compiler.py in bind(self)
528 Force binding to current CUDA context
529 """
--> 530 self._func.get()
531
532 @property

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\compiler.py in get(self)
405 cufunc = self.cache.get(device.id)
406 if cufunc is None:
--> 407 ptx = self.ptx.get()
408
409 # Link

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\compiler.py in get(self)
376 arch = nvvm.get_arch_option(*cc)
377 ptx = nvvm.llvm_to_ptx(self.llvmir, opt=3, arch=arch,
--> 378 **self._extra_options)
379 self.cache[cc] = ptx
380 if config.DUMP_ASSEMBLY:

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\cudadrv\nvvm.py in llvm_to_ptx(llvmir, **opts)
498 cu.add_module(libdevice.get())
499
--> 500 ptx = cu.compile(**opts)
501 # XXX remove debug_pubnames seems to be necessary sometimes
502 return patch_ptx_debug_pubnames(ptx)

~.conda\envs\tensorflow\lib\site-packages\numba\cuda\cudadrv\nvvm.py in compile(self, **options)
234 c_opts = (c_char_p * len(opts))(*[c_char_p(x.encode('utf8'))
235 for x in opts])
--> 236 err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
237 self._try_error(err, 'Failed to compile\n')
238

OSError: exception: access violation reading 0x000001C3B042FFF8

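As far as I can tell, this is clearly a problem with the code sent to the NVVM compiler, because it makes the compiler crash. I tried CUDA Toolkit 9.2, 10.0, 10.1 and 10.2 and various versions of Numba, but the problem persists in every case.

Could someone who already has Numba, the NVIDIA CUDA Toolkit and Jupyter Notebook installed look at the code and files provided at "sample code to reproduce the error" and check whether the problem can be reproduced?

Can anyone give me a hint about what I am doing wrong?

To reproduce it on Windows, repeat the following a few times: (a) restart your Python kernel, (b) try to run the code. The code: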

%load_ext autoreload
%autoreload 2
import numpy as np
from IPython.display import clear_output
import time
import os 
import sys
import pickle
import random
import numba 
from numba import jit,cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32

# constants
MOVE_E = 0
MOVE_X = 1
MOVE_O = 2
MOVE_I = 3
# board representation
gb_type = np.dtype([
    ('boardO',(np.int32,29*29)),
    ('boardX',(np.int32,29*29)),
    ('boardCommon',(np.int32,29*29)),
    ('moveProximities',(np.int32,29*29)),
    ('noOfMoves', np.int32),
    ('undo_moveProximitites', np.int32, (19 * 19, 5*5)),
    ('undo_moves', np.int32, (19 * 19, 4)), # 0 - pos, 1-  lastMoveColor, posTypesX[movepos], posTypesO[movepos]
    ('doSaveUndoInformation', np.int32),
    ('lastMove', np.int32),
    ('lastMoveColor', np.int32),
    ('colorToMove', np.int32),
    ('gameEnded', np.int32),
    ('hashkey1', np.int32),
    ('hashkey2', np.int32),
    ('posTypesX',(np.int32,29*29)), # bits: 0:gomoku, 1:D1, 2:D2, 3:A1, 4:A2, 5: W2, 6: A1xA2, 7: A2xA2, 8:A2 later A1
    ('posTypesO',(np.int32,29*29)),
    ('undo_movePosTypesX', np.int32, (19 * 19, 8*5)),
    ('undo_movePosTypesO', np.int32, (19 * 19, 8*5))
    ])

@cuda.jit(device=True)
def cuda_init(env, startFromCenter , startFromCentralHalf, proximities,  random_states):
    "init the board's data and calculate proximities"
    thread_id = cuda.grid(1)
    cuda_fillArrayWithZero(env.boardO)
    cuda_fillArrayWithZero(env.boardX)
    cuda_fillArrayWithZero(env.moveProximities)
    cuda_fillArrayWithZero(env.boardCommon)
    env.noOfMoves = 0
    #env.undo_boardO[:,:] = 0
    #env.undo_boardX[:,:] = 0
    for i in range(19 * 19):
        for j in range(4):
            env.undo_moves[i,j] = 0
        for j in range(5*5):
            env.undo_moveProximitites[i,j] = 0

    cuda_fillArrayWithZero(env.posTypesX)
    cuda_fillArrayWithZero(env.posTypesO)

    if startFromCenter:
        gb_updateMoveProximities(env, gb_calcPosFromXY(9,9), proximities)
        env.moveProximities[gb_calcPosFromXY(9,9)] = 1
    elif startFromCentralHalf:
        X = 4+int(14 * xoroshiro128p_uniform_float32(random_states, thread_id))
        Y = 4 + int(14 * xoroshiro128p_uniform_float32(random_states, thread_id))
        #X = random.randint(4, 18 - 4)
        #Y = random.randint(4, 18 - 4)
        gb_updateMoveProximities(env, gb_calcPosFromXY(X, Y), proximities)
        env.moveProximities[gb_calcPosFromXY(X, Y)] = 1


    env.lastMove = -1
    env.lastMoveColor = 2
    env.colorToMove = 1
    env.doSaveUndoInformation = True
    env.gameEnded = False
    env.hashkey1 = 267425067
    env.hashkey2 = 576253428
    return

@cuda.jit(device=True)
def cuda_fillArrayWithZero(myarray):
    "fill array with zero"
    for i in range(myarray.shape[0]):
        myarray[i] = 0
    return

@cuda.jit("i4(i4,i4)",device=True)
def gb_calcPosFromXY( X, Y):
    "convert coordinates"
    if  not(X>=0 and X<19): print ("Error calcPosFromXY, wrong X=" ,X)
    if not(Y >= 0 and Y < 19): print("Error calcPosFromXY, wrong Y=" ,Y)
    return (5+X) + 29*(5+Y)

@cuda.jit( "UniTuple(i8, 2)(i4)",device=True)
def gb_calcXYFromPos( pos):
    "convert coordinates"
    if pos is None:
        return -1, -1
    if not(pos>=0 and pos < 29*29): print ( "Error calcXYFromPos, wrong pos number=" , pos)
    X = pos % 29 - 5
    Y = int (((pos - 5 - X) / 29)) - 5
    if not(X >= 0 and X < 19): print("Error calcXYFromPos, wrong X=", X)
    if not(Y >= 0 and Y < 19): print("Error calcXYFromPos, wrong Y=", Y)
    return X,Y


@cuda.jit(device=True)
def gb_updateMoveProximities(env, pos, proximities):
    "set some positions to 1 using the coordinates in the pre-calculated proximities array"
    if pos != -1:
        k=0
        while proximities[pos, k]>0:
            kPos = proximities[pos, k]
            #print (kPos, gb_calcXYFromPos(kPos))
            if env.boardCommon[kPos] == 0:
                env.moveProximities[kPos] = 1
            k += 1
    else:
        pos = gb_calcPosFromXY(9,9)
        k = 0
        while proximities[pos, k] > 0:
            kPos = proximities[pos, k]
            env.moveProximities[kPos] = 1
            k += 1
        env.moveProximities[pos] = 1


# CUDA kernel test 1 - init board

# define 100 boards
N=100
envs = np.zeros(N, dtype=gb_type)


threads_per_block = 64
# Calculate the number of thread blocks in the grid
blocks = (envs.shape[0] + (threads_per_block - 1)) // threads_per_block
print ("threads_per_block = ", threads_per_block, "blockspergrid=", blocks)
random_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)    

#kernel function
@cuda.jit()
def kernel_test_01(io_array, out_array, proximities, random_states):
    thread_id = cuda.grid(1)
    if thread_id < io_array.size:
        cuda_init(io_array[thread_id], True, False, proximities, random_states)
        out_array[thread_id] = io_array[thread_id]

#  test kernel 1
envs_out = np.zeros((envs.shape[0]), dtype = gb_type)
envs_in_gpu = numba.cuda.to_device(envs)
envs_out_gpu = numba.cuda.to_device(envs_out)
#proximities = np.load("data/proximities_2.npy")
proximities = np.zeros((841, 25), dtype=np.int32)

kernel_test_01[blocks, threads_per_block](envs_in_gpu, envs_out_gpu, proximities, random_states)
print ("Run OKAY")

# CUDA kernel test 2 - call some more device functions

@cuda.jit(device=True)
def gb_calc_gindex(i1,i2,i3,i4,i5,i6,i7,i8,i9,i10):
    return int((((((((((((((((((i1 << 1) + i2) << 1) +
                         i3) << 1) + i4) << 1) + i5) << 1) +
                   i6) << 1) + i7) << 1) + i8) << 1) +
             i9) << 1) + i10)

@cuda.jit(device=True)
def isPosValid(pos):
    "check if a position is valid"
    i = pos % 29 - 5
    j = int (((pos - 5 - i) / 29)) - 5
    if i>=0 and i<19 and j>=0 and j<19:
        return True
    return False

@cuda.jit(device=True)
def gb_check_if_gomokuPos(env , POS, color, dontCheckPos , data_isgomoku):

    if not isPosValid(POS):
        X,Y = gb_calcXYFromPos(POS)
        print("Error gb_check_if_gomokuPos, wrong X or Y=", X, Y)

    # if occupied
    #if env.boardCommon[POS] != 0:
    #    return False

    if not dontCheckPos:
        if env.boardX[POS] != 0 or env.boardO[POS] != 0:
            return False

    board = env.boardX
    if color == MOVE_O:
        board = env.boardO

    if board[POS-1] == 1 or board[POS+1] == 1:
        index = gb_calc_gindex(board[POS-5],board[POS-4],board[POS-3],board[POS-2],board[POS-1],board[POS+1],board[POS+2],board[POS+3],board[POS+4],board[POS+5])
        if index < 0 or index >= data_isgomoku.shape[0]: print("Error gb_check_if_gomokuPos, wrong index=", index, 1)

        if data_isgomoku[index]:
            return True

    if board[POS-29] == 1 or board[POS+29] == 1:
        index = gb_calc_gindex(board[POS-145],board[POS-116],board[POS-87],board[POS-58],board[POS-29],board[POS+29],board[POS+58],board[POS+87],board[POS+116],board[POS+145])
        if index < 0 or index >= data_isgomoku.shape[0]: print("Error gb_check_if_gomokuPos, wrong index=", index, 2)

        if data_isgomoku[index]:
            return True

    if board[POS-30] == 1 or board[POS+30] == 1:
        index = gb_calc_gindex(board[POS-150],board[POS-120],board[POS-90],board[POS-60],board[POS-30],board[POS+30],board[POS+60],board[POS+90],board[POS+120],board[POS+150])
        if index < 0 or index >= data_isgomoku.shape[0]: print("Error gb_check_if_gomokuPos, wrong index=", index, 3)

        if data_isgomoku[index]:
            return True

    if board[POS-28] == 1 or board[POS+28] == 1:
        index = gb_calc_gindex(board[POS-140],board[POS-112],board[POS-84],board[POS-56],board[POS-28],board[POS+28],board[POS+56],board[POS+84],board[POS+112],board[POS+140])
        if index < 0 or index >= data_isgomoku.shape[0]: print("Error gb_check_if_gomokuPos, wrong index=", index, 4)

        if data_isgomoku[index]:
            return True


    return False

@cuda.jit()
def kernel_test_02(io_array, out_array, random_states, data_isgomoku,proximities):
    "this kernel serves only for testing purposes, it calls some of the device functions in order to check if they work properly"

    thread_id = cuda.grid(1)
    if thread_id < io_array.size:

        X = 12 
        Y = 12 
        pos = gb_calcPosFromXY(X,Y)
        env = io_array[thread_id]
        # 
        cuda_init(io_array[thread_id], True, False, proximities, random_states)
        gb_updateMoveProximities(env, pos, proximities)
        v = gb_check_if_gomokuPos(env, pos, MOVE_X, False, data_isgomoku)

        out_array[thread_id] = io_array[thread_id]

#  test kernel 2
envs_out = np.zeros((envs.shape[0]), dtype = gb_type)
envs_in_gpu = numba.cuda.to_device(envs)
envs_out_gpu = numba.cuda.to_device(envs_out)
#proximities = np.load("data/proximities_2.npy")
proximities = np.zeros((841, 25), dtype=np.int32)
#data_isgomoku = np.load("data/isgomoku_moreThan5Good.npy")
data_isgomoku = np.zeros((1024), dtype=np.int32)

kernel_test_02[blocks, threads_per_block](envs_in_gpu, envs_out_gpu, random_states, data_isgomoku, proximities)
print ("Run OKAY")

Tags: python, gpu, numba

3 Answers

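Stack Overflow user

Answered 2020-06-07 05:22:34

I removed the custom NumPy dtype, and the access violation disappeared on numba 0.46. Unfortunately, later versions seem to be unstable and still produce "access violation" errors at random, even without the custom dtype.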
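
One way to try this workaround, sketched under assumptions: instead of passing a structured array of records, split it into one plain array per field and pass those to the kernel as separate arguments. The reduced `gb_type_small` dtype and the `split_fields` helper are hypothetical names used for illustration; the answer does not say how the dtype was actually removed.

```python
import numpy as np

# Reduced stand-in for the question's gb_type (hypothetical, for illustration).
gb_type_small = np.dtype([
    ("boardO", (np.int32, 29 * 29)),
    ("boardX", (np.int32, 29 * 29)),
    ("noOfMoves", np.int32),
])


def split_fields(records):
    """Return a dict of plain, C-contiguous arrays, one per record field."""
    return {
        name: np.ascontiguousarray(records[name])
        for name in records.dtype.names
    }


envs = np.zeros(100, dtype=gb_type_small)
fields = split_fields(envs)

# Each field is now an ordinary ndarray with a scalar dtype.
for name, arr in fields.items():
    print(name, arr.dtype, arr.shape, arr.flags["C_CONTIGUOUS"])
```

A kernel written this way would index rows such as `boardO[thread_id]` where the original code used `env.boardO`.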

Votes: 1

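Stack Overflow user

Answered 2020-09-17 16:34:03

I ran into the same problem. The program sometimes gets through, but 95% of the time it prints

err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)

OSError: exception: access violation reading 0x0000017CF7D14718

After some searching, it appears to be related to the arguments passed to the CUDA function: sometimes the function fails to convert the argument types correctly. However, the final solution is still not very clear.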
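
The answer does not say which conversion fails, so the following is only a sketch of one precaution, not a confirmed fix: give every host array an explicit dtype and a C-contiguous layout before the kernel launch, so that the argument types are the same on every run. `normalize_arg` is a hypothetical helper, not part of Numba.

```python
import numpy as np


def normalize_arg(arr, dtype=np.int32):
    """Return arr as a C-contiguous array of the requested dtype (copies only if needed)."""
    return np.ascontiguousarray(arr, dtype=dtype)


# A table loaded from disk may arrive as int64 or as a non-contiguous view.
raw = np.arange(2048, dtype=np.int64)[::2]
data_isgomoku = normalize_arg(raw)

print(data_isgomoku.dtype, data_isgomoku.flags["C_CONTIGUOUS"], data_isgomoku.shape)
```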

Votes: 0

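Stack Overflow user

Answered 2021-08-20 09:08:09

Debugging with the CUDA Simulator

If the error comes from a bug in user code, enabling the CUDA Simulator often helps to debug this type of error.

More descriptive errors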

import os

# needs to appear before `from numba import cuda`
os.environ["NUMBA_ENABLE_CUDASIM"] = "1"
# set to "1" for more debugging, but slower performance
os.environ["NUMBA_CUDA_DEBUGINFO"] = "1"

from numba import cuda, njit

It steps into the (pseudo) kernel and produces a more meaningful error instead of the non-descriptive access violation. In my case:

  File "C:\Users\sterg\...\wasserstein-gpu-debug.py", line 372, in divide
    out[i] = v[i] / b
IndexError: tid=[1, 0, 0] ctaid=[0, 0, 0]: index 3 is out of bounds for axis 0 with size 3

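Debugging with the Python debugger (pdb)

Better still, if I enter debug mode (Ctrl+F5 in Spyder 5.0.5) with the per-file configuration (Ctrl+F6 or Menubar --> Run --> "Configuration per file...") set to "Execute in an external system terminal":

and "Interact with the Python console after execution":

I can then use locals() to verify that v is a 4-element 1D array, b is a scalar, out is a 3-element 1D array, and i is 3 (from range(len(v))):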

(Pdb) locals()
{'v': array([0.        , 0.54269606, 0.8236306 , 1.1803839 ], dtype=float32), 'b': 1.1803839, 'out': array([0.        , 0.45976233, 0.697765  ], dtype=float32), 'i': 3}

Clearly, out[3] = v[3] produces an error here because out has only 3 elements (as the IndexError message indicates).
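
The same failure can be reproduced on the host with plain NumPy, which also suggests a typical fix. This is a sketch: the `divide` body is reconstructed from the traceback line `out[i] = v[i] / b` and the `locals()` output above, and `divide_guarded` is a hypothetical corrected version, not the actual code from the answer.

```python
import numpy as np


def divide(v, b, out):
    # Loops over len(v), as in the failing code: overruns out when out is shorter.
    for i in range(len(v)):
        out[i] = v[i] / b


def divide_guarded(v, b, out):
    # Only write the elements that exist in both arrays.
    for i in range(min(len(v), len(out))):
        out[i] = v[i] / b


v = np.array([0.0, 0.54269606, 0.8236306, 1.1803839], dtype=np.float32)
b = np.float32(1.1803839)
out = np.zeros(3, dtype=np.float32)

try:
    divide(v, b, out)
except IndexError as exc:
    print("IndexError:", exc)

divide_guarded(v, b, out)
print(out)
```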

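If that is not enough (and for those less familiar with pdb), I can use u to move up the stack trace, d to move down, and w to check where I am (pdb commands and pdb docs).

Done debugging

Once debugging is finished, set the environment variables back to "0" so that your code runs fast again.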

import os

# needs to appear before `from numba import cuda`
os.environ["NUMBA_ENABLE_CUDASIM"] = "0"
# set to "1" for more debugging, but slower performance
os.environ["NUMBA_CUDA_DEBUGINFO"] = "0"

from numba import cuda, njit
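
To avoid editing two lines each time, both variables can be driven by a single flag. This is a minimal sketch: `set_cuda_simulator` and `DEBUG` are hypothetical names, and the warning reflects the note above that the variables have to be set before Numba is imported.

```python
import os
import sys
import warnings


def set_cuda_simulator(enabled):
    """Set both Numba variables from one flag; call before importing numba."""
    if "numba" in sys.modules:
        # Assumption based on the note above: the values are read at import time.
        warnings.warn("numba is already imported; restart the interpreter first")
    value = "1" if enabled else "0"
    os.environ["NUMBA_ENABLE_CUDASIM"] = value
    os.environ["NUMBA_CUDA_DEBUGINFO"] = value
    return value


DEBUG = True  # flip to False once debugging is finished
set_cuda_simulator(DEBUG)
# from numba import cuda, njit  # import numba only after the call above
```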

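numba 0.55.0/cudatoolkit 11.2.2 --> OP's code runs fine

As for OP's use case, I ran the code (without the leading % lines) from a Spyder .py file in an external terminal, in the current console, and in a Jupyter Notebook, and it ran fine. OP also mentioned that numba 0.46 stopped the error from occurring.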

(ElM2D) PS C:\Users\sterg> conda list cudatoolkit
# packages in environment at C:\Users\sterg\anaconda3\envs\ElM2D:
#
# Name                    Version                   Build  Channel
cudatoolkit               11.2.2               h933977f_8    conda-forge
(ElM2D) PS C:\Users\sterg> conda list numba
# packages in environment at C:\Users\sterg\anaconda3\envs\ElM2D:
#
# Name                    Version                   Build  Channel
numba                     0.55.0.dev0+139.ge050fd691          pypi_0    pypi

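This is possibly (probably?) thanks to a PR mentioned in this Issue. Nonetheless, these access violations still seem to occur without necessarily telling you where (as happened to me), and I was only able to get something more meaningful through the CUDA Simulator. Hopefully this saves people from spending two weeks debugging this particular error. If doing this still results in a non-descriptive NVVM-level error, it is probably not a bug in your code but an issue that needs to be opened on Numba GitHub. There are limitations to using CUDA Simulator, but I'm glad they have this feature!

Votes: 0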

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/62115188
