Q: Out-of-memory error when constructing a 2D list from 2 numpy arrays

Data Science user

Asked 2018-08-30 10:40:25

Answers: 1 · Views: 1.9K · Followers: 0 · Votes: 2

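I am working with lung CT images from the LUNA16 dataset. The dataset has 3D lung images and labels that come from a CSV file. I have code that constructs a 2D list from 25x25x25 3D arrays (the 3D images) and creates a label, 0,1 or 1,0, from the CSV file. After creating the 2D list I want to save it to a numpy file. Below is my code that creates the 2D list and saves it to a numpy file: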

import os
import time

import numpy as np
from tqdm import tqdm

# ShowProcess, load_itk_image and resample are helpers defined elsewhere (not shown in the question).

def getIDlist(csv_Dist,Data_Dist):
    # receive marked coords and ID from the CSV, and return the file path with coords.
    print('loading')
    data = np.loadtxt(csv_Dist, delimiter = ',', dtype = 'str')
    # drop the header row via 1:, and take seriesuid, x, y, z, class via 0:5.
    ID_coords = data[1:,0:5][0:10000] # get list of 'seriesuid' 'coordX' 'coordY' 'coordZ' 'class' (without header).
    # define the output list.
    ID_dist = []

    print('start finding')
    process_bar = ShowProcess(len(ID_coords))

    for ID,x,y,z,label in ID_coords: 
        ID = ID +'.mhd' 
        found = 0       
        for parent, dirnames, filenames in os.walk(Data_Dist):
            for filename in filenames:# loop inside all files                 
                if ID == filename: # ID + .mhd in csv equal to filename in files
                    process_bar.show_process()
                    ID = parent + '\\' + ID# ID gets the full path of the found file
                    ID_dist.append([ID,x,y,z,label])# ID_dist gets the info of the found files
                    found = 1
                    #print("found: ", found)
                    break
            if found == 1:
                break
        if found == 1:
            continue

    process_bar.close()                 
    return ID_dist 

def get3Dmatrix(ID_dist):

    print('preparing the 3d matrix')
    matrixlist = []
    for Dist, xcoords, ycoords, zcoords, label in tqdm(ID_dist):
        # read the image
        imagearray,origin,spacing = load_itk_image(Dist)
        # resample in to 1mm*1mm*1mm
        imagearray = resample(imagearray,spacing,(1,1,1))

        # transfer world coordinates to voxel-coordinates, divide new spacing 1mm
        z = int(round((float(zcoords)-float(origin[0]))/1))
        y = int(round((float(ycoords)-float(origin[1]))/1))
        x = int(round((float(xcoords)-float(origin[2]))/1))

        # get the 3D array with shape 25*25*25           
        imagearray = imagearray[z-13:z+12,y-13:y+12,x-13:x+12]

        #converting the label number into a one-hot-encoding
        if int(label) == 1: 
            label=np.array([0,1])
        elif int(label) == 0: 
            label=np.array([1,0])

        # put it into output file
        matrixlist.append([imagearray,label])# 2d list consist of 3d array + label of all cases.
    return matrixlist 

def main():
    start_time = time.time()
    # get ID_list from the csv and data dist.
    ID_list = getIDlist(candidates_V2_Dist, Data_Dist)# nested list - get file name with dist + x,y,z,class
    # Data_set[i][0] is the 3D array, Data_set[i][1] is the label
    Data_set = get3Dmatrix(ID_list) # 2d list consist of 3d array + label of all cases.
    print("Begin saving in numpy file")
    np.save(output_path+'np_ds(10000)-25-25-25(zyx)_one_hot.npy', Data_set)
    print("%s time takes in seconds" % (time.time() - start_time))

if __name__ == "__main__":
    main()

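My problems are:

1- After about 550 samples the memory is full and I get a MemoryError. I am working on a Dell Inspiron laptop with a Core i7 and 16 GB of RAM.

2- Creating each sample takes 34 seconds, which seems like a lot of time for a single sample.

I have searched a lot on Google and asked on some other forums, but did not get any answers. Can anyone help me? I am really confused by this error. The image below is the error message: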

numpy

python

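1 Answer

Data Science user

Answered 2018-08-30 19:31:59

I would suggest breaking the problem into smaller pieces to reduce the amount of memory in use at any one time.

The first part of your main function gets the IDs using getIDlist. That seems fine, so leave it as it is.

Then I would break that list into smaller chunks and call get3Dmatrix on each chunk in turn. Modifying your code, it might look like this: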

# Break the ID list down into a number of chunks, e.g. 4, based on your progress bar
import numpy as np    # should already be imported

N = len(ID_list)         # number of entries in the ID list
num_chunks = 4           # you can play with this number, making it larger until you don't get memory errors
# num_chunks + 1 integer boundaries give num_chunks slices
chunks = np.linspace(0, N, num_chunks + 1, dtype=int)

for i in range(len(chunks) - 1):
    this_sublist = ID_list[chunks[i] : chunks[i + 1]]
    sub_data_set = get3Dmatrix(this_sublist)

    # At this point, either save this sub_data_set, or try appending it to another list to make one final numpy matrix at the end before saving

...

print("Begin saving in numpy file")
np.save(output_path+'np_ds(10000)-25-25-25(zyx)_one_hot.npy', Data_set)
print("%s time takes in seconds" % (time.time() - start_time))
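The "save this sub_data_set" option could be sketched roughly as follows. This is only an illustration under stated assumptions: `fake_get3Dmatrix` is a hypothetical stand-in for the asker's `get3Dmatrix`, the `chunk_XXX.npz` file names and the `save_in_chunks` helper are invented for the example, and it assumes every patch really has shape 25x25x25 (a patch cut near the edge of a scan could be smaller, and `np.stack` would then fail).

```python
import os
import tempfile

import numpy as np


def fake_get3Dmatrix(sublist):
    """Hypothetical stand-in for get3Dmatrix: one 25x25x25 patch and a one-hot label per entry."""
    return [[np.zeros((25, 25, 25), dtype=np.float32), np.array([1, 0])] for _ in sublist]


def save_in_chunks(id_list, num_chunks, out_dir, build=fake_get3Dmatrix):
    """Process id_list in num_chunks pieces and write one .npz file per piece."""
    bounds = np.linspace(0, len(id_list), num_chunks + 1, dtype=int)
    paths = []
    for i in range(num_chunks):
        sub = build(id_list[bounds[i]:bounds[i + 1]])
        if not sub:
            continue  # an empty chunk can occur when num_chunks > len(id_list)
        patches = np.stack([p for p, _ in sub])  # shape (n, 25, 25, 25)
        labels = np.stack([l for _, l in sub])   # shape (n, 2)
        path = os.path.join(out_dir, "chunk_%03d.npz" % i)
        np.savez(path, patches=patches, labels=labels)
        paths.append(path)
        del sub, patches, labels  # drop references before building the next chunk
    return paths


out_dir = tempfile.mkdtemp()
ids = list(range(10))  # placeholder for ID_list
chunk_paths = save_in_chunks(ids, 4, out_dir)

# Read the files back and count the patches across all chunks.
total = 0
for p in chunk_paths:
    with np.load(p) as f:
        total += f["patches"].shape[0]
```

Writing one file per chunk means only one chunk has to be in memory at a time; the files can be loaded one by one later instead of building a single large list first.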

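Even from the traceback you added, it is hard to tell exactly where in the code the error occurs.

Looking roughly at the dimensions you mentioned, it also seems unlikely that a 16 GB machine would run out of memory, so I can't quite work out how many images/patches are being held.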
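A rough calculation supports this, and it also points at one possible explanation that the source does not confirm. In get3Dmatrix the patch is taken with basic slicing (`imagearray[z-13:z+12, ...]`), and NumPy basic slicing returns a view that keeps the whole underlying array alive. If that is what gets appended to the list, each entry could be holding on to a full resampled scan rather than just 25x25x25 voxels. The sketch below assumes 8-byte voxels for the arithmetic and uses a made-up 200x200x200 array as a stand-in for a resampled volume; the real sizes and dtypes are unknown.

```python
import numpy as np

# Memory for the patches themselves, assuming 8-byte float64 voxels (an assumption).
patch_bytes = 25 * 25 * 25 * 8                # 125,000 bytes per patch
all_patches_gb = 10000 * patch_bytes / 1e9    # 1.25 GB for 10,000 patches

# Hypothetical stand-in for one resampled CT volume (the real size is unknown).
volume = np.zeros((200, 200, 200), dtype=np.float32)

view = volume[100:125, 100:125, 100:125]      # basic slicing returns a view
patch = view.copy()                           # independent 25x25x25 array

keeps_volume_alive = view.base is volume      # the view references the whole volume
copy_is_independent = patch.base is None      # the copy owns only its own data
```

If this is what is happening, appending `imagearray[...].copy()` instead of the slice itself would let each full volume be freed once its patch has been extracted.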

Votes: 1

Original page content provided by Data Science. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://datascience.stackexchange.com/questions/37604
