首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >从2个numpy数组构造2d列表时内存不足的错误

从2个numpy数组构造2d列表时内存不足的错误
EN

Data Science用户
提问于 2018-08-30 10:40:25
回答 1查看 1.9K关注 0票数 2

我正在处理来自luna16数据集的肺CT图像,数据集有一个三维肺图像和一个来自CSV文件的标签,我有一个从3d数组25x25x25 ( 3d图像)构造2d列表的代码,以及从CSV文件创建一个标签0,1或1,0,在创建了我想要保存在numpy文件中的2d列表之后,下面是我创建2d列表并将其保存到numpy文件中的代码:

代码语言:javascript
复制
def getIDlist(csv_Dist,Data_Dist):
    # receive marked coords and ID in annotations.csv, and return the distination with coords.
    print('loading')
    data = np.loadtxt(csv_Dist, delimiter = ',', dtype = 'str')
    # delete the header file via 1:0, and receive the ID, x, y, z, r via 0:5 to a list.
    ID_coords = data[1:,0:5][0:10000] # get list of 'seriesuid' 'coordX' 'coordY' 'coordZ' 'class' (without header).
    # define the output file.
    ID_dist = []

    print('strat finding')
    process_bar = ShowProcess(len(ID_coords))

    for ID,x,y,z,label in ID_coords: 
        ID = ID +'.mhd' 
        found = 0       
        for parent, dirnames, filenames in os.walk(Data_Dist):
            for filename in filenames:# loop inside all files                 
                if ID == filename: # ID + .mhd in csv equal to filename in files
                    process_bar.show_process()
                    ID = parent + '\\' + ID# ID gets full path of the founded file
                    ID_dist.append([ID,x,y,z,label])# ID_dist gets info of founded files
                    found = 1
                    #print("found: ", found)
                    break
            if found == 1:
                break
        if found == 1:
            continue

    process_bar.close()                 
    return ID_dist 

def get3Dmatrix(ID_dist):

    print('preparing the 3d matrix')
    matrixlist = []
    for Dist, xcoords, ycoords, zcoords, label in tqdm(ID_dist):
        # read the image
        imagearray,origin,spacing = load_itk_image(Dist)
        # resample in to 1mm*1mm*1mm
        imagearray = resample(imagearray,spacing,(1,1,1))

        # transfer world coordinates to voxel-coordinates, divide new spacing 1mm
        z = int(round((float(zcoords)-float(origin[0]))/1))
        y = int(round((float(ycoords)-float(origin[1]))/1))
        x = int(round((float(xcoords)-float(origin[2]))/1))

        # get the 3D array with shape 25*25*25           
        imagearray = imagearray[z-13:z+12,y-13:y+12,x-13:x+12]

        #converting the label number into a one-hot-encoding
        if int(label) == 1: 
            label=np.array([0,1])
        elif int(label) == 0: 
            label=np.array([1,0])

        # put it into output file
        matrixlist.append([imagearray,label])# 2d list consist of 3d array + label of all cases.
    return matrixlist 

 def main():
    start_time = time.time()
    # get ID_list from the csv and data dist.
    ID_list = getIDlist(candidates_V2_Dist, Data_Dist)# nested list - get file name with dist + x,y,z,class
    # Data_set[i][0] is the 3D array, Data_set[i][1] is the label
    Data_set = get3Dmatrix(ID_list) # 2d list consist of 3d array + label of all cases.
    print("Begin saving in numpy file")
    np.save(output_path+'np_ds(10000)-25-25-25(zyx)_one_hot.npy', Data_set)
    print("%s time takes in seconds" % (time.time() - start_time))

if __name__ == "__main__":
    main()

我的问题是:

有了大约550个样本,内存就满了,我得到了内存错误,我正在用16 gb内存笔记本电脑开发戴尔inspiron核心i7。

2-创建每个样本需要34秒钟,我看到这是一个样本所需的大量时间。

我在google上做了很多搜索,在其他一些论坛上问了一个问题,但是没有得到任何答案,有人能帮我吗?我真的对那个错误感到困惑。下面的图像是错误消息:

EN

回答 1

Data Science用户

发布于 2018-08-30 19:31:59

我建议在任何时候减少一些问题,以减少内存的使用。

主函数的第一部分使用getIDList获取ID。听起来不错,所以把它留在那里吧。

然后,我会将该列表分解为较小的块,依次调用每个块的get3Dmatrix。修改您的代码时,它可能如下所示:

代码语言:javascript
复制
# Get number of entries in ID list
N = len(ID_list)

# break it down into a number of chunks e.g. 4, based on your progress bar
import numpy as np    # should already be imported

N = len(ID_list)
num_chunks = 4           # you can play with this number, making it larger until you don't get emmory errors
chunks = np.linspace(0, N, num_chunks)

for i in range(len(chunks) - 1):
    this_sublist = ID_list[chunks[i] : chunks[i + 1]]
    sub_data_set = get3Dmatrix(this_sublist)

    # At this point, either save this sub_data_set, or try appending it to another list toi make one final numpy matrix at the end before saving

...

print("Begin saving in numpy file")
np.save(output_path+'np_ds(10000)-25-25-25(zyx)_one_hot.npy', Data_set)
print("%s time takes in seconds" % (time.time() - start_time))

即使从您添加的回溯中,也很难确定代码中的确切位置正在发生。

粗略地看一下您提到的维度,16 it的机器内存不足似乎也是不可能的--所以我不能完全理解保存了多少图像/补丁。

票数 1
EN
页面原文内容由Data Science提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://datascience.stackexchange.com/questions/37604

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档