我正在处理来自luna16数据集的肺CT图像,数据集有一个三维肺图像和一个来自CSV文件的标签,我有一个从3d数组25x25x25 ( 3d图像)构造2d列表的代码,以及从CSV文件创建一个标签0,1或1,0,在创建了我想要保存在numpy文件中的2d列表之后,下面是我创建2d列表并将其保存到numpy文件中的代码:
def getIDlist(csv_Dist,Data_Dist):
# receive marked coords and ID in annotations.csv, and return the distination with coords.
print('loading')
data = np.loadtxt(csv_Dist, delimiter = ',', dtype = 'str')
# delete the header file via 1:0, and receive the ID, x, y, z, r via 0:5 to a list.
ID_coords = data[1:,0:5][0:10000] # get list of 'seriesuid' 'coordX' 'coordY' 'coordZ' 'class' (without header).
# define the output file.
ID_dist = []
print('strat finding')
process_bar = ShowProcess(len(ID_coords))
for ID,x,y,z,label in ID_coords:
ID = ID +'.mhd'
found = 0
for parent, dirnames, filenames in os.walk(Data_Dist):
for filename in filenames:# loop inside all files
if ID == filename: # ID + .mhd in csv equal to filename in files
process_bar.show_process()
ID = parent + '\\' + ID# ID gets full path of the founded file
ID_dist.append([ID,x,y,z,label])# ID_dist gets info of founded files
found = 1
#print("found: ", found)
break
if found == 1:
break
if found == 1:
continue
process_bar.close()
return ID_dist
def get3Dmatrix(ID_dist):
print('preparing the 3d matrix')
matrixlist = []
for Dist, xcoords, ycoords, zcoords, label in tqdm(ID_dist):
# read the image
imagearray,origin,spacing = load_itk_image(Dist)
# resample in to 1mm*1mm*1mm
imagearray = resample(imagearray,spacing,(1,1,1))
# transfer world coordinates to voxel-coordinates, divide new spacing 1mm
z = int(round((float(zcoords)-float(origin[0]))/1))
y = int(round((float(ycoords)-float(origin[1]))/1))
x = int(round((float(xcoords)-float(origin[2]))/1))
# get the 3D array with shape 25*25*25
imagearray = imagearray[z-13:z+12,y-13:y+12,x-13:x+12]
#converting the label number into a one-hot-encoding
if int(label) == 1:
label=np.array([0,1])
elif int(label) == 0:
label=np.array([1,0])
# put it into output file
matrixlist.append([imagearray,label])# 2d list consist of 3d array + label of all cases.
return matrixlist
def main():
start_time = time.time()
# get ID_list from the csv and data dist.
ID_list = getIDlist(candidates_V2_Dist, Data_Dist)# nested list - get file name with dist + x,y,z,class
# Data_set[i][0] is the 3D array, Data_set[i][1] is the label
Data_set = get3Dmatrix(ID_list) # 2d list consist of 3d array + label of all cases.
print("Begin saving in numpy file")
np.save(output_path+'np_ds(10000)-25-25-25(zyx)_one_hot.npy', Data_set)
print("%s time takes in seconds" % (time.time() - start_time))
if __name__ == "__main__":
main()我的问题是:
有了大约550个样本,内存就满了,我得到了内存错误,我正在用16 gb内存笔记本电脑开发戴尔inspiron核心i7。
2-创建每个样本需要34秒钟,我看到这是一个样本所需的大量时间。
我在google上做了很多搜索,在其他一些论坛上问了一个问题,但是没有得到任何答案,有人能帮我吗?我真的对那个错误感到困惑。下面的图像是错误消息:

发布于 2018-08-30 19:31:59
我建议在任何时候减少一些问题,以减少内存的使用。
主函数的第一部分使用getIDList获取ID。听起来不错,所以把它留在那里吧。
然后,我会将该列表分解为较小的块,依次调用每个块的get3Dmatrix。修改您的代码时,它可能如下所示:
# Get number of entries in ID list
N = len(ID_list)
# break it down into a number of chunks e.g. 4, based on your progress bar
import numpy as np # should already be imported
N = len(ID_list)
num_chunks = 4 # you can play with this number, making it larger until you don't get emmory errors
chunks = np.linspace(0, N, num_chunks)
for i in range(len(chunks) - 1):
this_sublist = ID_list[chunks[i] : chunks[i + 1]]
sub_data_set = get3Dmatrix(this_sublist)
# At this point, either save this sub_data_set, or try appending it to another list toi make one final numpy matrix at the end before saving
...
print("Begin saving in numpy file")
np.save(output_path+'np_ds(10000)-25-25-25(zyx)_one_hot.npy', Data_set)
print("%s time takes in seconds" % (time.time() - start_time))即使从您添加的回溯中,也很难确定代码中的确切位置正在发生。
粗略地看一下您提到的维度,16 it的机器内存不足似乎也是不可能的--所以我不能完全理解保存了多少图像/补丁。
https://datascience.stackexchange.com/questions/37604
复制相似问题