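I have a list of brain-metastasis MRI images that I want to use for training and testing. The images all look similar, but the primary tumor site differs. See the following examples:
From lung:
From breast:
From skin:
From lung tissue:
From bone marrow:
I want the test and validation sets to contain the same number of images without losing this composition, i.e. both lists should contain the same number of images of each subtype.
To do this, could I create a list for each subtype, randomly split each one 50/50, and then concatenate the resulting lists?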
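The procedure described above (one list per subtype, a random 50/50 split of each, then concatenation) could be sketched roughly as follows. This is only an illustration under assumptions: the function name `split_by_subtype`, the `(image_name, subtype)` pair layout, and the toy file names are hypothetical and not taken from the real data.

```python
import random
from collections import defaultdict


def split_by_subtype(items, seed=0):
    """Split (image_name, subtype) pairs into two lists with equal counts per subtype."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Step 1: one list per subtype.
    by_subtype = defaultdict(list)
    for name, subtype in items:
        by_subtype[subtype].append(name)

    # Step 2: shuffle each list and cut it in half.
    # Step 3: concatenate the halves across subtypes.
    first, second = [], []
    for subtype, names in by_subtype.items():
        rng.shuffle(names)
        half = len(names) // 2
        first += [(n, subtype) for n in names[:half]]
        # For an odd count, the one leftover image is dropped so both halves match.
        second += [(n, subtype) for n in names[half:2 * half]]
    return first, second


# Hypothetical toy data: 4 lung, 3 breast and 2 skin images.
images = (
    [(f"lung_{i}", "Lung") for i in range(4)]
    + [(f"breast_{i}", "Breast") for i in range(3)]
    + [(f"skin_{i}", "Skin") for i in range(2)]
)
first_half, second_half = split_by_subtype(images)
```

Dropping the leftover image of an odd-sized subtype is a design choice made here to keep the per-subtype counts identical in both lists; assigning it to one of the two lists instead would keep every image at the cost of a one-image imbalance.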
Posted on 2021-09-12 22:53:11
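If you want to select the rows of a pandas DataFrame that satisfy a particular condition, you can filter it. In your case, something like:
reader_lung = reader[reader["Image_Title"] == "Lung"]
Change "Image_Title" to the name of the column that holds the keyword you are looking for (for example, Lung). This filter requires an exact match.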
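Building on that exact-match filter, a per-subtype 50/50 split in pandas might look like the sketch below. It assumes the table is already loaded into a DataFrame named `reader`; the toy rows, the `file` column, and the `train_df` / `test_df` names are made up for illustration, and `"Image_Title"` stands in for whichever column actually holds the subtype.

```python
import pandas as pd

# Toy stand-in for the real table; the columns and values are assumptions.
reader = pd.DataFrame({
    "file": [f"img_{i:02d}" for i in range(10)],
    "Image_Title": ["Lung"] * 4 + ["Breast"] * 4 + ["Skin"] * 2,
})

train_parts, test_parts = [], []
for subtype in reader["Image_Title"].unique():
    # Exact-match filter: all rows of this subtype.
    subset = reader[reader["Image_Title"] == subtype]
    # Randomly pick half of the rows; random_state makes the pick reproducible.
    half = subset.sample(n=len(subset) // 2, random_state=0)
    # Everything not picked goes to the other list
    # (one extra row lands here if the subtype has an odd count).
    rest = subset.drop(half.index)
    train_parts.append(half)
    test_parts.append(rest)

# Add the per-subtype pieces back together.
train_df = pd.concat(train_parts)
test_df = pd.concat(test_parts)
```

Comparing `train_df["Image_Title"].value_counts()` with `test_df["Image_Title"].value_counts()` is a quick way to check that both halves have the same composition.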
For cases that do not require an exact match, you can also do:
reader_lung = reader[reader["Image_Title"].str.contains("Lung")]
Posted on 2021-09-12 23:05:28
Could you create one list per type, then put the first N items of each list into training and the last N into testing?
Something like this pseudocode:
import csv

with open(r"B:/.../excell.csv", newline='') as f:
    reader = csv.reader(f, dialect="excel", delimiter=';')
    test = []
    training = []
    type_map = {}
    for row in reader:
        if row[33] in type_map:
            # This type has been seen before: append the row to its existing list
            type_map[row[33]].append(row)
        else:
            # First time this type is seen: start a new list containing this row
            type_map[row[33]] = [row]
# type_map now looks like {"Lung": [row1, row2, ...], "Heart": [rowA, ...]}
for image_type in type_map:
    type_images = type_map[image_type]
    half_way_index = len(type_images) // 2  # Integer division: 13 elements gives 6
    test += type_images[0:half_way_index]  # First half of this type goes to test
    training += type_images[half_way_index:(half_way_index * 2)]  # Second half goes to training
https://stackoverflow.com/questions/69149736