I have an image dataset laid out like

|-Train
| |-Defective
| |-images
| |-Not_Defective
| |-images

I preprocessed these images with the following function:
dir = '../input/railwaytrackv4/Dataset _ Railway Track Fault Detection-20210713T183411Z-001/Dataset _ Railway Track Fault Detection/Train'
train_data = tf.keras.utils.image_dataset_from_directory(directory=dir,
                                                         labels='inferred',
                                                         batch_size=32,
                                                         image_size=(256, 256))

Its output is

Found 1469 files belonging to 2 classes.

and type(train_data) = tensorflow.python.data.ops.dataset_ops.BatchDataset.

How can I convert this train_data into a numpy array?
Update:

I tried

for x, y in train_data:
    x = x.numpy()
    y = y.numpy()

but it gives the following output:
2021-11-01 08:48:15.079479: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-11-01 08:48:25.085070: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:175] Filling up shuffle buffer (this may take a while): 250 of 11760
2021-11-01 08:48:35.132351: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:175] Filling up shuffle buffer (this may take a while): 558 of 11760
2021-11-01 08:48:45.122079: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:175] Filling up shuffle buffer (this may take a while): 843 of 11760
2021-11-01 08:48:55.135867: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:175] Filling up shuffle buffer (this may take a while): 1160 of 11760
2021-11-01 08:49:05.080678: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:175] Filling up shuffle buffer (this may take a while): 1455 of 11760
2021-11-01 08:49:05.657894: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:228] Shuffle buffer filled.
2021-11-01 08:49:05.665031: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1155268608 exceeds 10% of free system memory.

Note: Found 1469 files belonging to 2 classes.
Posted on 2021-10-31 15:58:10
tf.keras.utils.image_dataset_from_directory returns a tf.data.Dataset, which is essentially a fancy generator: it yields values just as you would expect in Python, except that it yields TensorFlow Tensor objects. So you only need to convert each one to a numpy object with its numpy() method:
x, y = next(iter(train_data))  # a Dataset is iterable but not an iterator, so wrap it in iter()
x = x.numpy()
y = y.numpy()

or, to walk through every batch:

for x, y in train_data:
    x = x.numpy()
    y = y.numpy()

Edit:
The dataset is batched, which means you will always read the files batch by batch. You presumably passed the argument batch_size=32 when defining the tf.keras.utils.image_dataset_from_directory dataset. If you want to read the entire dataset in one go, you could use batch_size=1469, but note that TensorFlow's Dataset is designed to be used as a generator streaming from disk. If the dataset fits in memory, you may be better off reading the files yourself, e.g. with tf.keras.utils.load_img.
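To turn the whole batched dataset into two flat numpy arrays, you can collect every batch and concatenate them. The sketch below stands in for the real train_data with a small in-memory dataset (the tensor shapes and batch size here are illustrative assumptions, not taken from the question's files):

```python
import numpy as np
import tensorflow as tf

# Stand-in for train_data: a small batched dataset of "images" and labels.
# In the question, this would come from tf.keras.utils.image_dataset_from_directory.
images = tf.random.uniform((10, 256, 256, 3))
labels = tf.constant([0, 1] * 5)
train_data = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

# Collect each batch as numpy, then stack them along the batch axis.
x_batches, y_batches = [], []
for x, y in train_data:
    x_batches.append(x.numpy())
    y_batches.append(y.numpy())

x_all = np.concatenate(x_batches, axis=0)
y_all = np.concatenate(y_batches, axis=0)
print(x_all.shape, y_all.shape)  # (10, 256, 256, 3) (10,)
```

Note that this materializes the full dataset in memory at once, which is exactly what the cpu_allocator warning in the log is about; for 1469 images at 256x256x3 float32 that is roughly 1.1 GB.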
https://stackoverflow.com/questions/69776776