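I'm new to Python and TensorFlow, and I'd like to know...
What is the best way to convert a labelled dataset of multi-layer TIFFs into a format that TensorFlow can use for model optimisation/fine-tuning?
I currently have this code, which puts each layer of every multi-TIFF in a folder into a 3D array, but I need to keep the label or file name of each multi-TIFF. I have seen some TensorFlow scripts that convert data to TFRecords, but I'm not sure whether those scripts preserve the file names. What would be the best way to go about this? It will be a fairly large dataset.
Any help is greatly appreciated.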

import os  # For file handling
from PIL import Image  # Pillow image processing library
import numpy

MultiTiffs = "MultiTiffs/"
for filename in os.listdir(MultiTiffs):
    # Import a multi-layer TIFF into a 3D NumPy array (frames, height, width).
    img = Image.open(MultiTiffs + filename)
    imgArray = numpy.zeros((img.n_frames, img.size[1], img.size[0]), numpy.uint8)
    try:
        # Copy every frame of the TIFF into its own slice of the array.
        for frame in range(img.n_frames):
            img.seek(frame)
            imgArray[frame, :, :] = img
    except EOFError:
        # Ran past the last frame; rewind to the first one.
        img.seek(0)
    print(imgArray.shape)  # imgArray is now 3D
    print(imgArray.size)

Kind regards,
TWP
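
On the file-name question: a TFRecord only holds the features that are written into each `tf.train.Example`, so the file name survives only if it is stored as a feature of its own. The following is a minimal sketch of that idea, not part of the original code. The helper `make_example` and the feature key `filename` are hypothetical names chosen for illustration, and a small synthetic array stands in for a stack loaded from a TIFF.

```python
import numpy as np
import tensorflow as tf


def make_example(stack, filename):
    """Pack a (frames, height, width) uint8 stack and its file name into one Example."""
    depth, height, width = stack.shape
    feature = {
        # Hypothetical key: the source file name, stored as UTF-8 bytes.
        "filename": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[filename.encode("utf-8")])),
        "depth": tf.train.Feature(int64_list=tf.train.Int64List(value=[depth])),
        "height": tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        "width": tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "image_raw": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[stack.tobytes()])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Synthetic 2-frame, 3x4 stack standing in for a real multi-page TIFF.
stack = np.arange(2 * 3 * 4, dtype=np.uint8).reshape(2, 3, 4)
example = make_example(stack, "A.3.1.tif")

# Round-trip through the serialized form to confirm the name is kept.
restored = tf.train.Example.FromString(example.SerializeToString())
name = restored.features.feature["filename"].bytes_list.value[0].decode("utf-8")
print(name)
```

A class label could be stored the same way, as an extra int64 or bytes feature next to the file name.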
Posted on 2017-07-18 20:47:04
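OK, so I used the post from Daniil's blog: http://warmspringwinds.github.io/tensorflow/tf-slim/2016/12/21/tfrecords-guide/
However, my current implementation creates multiple TFRecords, and I think it needs to be a single TFRecord, so I'm trying to work out how to make it one. How do I do that?
Then I can verify it with a TFRecords reading script, reading it back in and checking that it is in the correct format for TensorFlow. I am currently getting an error when using the reading script.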
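
One way to end up with a single TFRecord is to open one writer before the loop and write every example through it, instead of creating a writer per input file. The sketch below illustrates this under some assumptions: it targets TensorFlow 2.x with eager execution, where `tf.io.TFRecordWriter` is the current name for `tf.python_io.TFRecordWriter`; the dictionary `stacks`, the helper `stack_to_example`, the key `filename`, and the output name `all_stacks.tfrecord` are all hypothetical; and synthetic arrays stand in for the stacks read from each TIFF.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def stack_to_example(stack, filename):
    """Hypothetical helper: one Example per image stack, tagged with its file name."""
    return tf.train.Example(features=tf.train.Features(feature={
        "filename": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[filename.encode("utf-8")])),
        "image_raw": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[stack.tobytes()])),
    }))


# Synthetic stand-ins for the arrays loaded from each multi-page TIFF.
stacks = {
    "A.tif": np.zeros((2, 4, 5), np.uint8),
    "B.tif": np.ones((3, 4, 5), np.uint8),
}

record_path = os.path.join(tempfile.mkdtemp(), "all_stacks.tfrecord")

# One writer for the whole dataset: every example lands in the same file.
with tf.io.TFRecordWriter(record_path) as writer:
    for name, stack in stacks.items():
        writer.write(stack_to_example(stack, name).SerializeToString())

# Read the file back and list the stored file names as a sanity check.
names_read = []
for raw in tf.data.TFRecordDataset(record_path):
    parsed = tf.train.Example.FromString(raw.numpy())
    names_read.append(parsed.features.feature["filename"].bytes_list.value[0].decode("utf-8"))
print(names_read)
```

For a very large dataset, a handful of shard files written this way may be easier to handle than one huge file, but that is a design choice rather than a requirement.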

from PIL import Image
import numpy as np
import tensorflow as tf
import os


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


path = 'test/'
output = 'output/'
fileList = [os.path.join(dirpath, f) for dirpath, dirnames, files in os.walk(path) for f in files if f.endswith('.tif')]
print(fileList)
if not os.path.exists(output):
    os.mkdir(output)

for filename in fileList:
    basename = os.path.basename(filename)
    file_name = basename[:-4]  # strip the '.tif' extension
    print("processing file: ", filename)
    print(file_name)
    # A new writer per input file, so this produces one TFRecord per TIFF.
    writer = tf.python_io.TFRecordWriter(output + file_name + '.tfrecord')
    img = Image.open(filename)
    imgArray = np.zeros((img.n_frames, img.size[1], img.size[0]), np.uint8)
    # Import the multi-layer file into a 3D NumPy array (frames, height, width).
    try:
        for frame in range(img.n_frames):
            img.seek(frame)
            imgArray[frame, :, :] = img
    except EOFError:
        img.seek(0)
    print("print img size:", img.size)
    print("print image shape: ", imgArray.shape)
    print("print image size: ", imgArray.size)
    annotation = np.array(Image.open(filename))
    # imgArray is laid out as (frames, height, width).
    depth = imgArray.shape[0]
    height = imgArray.shape[1]
    width = imgArray.shape[2]
    img_raw = imgArray.tobytes()  # tostring() is a deprecated alias of tobytes()
    annotation_raw = annotation.tobytes()
    example = tf.train.Example(features=tf.train.Features(feature={
        'height': _int64_feature(height),
        'width': _int64_feature(width),
        'depth': _int64_feature(depth),  # for 3rd dimension
        'image_raw': _bytes_feature(img_raw),
        'mask_raw': _bytes_feature(annotation_raw)}))
    writer.write(example.SerializeToString())
    writer.close()

My current TFRecords reading script:

import tensorflow as tf
import os


def read_and_decode(filename_queue):
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        # Defaults are not specified since all keys are required.
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64),
            'height': tf.FixedLenFeature([], tf.int64),
            'width': tf.FixedLenFeature([], tf.int64),
            'depth': tf.FixedLenFeature([], tf.int64)
        })
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    label = tf.cast(features['label'], tf.int32)
    height = tf.cast(features['height'], tf.int32)
    width = tf.cast(features['width'], tf.int32)
    depth = tf.cast(features['depth'], tf.int32)
    return image, label, height, width, depth


with tf.Session() as sess:
    filename_queue = tf.train.string_input_producer(["output/A.3.1.tfrecord"])
    image, label, height, width, depth = read_and_decode(filename_queue)
    image = tf.reshape(image, tf.stack([height, width, 3]))
    image.set_shape([32, 32, 3])
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(1000):
        example, l = sess.run([image, label])
        print(example, l)
    coord.request_stop()
    coord.join(threads)

The error I get:
InvalidArgumentError (see above for traceback): Feature: label (data type: int64) is required but could not be found.
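
The message is consistent with the two scripts above: the writer stores `height`, `width`, `depth`, `image_raw` and `mask_raw`, while the reader asks for a required `label` feature that was never written. A quick way to spot this kind of mismatch is to list the keys held in a serialized example and compare them with the keys the parser requests. The sketch below does that on an in-memory example built with the writer's key names; the dummy values and the variable names `requested`, `stored` and `missing` are illustrative assumptions.

```python
import tensorflow as tf


def _int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


# An example carrying the same keys as the writer script (dummy values).
written = tf.train.Example(features=tf.train.Features(feature={
    "height": _int64(4),
    "width": _int64(5),
    "depth": _int64(2),
    "image_raw": _bytes(b"\x00" * 40),
    "mask_raw": _bytes(b"\x00" * 20),
}))
serialized = written.SerializeToString()

# Keys the reading script asks parse_single_example for.
requested = ["image_raw", "label", "height", "width", "depth"]

# Keys that are actually present in the serialized record.
stored = sorted(tf.train.Example.FromString(serialized).features.feature.keys())
missing = [key for key in requested if key not in stored]

print("stored: ", stored)
print("missing:", missing)
```

Either the writer needs to add a `label` feature, or the reader needs to request the keys that were written (for example `mask_raw`).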
The images are multi-page grayscale images.
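
Because the stacks are grayscale and multi-page, the raw byte string holds depth × height × width values rather than height × width × 3, so a reshape to `[height, width, 3]` only works if the stack happens to contain exactly three pages. The NumPy-only sketch below shows the round trip with the (frames, height, width) layout used when the arrays were built. It is a stand-in for the `tf.decode_raw` plus `tf.reshape` step, with a small synthetic array in place of real image data.

```python
import numpy as np

# Synthetic grayscale stack: 2 pages of 3x4 pixels, laid out (frames, height, width).
stack = np.arange(2 * 3 * 4, dtype=np.uint8).reshape(2, 3, 4)
depth, height, width = stack.shape

# What ends up in the 'image_raw' feature: a flat byte string with no shape info.
raw = stack.tobytes()

# Decoding yields a flat vector; the stored dimensions restore the 3D layout.
flat = np.frombuffer(raw, dtype=np.uint8)
restored = flat.reshape(depth, height, width)

print(flat.shape, restored.shape)
```

The same ordering would apply in the TensorFlow reader, i.e. reshaping the decoded tensor with the stored depth, height and width in that order.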
https://stackoverflow.com/questions/45149045