This TensorFlow tutorial loads an existing dataset (MNIST) into the code. Instead, I want to plug in my own training and test images.
def main(unused_argv):
    # Load training and eval data
    mnist = tf.contrib.learn.datasets.load_dataset("mnist")
    train_data = mnist.train.images  # Returns np.array
    train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
    eval_data = mnist.test.images  # Returns np.array
    eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

It says it returns an np.array of raw pixel values.
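My questions are:
1. How do I create such a numpy array for my own image set? I want to do this so that I can directly substitute my numpy array in the example code and train the model on my data (0-9 and A-Z).
Edit: On further analysis, I realized that the pixel values in mnist.train.images and mnist.test.images have already been normalized from 0-255 to between 0 and 1 (I think). How does this normalization help?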
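The rescaling described in the edit can be sketched as below, assuming the source images are 8-bit (values 0 to 255). The helper name `scale_to_unit_range` is hypothetical and not part of TensorFlow. Keeping inputs in a small, consistent range generally makes gradient-based training better behaved, which is the usual reason for this step.

```python
import numpy as np


def scale_to_unit_range(pixels):
    """Map 8-bit pixel values (0..255) to float32 values in [0, 1]."""
    return np.asarray(pixels, dtype=np.float32) / 255.0


# Tiny stand-in for real image data: one row with the darkest,
# a middle, and the brightest 8-bit value.
raw = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = scale_to_unit_range(raw)
print(scaled.min(), scaled.max())
```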
Folder structure: the training and testing folders are located in the same folder.

Training folder:
--> 0
    --> Image_Of_0.png
--> 1
    --> Image_Of_1.png
.
.
.
--> Z
    --> Image_Of_Z.png
Testing folder:
--> 0
    --> Image_Of_0.png
--> 1
    --> Image_Of_1.png
.
.
.
--> Z
    --> Image_Of_Z.png

The code I wrote:
import os

import cv2
import numpy as np

Names = [['C:\\Users\\xx\\Project\\training-images', 'train', 9490],
         ['C:\\Users\\xx\\Project\\test-images', 'test', 3175]]
# 9490 is the number of training files in total (all the PNGs)
# 3175 is the number of testing files in total (all the PNGs)
for name in Names:
    FileList = []
    for dirname in os.listdir(name[0]):
        path = os.path.join(name[0], dirname)
        for filename in os.listdir(path):
            if filename.endswith(".png"):
                FileList.append(os.path.join(name[0], dirname, filename))
    print(FileList)
    # Creates list of all PNG files in training and testing folder
    x_data = np.array([np.array(cv2.imread(filename)) for filename in FileList])
    pixels = x_data.flatten().reshape(name[2], 2352)  # 2352 = 28 * 28 * 3 image
    print(pixels)

Can the pixels array created here be supplied as training and test data, i.e. does it have the same format as the data provided in the example code?
2. Similarly, what numpy array must be created for all the labels (the folder names)?
Answered 2018-05-28 06:51:05
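1. How do I create such a numpy array for my own image set?
TensorFlow accepts data in several ways (tf.data, feed_dict, QueueRunner). What you should use is TFRecord, which can be accessed through the tf.data API. It is also the recommended format. Suppose you have a folder containing images and want to convert it to a tfrecord file.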
import tensorflow as tf  # TensorFlow 1.x API (tf.python_io)
import numpy as np
import glob
from PIL import Image

# Converting the values into features
# _int64 is used for numeric values
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# _bytes is used for string/char values
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

tfrecord_filename = 'something.tfrecords'
# Initiating the writer and creating the tfrecords file.
writer = tf.python_io.TFRecordWriter(tfrecord_filename)
# Loading the location of all files - image dataset
# Considering our image dataset has apple or orange
# The images are named as apple01.jpg, apple02.jpg .. , orange01.jpg .. etc.
images = glob.glob('data/*.jpg')
# Loop over every image (use images[:1] to try it on a single file first)
for image in images:
    img = Image.open(image)
    img = np.array(img.resize((32, 32)))
    label = 0 if 'apple' in image else 1
    # tobytes() replaces the deprecated ndarray.tostring()
    feature = {'label': _int64_feature(label),
               'image': _bytes_feature(img.tobytes())}
    # Create an example protocol buffer
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    # Write the serialized example
    writer.write(example.SerializeToString())
writer.close()

Now, to read this tfrecord file and do something with it:
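import tensorflow as tf  # TensorFlow 1.x queue-based input pipeline
import glob

reader = tf.TFRecordReader()
filenames = glob.glob('*.tfrecords')
filename_queue = tf.train.string_input_producer(filenames)
_, serialized_example = reader.read(filename_queue)
feature_set = {'image': tf.FixedLenFeature([], tf.string),
               'label': tf.FixedLenFeature([], tf.int64)}
features = tf.parse_single_example(serialized_example, features=feature_set)
# Decode the raw bytes back into uint8 pixel values
image = tf.decode_raw(features['image'], tf.uint8)
label = features['label']
with tf.Session() as sess:
    # The filename queue is only filled once the queue runners are started
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run([image, label]))
    coord.request_stop()
    coord.join(threads)

Here is an example for MNIST in tensorflow/examples.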
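If you would rather keep the tutorial's NumPy inputs than switch to TFRecords, the sketch below shows one way to build the label array from the folder names and to bring already-loaded images into the MNIST layout. MNIST images are 28x28 grayscale, so each row of mnist.train.images has 784 values, not the 2352 (28 * 28 * 3) produced by a default cv2.imread. The function names are hypothetical, and the plain channel average used for grayscale is an assumption; passing cv2.IMREAD_GRAYSCALE to cv2.imread is another option.

```python
import os

import numpy as np


def list_images_and_labels(root):
    """Return (paths, labels, class_names) for a root/<class>/<image>.png tree."""
    # Sorted sub-folder names define the classes: '0'..'9' come before 'A'..'Z',
    # so the digits get labels 0..9 and the letters get 10..35.
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    paths, labels = [], []
    for index, class_name in enumerate(class_names):
        class_dir = os.path.join(root, class_name)
        for filename in sorted(os.listdir(class_dir)):
            if filename.lower().endswith(".png"):
                paths.append(os.path.join(class_dir, filename))
                labels.append(index)
    return paths, np.asarray(labels, dtype=np.int32), class_names


def to_mnist_layout(images):
    """Convert (N, 28, 28, 3) uint8 images to (N, 784) float32 in [0, 1]."""
    images = np.asarray(images, dtype=np.float32)
    # Assumption: a plain average over the colour channels stands in for grayscale.
    gray = images.mean(axis=-1)
    return (gray / 255.0).reshape(len(gray), -1)
```

With these, something like `paths, train_labels, _ = list_images_and_labels(training_folder)` followed by `train_data = to_mnist_layout([cv2.imread(p) for p in paths])` would give arrays shaped like the ones in the tutorial, provided every PNG is already 28x28.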
Cheers!
https://stackoverflow.com/questions/50547249