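I'm trying to run an existing CNN tutorial on my own dataset. I'm new to this and want to understand the concepts.
What I don't understand is the label values y. How do I train the model so that each sample x is given a label y? And is there a better way than a for loop to load my data into the CNN?
I have a dataset of 5 animal classes with 1725 images in total. For example, dog = 3 images, cats = 54 images.
Every time I try to run my model, I get this error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 5172 input samples and 1725 target samples.

Here is the input: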
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
import tensorflow as tf
import tensorflow.keras as keras
from keras.utils import to_categorical

IMG_SIZE = 64
PATH = os.getcwd()
data_path = os.path.join(PATH, 'image')
data_dir_list = os.listdir(data_path)
num_classes = 5
img_data_list = []
num_channel = 1

# Walk each class folder, load every image as grayscale and resize it
for dataset in data_dir_list:
    img_list = os.listdir(data_path + '/' + dataset)
    print('Loaded the images of dataset-' + '{}\n'.format(dataset))
    for img in img_list:
        input_img = cv2.imread(data_path + '/' + dataset + '/' + img)
        input_img = cv2.cvtColor(input_img, cv2.COLOR_BGR2GRAY)
        input_img_resize = cv2.resize(input_img, (IMG_SIZE, IMG_SIZE))
        img_data_list.append(input_img_resize)

# Stack into one array and scale pixel values to [0, 1]
img_data = np.array(img_data_list)
img_data = img_data.astype('float32')
img_data /= 255
print(img_data.size)
print(img_data.shape)

After I run it:
Loaded the images of dataset-0
Loaded the images of dataset-1
Loaded the images of dataset-2
Loaded the images of dataset-3
Loaded the images of dataset-4
7065600
(1725, 64, 64)

If I inspect img_data:
print(len(img_data))
print(img_data)
1725
[[[0.8784314 0.32941177 0.22745098 ... 0.13333334 0.06666667 0.05490196]
[0.03137255 0.16862746 0.14901961 ... 0.18431373 0.20784314 0.16470589]
[0.1764706 0.42745098 0.26666668 ... 0.42352942 0.05882353 0.00784314]
...
[0.42352942 0.4 0.2901961 ... 0.3647059 0.4392157 0.4392157 ]
[0.38431373 0.4 0.4392157 ... 0.4392157 0.3019608 0.32941177]
[0.20784314 0.41568628 0.40392157 ... 0.42745098 0.21176471 0.3372549 ]]

Here is the code after reshaping img_data:
#num_of_samples = 1725
from sklearn.model_selection import train_test_split
import random
from sklearn.utils import shuffle
num_of_samples = img_data.shape[0]
# convert class labels to one-hot encoding
#Y = np_utils.to_categorical(labels, 5)
#train_labels = keras.utils.to_categorical(labels, num_classes)
nb_train_samples = 1725
train_labels = np.array([0] * (195) + [1] * (120) + [2] * (380) + [3] * (144) + [4] * (886))
train_labels = keras.utils.to_categorical(train_labels, num_classes = 5)
#Shuffle the dataset
x = [] # for images
y = [] # for labels
print(train_labels.shape)
x = shuffle(img_data)
print(x.shape)
y = shuffle(train_labels)
print(y.shape)
# Split the dataset
y_train = y.reshape((1725,5))
X_train,X_test, y_train, y_test = train_test_split(X, y)
X_train = np.array(X_train).reshape(-1, 32,32, 1)
print (X_train.size)
X_test = np.array(X_test).reshape(-1, 32,32, 1)
print (y_train.size)
print (train_labels)
print (train_labels.size)

Output:
(1725, 5)
(1725, 64, 64)
(1725, 5)

ValueError: Found input variables with inconsistent numbers of samples: [5175, 1725]
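from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense, Dropout
#Set Model Parameters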
batch_size = 15
epochs = 50
num_classes = 5
input_shape=img_data[0].shape
#Build Model
Model = Sequential()
Model.add(Conv2D(32, kernel_size=(3,3), input_shape=(32,32,1)))
Model.add(Activation('relu'))
Model.add(MaxPooling2D(pool_size=(2, 2)))
Model.add(Conv2D(64, (3, 3)))
Model.add(Activation('relu'))
Model.add(Flatten())
Model.add(Dense(1024))
Model.add(Activation('relu'))
Model.add(Dropout(0.4))
Model.add(Dense(num_classes))
Model.add(Activation('softmax'))
# Model Compiling
Model.compile(loss = "categorical_crossentropy", optimizer = "Adam", metrics=['accuracy'])

Posted on 2019-02-28 02:44:35
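Regarding 5175: 1725 * 3 = 5175, so perhaps some NumPy reshape operation is merging the RGB color channels (3) with the sample count (1725)? That should not be the case in the snippet above, though, because the images are converted to grayscale.
Yes, there is a simpler way to load image data in Keras. Put each class's images in its own subfolder (dog, cat, and so on) and use the flow_from_directory method of ImageDataGenerator. Another advantage is that it also supports data augmentation, normalization, and so on.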
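As for the 5172 in the first error, here is one reading that fits the posted code (an assumption, since the exact run isn't shown). By default train_test_split keeps 1293 of the 1725 images for training. Calling reshape(-1, 32, 32, 1) on 64x64 images does not resize them; it re-slices the same pixels, so each image turns into four 32x32 "samples", and 1293 * 4 = 5172. A minimal NumPy sketch, with placeholder zero arrays standing in for the real images:

```python
import numpy as np

# Stand-in for the training split: 1293 grayscale images of 64x64
# (what a 75% training share of 1725 images comes to).
X_train = np.zeros((1293, 64, 64), dtype=np.float32)

# reshape only re-slices the existing pixels, so asking for 32x32
# quadruples the number of "samples" instead of shrinking the images.
wrong = X_train.reshape(-1, 32, 32, 1)
print(wrong.shape)   # (5172, 32, 32, 1)

# Keeping the real image size preserves the sample count.
right = X_train.reshape(-1, 64, 64, 1)
print(right.shape)   # (1293, 64, 64, 1)
```

If that is what happened, reshaping to (-1, 64, 64, 1) and using input_shape=(64, 64, 1) in the first layer would keep the inputs and the targets at the same length; to actually get 32x32 inputs, the images would have to be resized with cv2.resize while loading.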
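A minimal sketch of that approach, assuming the images sit under an image/ folder with one subfolder per class (as in the question) and a TensorFlow version that still ships ImageDataGenerator. The helper names make_generators and count_images_per_class are made up for illustration. The labels come from the subfolder names, so no hand-built label array is needed.

```python
import os


def count_images_per_class(data_dir):
    """Return {class_name: file_count} for each subfolder of data_dir."""
    counts = {}
    for name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


def make_generators(data_dir, img_size=64, batch_size=15):
    """Build training/validation iterators from one subfolder per class."""
    # Imported here so the folder check above works without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # rescale normalizes pixels to [0, 1]; validation_split reserves 25%.
    datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.25)
    common = dict(
        directory=data_dir,
        target_size=(img_size, img_size),  # images are resized on load
        color_mode="grayscale",            # one channel, as in the question
        class_mode="categorical",          # one-hot labels from folder names
        batch_size=batch_size,
    )
    train_gen = datagen.flow_from_directory(subset="training", **common)
    val_gen = datagen.flow_from_directory(subset="validation", **common)
    return train_gen, val_gen


# Hypothetical usage, assuming ./image/<class_name>/*.jpg exists:
# print(count_images_per_class(os.path.join(os.getcwd(), "image")))
# train_gen, val_gen = make_generators(os.path.join(os.getcwd(), "image"))
```

With these iterators, Model.fit(train_gen, validation_data=val_gen, epochs=50) should work on recent TensorFlow 2.x releases (older ones used fit_generator), provided the first layer uses input_shape=(64, 64, 1) to match target_size.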
https://stackoverflow.com/questions/54909990