My task is to find a certain letter in images of documents. Using classical computer vision, I segment the image into characters. I then train a neural network on 25×25-pixel character images to classify them as either the letter I am looking for or anything else. With this, I can reconstruct the positions of those characters.
Now I would like to apply the convnet directly to the whole image, so that I no longer have to rely on the classical segmentation. The network is a deep neural network consisting of 2D convolutions, 2D max-pooling layers, and a dense classifier. It looks like this:
Layer (type) Output Shape Param #
=================================================================
conv2d_61 (Conv2D) (None, 23, 23, 32) 320
_________________________________________________________________
max_pooling2d_50 (MaxPooling (None, 11, 11, 32) 0
_________________________________________________________________
conv2d_62 (Conv2D) (None, 9, 9, 64) 18496
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 4, 4, 64) 0
_________________________________________________________________
flatten_46 (Flatten) (None, 1024) 0
_________________________________________________________________
dropout_5 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_89 (Dense) (None, 1) 1025
=================================================================
Total params: 19,841
Trainable params: 19,841
Non-trainable params: 0
I know that I can apply the convolutional part to the whole image using the trained filters. That would give the responses to those filters as a tensor with larger spatial dimensions. But to do the classification I need the classifier, which was trained on a fixed amount of spatial information. Feeding it images of a different size would break that.
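The shapes and parameter counts in the summary above can be checked by hand. A minimal sketch in plain Python, assuming 3×3 convolution kernels with VALID padding and 2×2 max pooling (which is consistent with the 25→23→11→9→4 shape progression shown):

```python
# Reproduce the Keras summary arithmetically, assuming 3x3 kernels,
# VALID padding, and 2x2 max pooling.
def conv2d(size, in_ch, out_ch, k=3):
    # VALID conv: spatial size shrinks by k-1; params = k*k*in*out + biases
    return size - (k - 1), out_ch, k * k * in_ch * out_ch + out_ch

def maxpool(size, ch, p=2):
    # 2x2 pooling floors the spatial size; no parameters
    return size // p, ch, 0

size, ch = 25, 1
params = []
size, ch, p = conv2d(size, ch, 32); params.append(p)  # -> 23x23x32, 320 params
size, ch, p = maxpool(size, ch);    params.append(p)  # -> 11x11x32
size, ch, p = conv2d(size, ch, 64); params.append(p)  # -> 9x9x64, 18496 params
size, ch, p = maxpool(size, ch);    params.append(p)  # -> 4x4x64
flat = size * size * ch                               # -> 1024
params.append(flat * 1 + 1)                           # Dense(1): 1025 params
print(flat, sum(params))                              # 1024 19841
```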
My best idea so far is to slice the image into tiles and feed each fixed-size tile into the classifier. This appears to be the answer to another question.
Is there a better way to apply the trained filters to the whole image and run some kind of local classification with the trained classifier?
Posted on 2020-09-16 23:32:46
As a solution, I suggest using the tf.image.extract_patches function to extract patches from the image and then applying the trained classifier to each patch. This approach has several benefits.
Here is a sketch of the solution:
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Reshape, TimeDistributed
from tensorflow.keras.models import Model

whole_images = Input(shape=(img_rows, img_cols, 1))
# Extract one 25x25 patch per pixel location. The TF op is wrapped in a
# Lambda so it can be used as a Keras layer.
patches = Lambda(lambda x: tf.image.extract_patches(
    x,
    sizes=[1, 25, 25, 1],
    strides=[1, 1, 1, 1],  # increase the stride if you don't need a dense classification map
    rates=[1, 1, 1, 1],
    padding='SAME'
))(whole_images)
# `patches` has shape (batch_size, num_row_locs, num_col_locs, 25*25),
# so reshape it to apply the classifier to each patch independently.
reshaped_patches = Reshape((-1, 25, 25, 1))(patches)
dense_map = TimeDistributed(letter_classifier)(reshaped_patches)
# With stride 1 and 'SAME' padding there is one patch per pixel, so the
# flat sequence of predictions can be reshaped back to the spatial grid.
dense_map = Reshape((img_rows, img_cols))(dense_map)
# Construct the model
image_classifier = Model(whole_images, dense_map)
# Use it on the real images
output = image_classifier(my_images)
https://stackoverflow.com/questions/63922156