Q: Scikit-learn: Loading images from a folder to create a labeled dataset for KNN classification

Stack Overflow user

Asked on 2019-07-02 08:25:16

Answers: 3 | Views: 4.4K | Followers: 0 | Score: 2

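I want to do handwritten digit recognition using K-nearest neighbors classification with scikit-learn. I have a folder containing 5001 images of handwritten digits (500 images for each digit from 0 to 9).

I am trying to find a way to build a dataset from these images so that I can create a training set and a test set. I have read many online tutorials on K-nearest neighbors classification with scikit-learn, but most of them load an existing dataset such as the MNIST dataset of handwritten digits.

Is there a way to create my own dataset by reading the images from a folder and then assigning a label to each image? I am not sure which methods I could use for this. Any insights are appreciated.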

scikit-learn

3 Answers

Stack Overflow user

Accepted answer

Posted on 2019-07-02 08:50:33

To read the data, you should do something like this:

from os import listdir
from os.path import isfile, join
import matplotlib.pyplot as plt

mypath = '.' # edit with the path to your data
files = [f for f in listdir(mypath) if isfile(join(mypath, f))]

x = [] # image arrays
y = [] # labels parsed from the file names

for file in files:
    label = file.split('_')[0] # assuming your img is named like this "eight_1.png" you want to get the label "eight"
    y.append(label)
    img = plt.imread(join(mypath, file)) # join with mypath so this also works when mypath is not the current directory
    x.append(img)

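You will then need to manipulate x and y a little before handing them to scikit-learn (its estimators expect a 2-D array of shape (n_samples, n_features)), but you should be fine.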
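As a rough sketch of that manipulation, assuming every image is grayscale and has the same size: stack the images, flatten each one into a row, split the rows into training and test sets, and fit a KNeighborsClassifier. The random arrays below are only a stand-in for the x and y lists built by the loop, and n_neighbors=3 and test_size=0.2 are illustrative choices, not values from the answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): 100 grayscale 8x8 "images" with string labels,
# playing the role of the x and y lists built by the loop above.
rng = np.random.default_rng(0)
x = [rng.random((8, 8)) for _ in range(100)]
y = [str(i % 10) for i in range(100)]

# scikit-learn expects X with shape (n_samples, n_features),
# so flatten every image into a single row.
X = np.stack(x).reshape(len(x), -1)
y = np.asarray(y)

# Hold out 20% of the samples for testing; stratify keeps the classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)  # mean accuracy on the held-out set
print(X_train.shape, X_test.shape, accuracy)
```

With real images, np.stack fails if the images differ in shape, so resizing them to a common size first may be needed.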

Score 1

Stack Overflow user

Posted on 2019-07-02 08:33:05

You can use the Pillow or OpenCV library to read your images.

Pillow:

from PIL import Image
import numpy as np

img = Image.open("image_location/image_name") # This returns an image object
img = np.asarray(img) # convert it to ndarray

For OpenCV:

import cv2

img = cv2.imread("image_location/image_name", cv2.IMREAD_GRAYSCALE)

To convert all of your images you can use, for example, the os library:

import os

Create a list of image names

loc = os.listdir('your_images_folder')

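To store grayscale images with one color channel, you can preallocate an empty array with one row per image:

# Read the first image to get the number of pixels per image
# (this assumes all images have the same size)
first = np.asarray(Image.open(os.path.join("your_images_folder", loc[0])))
data = np.empty((len(loc), first.size))

for i, l in enumerate(loc):

    # Full image path
    path = os.path.join("your_images_folder", l)

    img = np.asarray(Image.open(path))

    # Make a vector from an image
    img = img.reshape(-1)

    # store this vector
    data[i, :] = img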

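As a result, you will get the numpy array "data" for your classification project. The "y" vector can also be built in the same loop from the name of each image.

To track progress with a progress bar in the loop, the tqdm library can sometimes be a suitable solution. The same approach works for storing RGB images. For RGB images, img.reshape(-1, ) will return a longer vector.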
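To make the difference in vector length concrete, here is a small sketch using stand-in arrays. The 28x28 size and the count of 5 images are assumptions chosen for illustration, not values from the question.

```python
import numpy as np

# Stand-in images (assumption): one 28x28 grayscale and one 28x28 RGB image.
gray = np.zeros((28, 28), dtype=np.uint8)
rgb = np.zeros((28, 28, 3), dtype=np.uint8)

gray_vec = gray.reshape(-1)  # one value per pixel: 28 * 28 = 784
rgb_vec = rgb.reshape(-1)    # three values per pixel: 28 * 28 * 3 = 2352

print(gray_vec.shape, rgb_vec.shape)  # (784,) (2352,)

# For RGB input the data array therefore needs height * width * 3 columns.
n_images = 5
data = np.empty((n_images, rgb.size))
data[0, :] = rgb_vec
```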

Score 1

Stack Overflow user

Posted on 2019-07-02 08:54:28

Does this help?

import os
import imageio


def convert_word_to_label(word):

    if word == 'zero':
        return 0
    elif word == 'one':
        return 1
    elif word == 'two':
        return 2
    elif word == 'three':
        return 3
    elif word == 'four':
        return 4
    elif word == 'five':
        return 5
    elif word == 'six':
        return 6
    elif word == 'seven':
        return 7
    elif word == 'eight':
        return 8
    elif word == 'nine':
        return 9



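def create_dataset(path):
    X = []  # image arrays
    y = []  # integer labels (0-9)

    # Walk the folder tree and collect every .jpg file
    for r, d, f in os.walk(path):
        for image in f:
            if image.lower().endswith('.jpg'):
                image_path = os.path.join(r, image)
                img = imageio.imread(image_path)
                X.append(img)
                # File names are assumed to look like "eight_1.jpg"
                word = image.split('_')[0]
                y.append(convert_word_to_label(word))
    return X, y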

if __name__ == '__main__':
    X, y = create_dataset('path/to/image_folder/')
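One thing to watch in the code above: convert_word_to_label returns None for any word it does not recognize, so a misnamed file would silently put None into y. A minimal sketch of a stricter variant is shown below. The names WORD_TO_LABEL, word_to_label and label_from_filename are hypothetical and not part of the original answer, and the "eight_1.jpg" naming scheme is the same assumption the answer makes.

```python
# Hypothetical alternative to the if/elif chain: a dictionary lookup.
WORD_TO_LABEL = {
    'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
    'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
}


def word_to_label(word):
    """Return the digit for a word such as 'eight'; raise on unknown words."""
    try:
        return WORD_TO_LABEL[word]
    except KeyError:
        raise ValueError(f"unrecognized label word: {word!r}") from None


def label_from_filename(filename):
    """Parse the label from a file name like 'eight_1.jpg'."""
    return word_to_label(filename.split('_')[0])


print(label_from_filename('eight_1.jpg'))  # 8
```

Raising an error instead of returning None makes a badly named file show up while the dataset is being built rather than later during training.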

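Score 0

The original content of this page is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: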

https://stackoverflow.com/questions/56848253
