
PyTorch MNIST autoencoder to learn 10-digit classification

Stack Overflow user
Asked 2021-03-17 14:22:53
2 answers, 626 views, 0 followers, score 3

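I'm trying to build a simple autoencoder for MNIST where the middle layer is just 10 neurons. My hope is that it will learn to classify the 10 digits, and I assume that would eventually give the lowest error (with respect to reproducing the original image).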

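I have the code below, which I've already played around with quite a bit. If I run it for up to 100 epochs, the loss never really drops below 1.0, and when I evaluate it, it clearly doesn't work. What am I missing?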

Training:

import torch
import torchvision as tv
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torchvision.utils import save_image

num_epochs = 100
batch_size = 64

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
trainset = tv.datasets.MNIST(root='./data',  train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=4)

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder,self).__init__()
        self.encoder = nn.Sequential(
            # 28 x 28
            nn.Conv2d(1, 4, kernel_size=5),
            nn.Dropout2d(p=0.2),
            # 4 x 24 x 24
            nn.ReLU(True),
            nn.Conv2d(4, 8, kernel_size=5),
            nn.Dropout2d(p=0.2),
            # 8 x 20 x 20 = 3200
            nn.ReLU(True),
            nn.Flatten(),
            nn.Linear(3200, 10),
            nn.ReLU(True),
            # 10
            nn.Softmax(),
            # 10
            )
        self.decoder = nn.Sequential(
            # 10
            nn.Linear(10, 400),
            nn.ReLU(True),
            # 400
            nn.Unflatten(1, (1, 20, 20)),
            # 20 x 20
            nn.Dropout2d(p=0.2),
            nn.ConvTranspose2d(1, 10, kernel_size=5),
            # 24 x 24
            nn.ReLU(True),
            nn.Dropout2d(p=0.2),
            nn.ConvTranspose2d(10, 1, kernel_size=5),
            # 28 x 28
            nn.ReLU(True),
            nn.Sigmoid(),
            )
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = Autoencoder().cpu()
distance = nn.MSELoss()
#optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

for epoch in range(num_epochs):
    for data in dataloader:
        img, _ = data
        img = Variable(img).cpu()
        output = model(img)
        loss = distance(output, img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('epoch [{}/{}], loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

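The training loss already shows that things aren't working, but here is the confusion matrix as well. In this case it wouldn't necessarily be the identity matrix, since the neurons can come out in any order, but if this worked, its rows should be reorderable into something close to the identity: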

import numpy as np

confusion_matrix = np.zeros((10, 10))

batch_size = 20*1000

testset = tv.datasets.MNIST(root='./data',  train=False, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True, num_workers=4)

for data in dataloader:
    imgs, labels = data
    imgs = Variable(imgs).cpu()
    encs = model.encoder(imgs).detach().numpy()
    for i in range(len(encs)):
        predicted = np.argmax(encs[i])
        actual = labels[i]
        confusion_matrix[actual][predicted] += 1
print(confusion_matrix)
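Because the bottleneck neurons can come out in any order, one hedged way to score such a matrix without reordering it by hand is cluster purity: map each neuron (column) to the digit (row) it fires for most often, then count how many samples agree with that mapping. This sketch assumes the layout used above (`confusion_matrix[actual][predicted]`) and works on nested lists or the NumPy array; `column_to_label` and `cluster_purity` are hypothetical helper names. Purity lets two neurons map to the same digit; a strict one-to-one matching would instead solve an assignment problem, for example with `scipy.optimize.linear_sum_assignment`.

```python
def column_to_label(confusion):
    """Map each predicted column (bottleneck neuron) to the row (digit) it most often fires for."""
    n_rows, n_cols = len(confusion), len(confusion[0])
    return {j: max(range(n_rows), key=lambda i: confusion[i][j]) for j in range(n_cols)}


def cluster_purity(confusion):
    """Fraction of samples whose true digit is the majority digit of their predicted column."""
    n_cols = len(confusion[0])
    total = sum(sum(row) for row in confusion)
    majority = sum(max(row[j] for row in confusion) for j in range(n_cols))
    return majority / total if total else 0.0


# Toy 3-class matrix (rows = actual, columns = predicted): a row-permuted near-identity
toy = [
    [0, 9, 1],
    [8, 0, 0],
    [0, 1, 7],
]
print(column_to_label(toy))           # {0: 1, 1: 0, 2: 2}
print(round(cluster_purity(toy), 3))  # 0.923
```

If every sample lands in the same neuron, as happens when the bottleneck collapses, purity drops to the frequency of the most common digit, roughly 0.1 on MNIST.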

2 Answers

Stack Overflow user

Answered 2021-03-18 02:55:40

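I was able to get your code to a version that at least converges. In short, I think it may have several problems: the normalization (0.1307 and 0.3081 are the MNIST mean and standard deviation, but after normalizing, the target pixels no longer lie in the [0, 1] range of the final Sigmoid), some unnecessary ReLUs, a learning rate that is too high, MSE loss instead of cross-entropy, and above all, I don't think a softmax in the bottleneck layer works this way, because of vanishing gradients; see here: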

https://www.quora.com/Does-anyone-ever-use-a-softmax-layer-mid-neural-network-rather-than-at-the-end
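A small plain-Python illustration of the vanishing-gradient point (no PyTorch needed; the helper names are illustrative): the softmax Jacobian is d s_i / d z_j = s_i * (delta_ij - s_j), so once one logit dominates, every entry is close to zero and very little gradient reaches the layers in front of the softmax.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """J[i][j] = d softmax(z)_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    n = len(z)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)] for i in range(n)]


def largest_entry(matrix):
    return max(abs(v) for row in matrix for v in row)


for logits in ([1.0, 0.5, 0.0], [10.0, 0.0, 0.0]):
    print(logits, "largest |d softmax / d logit|:", largest_entry(softmax_jacobian(logits)))
```

With moderate logits the largest entry is about 0.25; with one saturated logit it falls below 1e-4, which is the regime a confident bottleneck softmax ends up in.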

Maybe this could be solved with the Gumbel softmax: https://arxiv.org/abs/1611.01144
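For reference, the Gumbel-softmax relaxation adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax. PyTorch ships it as `torch.nn.functional.gumbel_softmax(logits, tau=1, hard=False, dim=-1)`, where `hard=True` returns one-hot samples with straight-through gradients. The plain-Python sketch below only shows the sampling formula, not the gradient trick; `gumbel_softmax_sample` is a hypothetical helper.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """One relaxed one-hot sample: softmax((logits + g) / tau) with g ~ Gumbel(0, 1)."""
    rng = rng or random.Random()
    # g = -log(-log(U)), U ~ Uniform(0, 1); clamp U away from 0 because random() may return 0.0
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    return softmax([(l + g) / tau for l, g in zip(logits, gumbels)])


rng = random.Random(0)
print(gumbel_softmax_sample([2.0, 1.0, 0.1], tau=0.5, rng=rng))  # a probability vector summing to 1
```

Lowering `tau` pushes samples toward one-hot vectors, and the index of the largest entry is distributed according to `softmax(logits)` (the Gumbel-max trick), which is what makes the relaxation a stand-in for a discrete choice.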

Also, there is already a paper that does this, though as a variational autoencoder rather than a plain autoencoder; see here: https://arxiv.org/abs/1609.02200

For now, you can use this modified version, which at least converges, and then change it step by step to see what breaks it.
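Two of the suspected problems, the normalization and the ReLU placed right before the final Sigmoid in the question's decoder, can be checked without training anything. This plain-Python sketch (helper names are illustrative) computes the smallest squared error a single output pixel can reach: ReLU(z) >= 0 means Sigmoid(ReLU(z)) >= 0.5, while `Normalize((0.1307,), (0.3081,))` maps a black pixel to about -0.42 and a white pixel to about 2.82, so neither target is reachable.

```python
import math

MEAN, STD = 0.1307, 0.3081  # the transforms.Normalize constants from the question


def normalize(pixel):
    """What transforms.Normalize((MEAN,), (STD,)) does to one pixel value in [0, 1]."""
    return (pixel - MEAN) / STD


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pixel_error_floor(target, low, high):
    """Smallest squared error an output confined to [low, high] can reach for this target."""
    best = min(max(target, low), high)
    return (target - best) ** 2


relu_then_sigmoid_low = sigmoid(0.0)  # ReLU output >= 0, so the Sigmoid output is >= 0.5

for pixel in (0.0, 1.0):
    target = normalize(pixel)
    print(f"pixel {pixel}: normalized target {target:.4f}, "
          f"floor with ReLU+Sigmoid {pixel_error_floor(target, relu_then_sigmoid_low, 1.0):.4f}, "
          f"floor with Sigmoid only and raw targets {pixel_error_floor(pixel, 0.0, 1.0):.4f}")
```

Each black background pixel contributes at least about 0.85 under the question's setup, and most MNIST pixels are background, so the average MSE cannot approach zero. With `ToTensor()` only and no ReLU directly before the Sigmoid, as in this answer's code, the floor is 0.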

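As for classification, the standard approach is to use the trained encoder to produce features from the images and then train an ordinary classifier (an SVM or similar) on top of them.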
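The features-then-classifier pipeline can be sketched as follows. Assumptions: the feature vectors would come from something like `model.encoder(imgs)`, flattened per image; a nearest-centroid classifier stands in for the SVM only to keep the sketch dependency-free (with scikit-learn, `sklearn.svm.SVC` would be the usual choice); `fit_centroids` and `predict` are hypothetical helpers.

```python
def fit_centroids(features, labels):
    """Average feature vector per class label."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Label of the nearest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])))


# Toy 2-D "encoder features"; real ones would be encoder outputs for training images
train_feats = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
train_labels = [0, 0, 1, 1]
centroids = fit_centroids(train_feats, train_labels)
print(predict(centroids, [0.1, 0.2]), predict(centroids, [1.0, 1.0]))  # 0 1
```

The point of the split is that the encoder is trained only for reconstruction, and the labels are used only by the small classifier on top.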

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from tqdm import tqdm
import matplotlib.pyplot as plt

batch_size = 16

# No Normalize(): pixel targets stay in [0, 1], the range of the final Sigmoid
transform = transforms.Compose([
    transforms.ToTensor(),
])
trainset = MNIST(root='./data/', train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=8)

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # 1 x 28 x 28 -> 2 x 24 x 24 -> 4 x 20 x 20 (no Flatten/Linear/Softmax bottleneck)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 2, kernel_size=5),
            nn.ReLU(),
            nn.Conv2d(2, 4, kernel_size=5),
            )
        # 4 x 20 x 20 -> 10 x 24 x 24 -> 1 x 28 x 28, squashed into [0, 1] by the Sigmoid
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(4, 10, kernel_size=5),
            nn.ReLU(),
            nn.ConvTranspose2d(10, 1, kernel_size=5),
            nn.Sigmoid(),
            )
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = Autoencoder().cpu()
distance = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

num_epochs = 20

outputs = []  # one batch of reconstructions per epoch, for plotting
for epoch in tqdm(range(num_epochs)):
    for data in dataloader:
        img, _ = data
        img = img.cpu()
        output = model(img)
        loss = distance(output, img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # detach() so the stored tensor does not keep the autograd graph alive
    outputs.append(output.detach())
    print('epoch [{}/{}], loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

# Plot the first 9 reconstructions from the last batch of each epoch
for k in range(num_epochs):
    imgs = outputs[k].numpy()
    fig, axes = plt.subplots(1, 9, figsize=(9, 2))
    fig.suptitle('epoch {}'.format(k + 1))
    for i, ax in enumerate(axes):
        ax.imshow(imgs[i][0], cmap='gray')
        ax.set_title(str(i))
        ax.axis('off')
    plt.show()
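The layer sizes in both versions follow from the output-size formulas in the PyTorch documentation for `nn.Conv2d` and `nn.ConvTranspose2d`. A plain-Python sketch (helper names are illustrative) reproduces the question's 3200 and shows that this converging version keeps a 4 x 20 x 20 = 1600-value code, larger than the 784 input pixels. It therefore has no 10-unit bottleneck, which is presumably part of why it trains so easily.

```python
def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output height/width of nn.Conv2d for one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Output height/width of nn.ConvTranspose2d for one spatial dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


h = conv2d_out(conv2d_out(28, 5), 5)                        # encoder: 28 -> 24 -> 20
question_flat = 8 * h * h                                   # question: 8 channels before Flatten
answer1_code = 4 * h * h                                    # this answer: 4 channels, no Linear layer
back = conv_transpose2d_out(conv_transpose2d_out(h, 5), 5)  # decoder: 20 -> 24 -> 28
print(h, question_flat, answer1_code, back)                 # 20 3200 1600 28
```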
Score: 1

Stack Overflow user

Answered 2021-03-18 07:42:36

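Technically, autoencoders are not usually used as classifiers. They learn to encode a given image into a short vector and to reconstruct the same image from that encoded vector. In other words, they are a way of compressing an image into a short vector.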

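Since you want to train an autoencoder that can also classify, we need to make a few changes to the model. First, there will be two different losses: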

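  1. MSE loss: the autoencoder's current reconstruction loss. It forces the network to output an image as close as possible to the given image, using only the compressed representation.
  2. Classification loss: classic cross-entropy should do the job. It takes the compressed representation (C dimensions) and the target label and computes the negative log-likelihood loss. It forces the encoder to output a compressed representation that aligns well with the target class.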
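The two losses can be written out for a single sample as a plain-Python sketch (helper names are illustrative; the 0.5 weights mirror `mse_multp` and `cls_multp` in the code below). Note that `nn.CrossEntropyLoss` expects raw, unnormalized logits and applies log-softmax itself, which is why the classification loss is computed on the encoder output before the softmax.

```python
import math


def cross_entropy(logits, target):
    """Per-sample value of nn.CrossEntropyLoss: -log(softmax(logits)[target]), from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def mse(pred, target):
    """Mean squared error over the pixels of one image."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(recon, image, logits, label, mse_multp=0.5, cls_multp=0.5):
    """Weighted sum of the reconstruction and classification losses."""
    return mse_multp * mse(recon, image) + cls_multp * cross_entropy(logits, label)


print(cross_entropy([0.0, 0.0], 0))  # ln 2 = 0.693..., two equally likely classes
print(combined_loss([0.5, 0.5], [0.0, 1.0], [2.0, 0.0, 0.0], 0))
```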

I made a few changes to your code to get the combined model working. First, let's look at the code:

import torch
import torchvision as tv
import torchvision.transforms as transforms
import torch.nn as nn

num_epochs = 10
batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

trainset = tv.datasets.MNIST(root='./data',  train=True, download=True, transform=transform)
testset  = tv.datasets.MNIST(root='./data',  train=False, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True, num_workers=4)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class Autoencoderv3(nn.Module):
    def __init__(self):
        super(Autoencoderv3,self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5),
            nn.Dropout2d(p=0.1),
            nn.ReLU(True),
            nn.Conv2d(4, 8, kernel_size=5),
            nn.Dropout2d(p=0.1),
            nn.ReLU(True),
            nn.Flatten(),
            nn.Linear(3200, 10)   # raw logits: no ReLU/Softmax, CrossEntropyLoss needs them unnormalized
            )
        self.softmax = nn.Softmax(dim=1)
        self.decoder = nn.Sequential(
            nn.Linear(10, 400),
            nn.ReLU(True),
            nn.Unflatten(1, (1, 20, 20)),
            nn.Dropout2d(p=0.1),
            nn.ConvTranspose2d(1, 10, kernel_size=5),
            nn.ReLU(True),
            nn.Dropout2d(p=0.1),
            nn.ConvTranspose2d(10, 1, kernel_size=5)   # no Sigmoid, so outputs can match normalized targets outside [0, 1]
            )

    def forward(self, x):
        out_en = self.encoder(x)      # 10 class logits
        out = self.softmax(out_en)    # class probabilities fed to the decoder
        out = self.decoder(out)
        return out, out_en

model = Autoencoderv3().to(device)
distance   = nn.MSELoss()
class_loss = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

mse_multp = 0.5
cls_multp = 0.5

model.train()

for epoch in range(num_epochs):
    total_mseloss = 0.0
    total_clsloss = 0.0
    for ind, data in enumerate(dataloader):
        img, labels = data[0].to(device), data[1].to(device)
        output, output_en = model(img)
        loss_mse = distance(output, img)
        loss_cls = class_loss(output_en, labels)
        loss = (mse_multp * loss_mse) + (cls_multp * loss_cls)  # Combine two losses together
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Track this epoch's loss
        total_mseloss += loss_mse.item()
        total_clsloss += loss_cls.item()

    # Check accuracy on test set after each epoch:
    model.eval()   # Turn off dropout in evaluation mode
    acc = 0.0
    total_samples = 0
    with torch.no_grad():   # No gradients are needed for evaluation
        for data in testloader:
            # We only care about the 10 dimensional encoder output for classification
            img, labels = data[0].to(device), data[1].to(device)
            _, output_en = model(img)
            # output_en contains 10 values for each input, apply softmax to calculate class probabilities
            # (softmax does not change which index is largest, so it is optional for the prediction)
            prob = nn.functional.softmax(output_en, dim = 1)
            pred = torch.max(prob, dim=1)[1].cpu().numpy() # Max prob assigned to class
            acc += (pred == labels.cpu().numpy()).sum()
            total_samples += labels.shape[0]
    model.train()   # Enables dropout back again
    print('epoch [{}/{}], loss_mse: {:.4f}  loss_cls: {:.4f}  Acc on test: {:.4f}'.format(epoch+1, num_epochs, total_mseloss / len(dataloader), total_clsloss / len(dataloader), acc / total_samples))

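This code should now train the model both as a classifier and as a generative autoencoder. In general, though, this kind of approach can be a bit tricky to get to train. Here, MNIST is simple enough that the two complementary losses can be trained together. In more complex settings, such as generative adversarial networks (GANs), people alternate which model is being trained, freeze one of the models, and so on, to get the whole system to train. This autoencoder trains easily on MNIST without any of those tricks: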

 epoch [1/10], loss_mse: 0.8928  loss_cls: 0.4627  Acc on test: 0.9463
 epoch [2/10], loss_mse: 0.8287  loss_cls: 0.2105  Acc on test: 0.9639
 epoch [3/10], loss_mse: 0.7803  loss_cls: 0.1574  Acc on test: 0.9737
 epoch [4/10], loss_mse: 0.7513  loss_cls: 0.1290  Acc on test: 0.9764
 epoch [5/10], loss_mse: 0.7298  loss_cls: 0.1117  Acc on test: 0.9762
 epoch [6/10], loss_mse: 0.7110  loss_cls: 0.1017  Acc on test: 0.9801
 epoch [7/10], loss_mse: 0.6962  loss_cls: 0.0920  Acc on test: 0.9794
 epoch [8/10], loss_mse: 0.6824  loss_cls: 0.0859  Acc on test: 0.9806
 epoch [9/10], loss_mse: 0.6733  loss_cls: 0.0797  Acc on test: 0.9814
 epoch [10/10], loss_mse: 0.6671  loss_cls: 0.0764  Acc on test: 0.9813

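As you can see, both the MSE loss and the classification loss decrease, while accuracy on the test set increases. In the code, the MSE loss and the classification loss are added together. This means the gradients computed from each loss compete with each other, each pushing the network in its own direction. I added loss multipliers to control how much each loss contributes. If MSE has the higher multiplier, the network gets more gradient from the MSE loss and learns reconstruction better; if the CLS loss has the higher multiplier, the network reaches better classification accuracy. You can play with these multipliers to see how the final result changes, but MNIST is a very simple dataset, so the difference may be hard to see. At the moment, it does a reasonably good job of reconstructing the input: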

import numpy as np
import matplotlib.pyplot as plt

model.eval()
img, labels = next(iter(dataloader))   # one batch; list(dataloader)[0] would load the whole dataset first
img = img.to(device)
output, output_en = model(img)
inp = img[0:10, 0, :, :].squeeze().detach().cpu()
out = output[0:10, 0, :, :].squeeze().detach().cpu()

# Just some trick to concatenate first ten images next to each other
inp = inp.permute(1,0,2).reshape(28, -1).numpy()
out = out.permute(1,0,2).reshape(28, -1).numpy()
combined = np.vstack([inp, out])

plt.imshow(combined)
plt.show()
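The `permute(1, 0, 2).reshape(28, -1)` trick above turns a stack of N images of shape (N, H, W) into one (H, N*W) strip with the images side by side. A small NumPy sketch (`transpose` plays the role of torch's `permute`; `tile_horizontally` is a hypothetical helper) checks it against `np.hstack`:

```python
import numpy as np


def tile_horizontally(batch):
    """(N, H, W) -> (H, N * W): put N images next to each other, like permute(1, 0, 2).reshape(H, -1)."""
    n, h, w = batch.shape
    # Move the image index between the row and column axes, then merge it with the columns
    return batch.transpose(1, 0, 2).reshape(h, n * w)


imgs = np.arange(12).reshape(3, 2, 2)  # three tiny 2x2 "images"
strip = tile_horizontally(imgs)
print(strip.shape)                                   # (2, 6)
print(np.array_equal(strip, np.hstack(list(imgs))))  # True
```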

I'm sure you can get better results with more training and by tuning the loss multipliers.
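Why the multipliers matter can be seen in a deliberately tiny toy model (an assumption for illustration, not the network above): one parameter theta, a "reconstruction" loss (theta - t_mse)^2 and a "classification" loss (theta - t_cls)^2 that prefer different values. Gradient descent on mse_multp * L_mse + cls_multp * L_cls settles at the weighted average (mse_multp * t_mse + cls_multp * t_cls) / (mse_multp + cls_multp), so raising one multiplier pulls the result toward that loss's preferred value. `minimize_weighted` is a hypothetical helper.

```python
def minimize_weighted(t_mse, t_cls, mse_multp, cls_multp, lr=0.1, steps=200):
    """Gradient descent on mse_multp*(theta - t_mse)**2 + cls_multp*(theta - t_cls)**2."""
    theta = 0.0
    for _ in range(steps):
        # The two gradient terms pull theta toward different targets; the multipliers weight each pull
        grad = 2 * mse_multp * (theta - t_mse) + 2 * cls_multp * (theta - t_cls)
        theta -= lr * grad
    return theta


for weights in ((0.5, 0.5), (0.9, 0.1), (0.1, 0.9)):
    print(weights, round(minimize_weighted(0.0, 1.0, *weights), 4))
```

In the real network the two losses share many parameters and are not simple quadratics, but the same tug-of-war between weighted gradients applies.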

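Finally, the decoder receives the softmax of the encoder output. This means the decoder tries to create the output image from 0-1 probabilities at its input. So if the softmax probability vector is 0.98 at position 0 and close to zero everywhere else, the decoder should output an image that looks like a 0. Here I feed the decoder inputs that should produce reconstructions of 0 through 9: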

# One-hot rows: row k asks the decoder for digit k
test_arr = np.zeros([10, 10], dtype = np.float32)
ind = np.arange(0, 10)
test_arr[ind, ind] = 1.0

model.eval()
img = torch.from_numpy(test_arr).to(device)
out = model.decoder(img)
out = out[0:10, 0, :, :].squeeze().detach().cpu()
out = out.permute(1,0,2).reshape(28, -1).numpy()
plt.imshow(out)
plt.show()

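I also made a few small changes to the code, such as printing per-epoch average losses, which don't really change the training logic; you can see them in the code. Let me know if anything looks odd.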

Score: 1
Page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/66667949
