
Why does training accuracy drop after reaching a perfect training fit?

Stack Overflow user
Asked on 2019-06-14 22:08:50
Answers: 2 · Views: 407 · Followers: 0 · Score: 2

I am training a neural network on the MNIST data with PyTorch. The model starts off well, improves, reaches good accuracy on both the training and the test data, stays stable for a while, and then both the test and the training accuracy collapse, as shown in the base results graph below.

As for MNIST, I use 60000 training images and 10000 test images, a training batch size of 100, and a learning rate of 0.01. The neural network consists of two fully connected hidden layers, each with 100 nodes, the nodes having ReLU activation functions. F.cross_entropy is used for the loss and SGD for the gradient computation.

This is not an overfitting problem, as it is both the training and the test accuracy that collapse. I suspected it had to do with a too-large learning rate. In the base case I used 0.01, but when I lowered it to 0.001 the whole pattern repeated, just later, as shown in the following graph (note the change of the x-axis scale; the pattern happens roughly 10 times later, which is intuitive). Similar results were obtained with even lower learning rates.

I have tried unit testing, checking the individual parts, and making the model smaller. Here are the results when I use only 6 data points in the training set, with a batch size of 2. The perfect fit on the training data (here, as expected, distinctly different from the test accuracy) is unsurprising, but it still collapses from 100% to 1/6, i.e. no better than random picking. Can anyone tell me what the network would need to do to spin itself out of a perfect fit on the training set?

Here is the structure of the network (with the relevant libraries added beforehand), although I hope the symptoms above will be enough for you to recognize the problem without it:

class Network(nn.Module):
    def __init__(self):
        # call to the super class Module from nn
        super(Network, self).__init__()

        # fc stands for 'fully connected'
        self.fc1 = nn.Linear(in_features=28*28, out_features=100)
        self.fc2 = nn.Linear(in_features=100, out_features=100)
        self.out = nn.Linear(in_features=100, out_features=10)

    def forward(self, t):

        # (1) input layer (redundant)
        t = t

        # (2) hidden linear layer
        # As my t consists of 28*28 bit pictures, I need to flatten them:
        t = t.reshape(-1, 28*28)
        # Now having this reshaped input, add it to the linear layer
        t = self.fc1(t)
        # Again, apply ReLU as the activation function
        t = F.relu(t)

        # (3) hidden linear layer
        # As above, but reshaping is not needed now
        t = self.fc2(t)
        t = F.relu(t)

        # (4) output layer
        t = self.out(t)
        t = F.softmax(t, dim=1)

        return t

The main execution of the code:

for b in range(epochs):
    print('***** EPOCH NO. ', b+1)
    # getting a batch iterator
    batch_iterator = iter(batch_train_loader)
    # For loop for a single epoch, based on the length of the training set and the batch size
    for a in range(round(train_size/b_size)):
        print(a+1)
        # get one batch for the iteration
        batch = next(batch_iterator)
        # decomposing a batch
        images, labels = batch[0].to(device), batch[1].to(device)
        # to get a prediction, as with individual layers, we need to equate it to the network with the samples as input:
        preds = network(images)
        # with the predictions, we will use F to get the loss as cross_entropy
        loss = F.cross_entropy(preds, labels)
        # function for counting the number of correct predictions
        get_num_correct(preds, labels)
        # calculate the gradients needed for update of weights
        loss.backward()
        # with the known gradients, we will update the weights according to stochastic gradient descent
        optimizer = optim.SGD(network.parameters(), lr=learning_rate)
        # with the known weights, step in the direction of correct estimation
        optimizer.step()
        # check if the whole data check should be performed (for taking full training/test data checks only in evenly spaced intervals on the log scale, pre-calculated later)
        if counter in X_log:
            # get the result on the whole train data and record them
            full_train_preds = network(full_train_images)
            full_train_loss = F.cross_entropy(full_train_preds, full_train_labels)
            # Record train loss
            a_train_loss.append(full_train_loss.item())
            # Get a proportion of correct estimates, to make them comparable between train and test data
            full_train_num_correct = get_num_correct(full_train_preds, full_train_labels)/train_size
            # Record train accuracy
            a_train_num_correct.append(full_train_num_correct)
            print('Correct predictions of the dataset:', full_train_num_correct)
            # Repeat for test predictions
            # get the results for the whole test data
            full_test_preds = network(full_test_images)
            full_test_loss = F.cross_entropy(full_test_preds, full_test_labels)
            a_test_loss.append(full_test_loss.item())
            full_test_num_correct = get_num_correct(full_test_preds, full_test_labels)/test_size
            a_test_num_correct.append(full_test_num_correct)
        # update counter
        counter = counter + 1

I have googled and checked existing answers to this question, but people either ask about overfitting, or their neural networks fail to improve accuracy on the training set at all (i.e. they simply do not work), rather than finding a good training fit and then completely losing it, also on the training set. I hope I have not posted something obvious; I am relatively new to NNs, but I did my best to research the topic before posting here. Thank you for your help and understanding!


2 Answers

Stack Overflow user

Accepted answer

Posted on 2019-06-15 22:28:37

The reason was a bug in the code. We need to add optimizator.zero_grad() at the beginning of the training loop, and create the optimizer before the outer training loop, i.e.

optimizator = optim.SGD(...)
for b in range(epochs):

Why do we need to call zero_grad() in PyTorch? explains the reason.
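The effect is easy to reproduce in isolation. Below is a minimal, hypothetical sketch (not the asker's network) showing that PyTorch accumulates gradients in .grad across backward() calls until they are explicitly zeroed, which is why the training loop above keeps stepping along an ever-growing sum of stale gradients:

```python
import torch

# a single trainable parameter, initialized to 1
w = torch.ones(1, requires_grad=True)

# First backward pass: d(3*w)/dw = 3
(3 * w).backward()
print(w.grad.item())   # 3.0

# Second backward pass WITHOUT zeroing: gradients accumulate, 3 + 3 = 6
(3 * w).backward()
print(w.grad.item())   # 6.0

# Zero the gradient (what optimizer.zero_grad() does for all parameters),
# then backward again: back to the correct value 3
w.grad.zero_()
(3 * w).backward()
print(w.grad.item())   # 3.0
```

With accumulation, each optimizer.step() effectively uses the sum of all past gradients, so the update magnitude grows without bound and eventually destroys a perfectly fitted model, matching the collapse in the graphs.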

Score: 1

Stack Overflow user

Posted on 2019-06-14 23:22:22

So my take on this is that you are using too many epochs and overtraining the model (not overfitting). After a certain point of continually refreshing the biases/weights, they are no longer able to distinguish the values from noise.

I would recommend checking out https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/ to see whether it aligns with what you are seeing, as it was the first thing that came to mind.

Perhaps also have a look at this post: https://stats.stackexchange.com/questions/198629/difference-between-overtraining-and-overfitting (not saying it is a duplicate).

And this publication: Overtraining in back-propagation neural networks: A CRT color calibration example https://onlinelibrary.wiley.com/doi/pdf/10.1002/col.10027
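The early-stopping idea from the first link can be sketched in a few lines. This is an illustrative, framework-free sketch; the function name early_stopping and the patience parameter are my own naming, not taken from the article:

```python
def early_stopping(metric_history, patience=3):
    """Return the evaluation index at which training would stop
    (monitored metric has not improved for `patience` consecutive
    evaluations), or None if it never triggers."""
    best = float("-inf")
    since_best = 0
    for i, m in enumerate(metric_history):
        if m > best:
            best = m
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return i
    return None

# Accuracy rises, plateaus, then collapses (like the graphs in the question):
history = [0.60, 0.85, 0.97, 0.97, 0.96, 0.40, 0.17, 0.17]
print(early_stopping(history))  # 5 -- stops right as the collapse begins
```

In the asker's case early stopping would preserve the good checkpoint, but it treats the symptom rather than the cause; the accepted answer identifies the actual bug.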

Score: 0

The original content of this page is provided by Stack Overflow; translation support by Tencent Cloud's Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/56599871
