
Q: Clarification about gradient accumulation

Stack Overflow user

Asked on 2021-12-23 10:54:46

Answers: 1 · Views: 862 · Followers: 0 · Votes: 3

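I'm trying to better understand how gradient accumulation works and why it is useful. To that end, I'd like to ask what the difference is (if any) between these two possible PyTorch-like implementations of a custom training loop with gradient accumulation: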

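# Implementation 1: call backward() on every mini-batch and let the gradients accumulate
gradient_accumulation_steps = 5
for batch_idx, batch in enumerate(dataset):
  x_batch, y_true_batch = batch
  y_pred_batch = model(x_batch)

  loss = loss_fn(y_pred_batch, y_true_batch)  # PyTorch losses take (input, target)
  loss.backward()  # adds this mini-batch's gradients into each parameter's .grad

  if (batch_idx + 1) % gradient_accumulation_steps == 0: # (assumption: the number of batches is a multiple of gradient_accumulation_steps)
    optimizer.step()
    optimizer.zero_grad()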

import torch

# Implementation 2: collect the predictions and compute one loss over the whole group
y_true_batches, y_pred_batches = [], []
gradient_accumulation_steps = 5
for batch_idx, batch in enumerate(dataset):
  x_batch, y_true_batch = batch
  y_pred_batch = model(x_batch)  # its autograd graph stays alive until backward()

  y_true_batches.append(y_true_batch)
  y_pred_batches.append(y_pred_batch)

  if (batch_idx + 1) % gradient_accumulation_steps == 0: # (assumption: the number of batches is a multiple of gradient_accumulation_steps)
    y_true = torch.cat(y_true_batches, dim=0)  # stack the mini-batches vertically
    y_pred = torch.cat(y_pred_batches, dim=0)

    loss = loss_fn(y_pred, y_true)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    y_true_batches.clear()
    y_pred_batches.clear()

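Also, as an unrelated question: since the purpose of gradient accumulation is to mimic a larger batch size when memory is limited, does this mean I should also increase the learning rate proportionally?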

Tags: python, pytorch, gradient-descent

1 Answer

Stack Overflow user

Accepted answer

Posted on 2022-01-03 08:13:42

1. Difference between the two programs:

Conceptually, your two implementations are the same: for every weight update, you forward gradient_accumulation_steps batches.
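A minimal sketch that runs both loops from the question side by side, assuming a toy `nn.Linear` model, random data and plain SGD (all hypothetical, not from the question). It uses `reduction="sum"` so that the per-mini-batch losses add up exactly to the loss over the concatenated group; with that reduction the two loops should end with the same parameters, because each `backward()` adds into `.grad` until `optimizer.step()` is called. With the default `reduction="mean"` the two loops differ by a constant factor, which is the subtlety discussed in this answer.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

accum_steps = 5
# Hypothetical toy data: 10 mini-batches of 4 samples (a multiple of accum_steps).
dataset = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(10)]
# reduction="sum" makes the two loops mathematically identical.
loss_fn = nn.MSELoss(reduction="sum")

model_a = nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)  # identical initial weights
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.01)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.01)

# Implementation 1: backward() per mini-batch; gradients accumulate in .grad.
for batch_idx, (x, y) in enumerate(dataset):
    loss_fn(model_a(x), y).backward()
    if (batch_idx + 1) % accum_steps == 0:
        opt_a.step()
        opt_a.zero_grad()

# Implementation 2: keep predictions (and their graphs), one backward() per group.
preds, targets = [], []
for batch_idx, (x, y) in enumerate(dataset):
    preds.append(model_b(x))
    targets.append(y)
    if (batch_idx + 1) % accum_steps == 0:
        loss_fn(torch.cat(preds, dim=0), torch.cat(targets, dim=0)).backward()
        opt_b.step()
        opt_b.zero_grad()
        preds.clear()
        targets.clear()

same = all(
    torch.allclose(p, q, atol=1e-6)
    for p, q in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

Any remaining difference comes only from floating-point summation order, hence the `allclose` check rather than exact equality.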

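As you have already noticed, the second method needs more memory than the first: it keeps the autograd graph (the intermediate activations) of all gradient_accumulation_steps forward passes alive until the single backward() call, whereas the first method frees each graph as soon as its backward() has run.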

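However, there is a subtle difference: loss functions are usually implemented to reduce the loss over a batch with a mean. When you use gradient accumulation (the first implementation), you reduce each mini-batch with a mean, but you effectively sum over the gradient_accumulation_steps accumulated mini-batches. To make sure the gradient-accumulation implementation matches the large-batch implementation, you need to be very careful about how the loss is reduced. In many cases you need to divide the accumulated loss by gradient_accumulation_steps (this reproduces the large-batch mean exactly only when all mini-batches have the same size). For a detailed treatment, see this answer.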
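A minimal sketch of the mean-versus-sum issue, assuming a hypothetical toy linear model, the default `reduction="mean"` of `nn.MSELoss`, and equally sized mini-batches (`accum_steps` plays the role of `gradient_accumulation_steps`). Naively accumulating per-mini-batch means gives `accum_steps` times the large-batch gradient; dividing each mini-batch loss by `accum_steps` before `backward()` recovers the large-batch gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

accum_steps = 5
model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()  # default reduction="mean"
# Hypothetical data: accum_steps equally sized mini-batches of 4 samples each.
micro_batches = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(accum_steps)]


def big_batch_grad():
    """Weight gradient of the mean loss over all mini-batches concatenated."""
    model.zero_grad()
    xs = torch.cat([xb for xb, _ in micro_batches], dim=0)
    ys = torch.cat([yb for _, yb in micro_batches], dim=0)
    loss_fn(model(xs), ys).backward()
    return model.weight.grad.clone()


def accumulated_grad(divide_by):
    """Weight gradient after one backward() per mini-batch, each loss divided by `divide_by`."""
    model.zero_grad()
    for xb, yb in micro_batches:
        (loss_fn(model(xb), yb) / divide_by).backward()  # adds into .grad
    return model.weight.grad.clone()


g_big = big_batch_grad()
g_naive = accumulated_grad(divide_by=1)             # sum of per-mini-batch means
g_scaled = accumulated_grad(divide_by=accum_steps)  # mean of per-mini-batch means

print(torch.allclose(g_naive, accum_steps * g_big, atol=1e-5))  # naive is accum_steps times too large
print(torch.allclose(g_scaled, g_big, atol=1e-5))
```

In the first implementation from the question, this corresponds to writing `loss = loss_fn(y_pred_batch, y_true_batch) / gradient_accumulation_steps` before `loss.backward()`. If the mini-batches have different sizes (for example a smaller last batch), a per-sample weighting would be needed instead of a constant divisor.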

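2. Batch size vs. learning rate: the learning rate and the batch size are indeed related. Increasing the batch size reduces the noise in the gradient estimate, which has an effect similar to decreasing the learning rate, so when you change one you usually need to retune the other.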
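One way to make the relation concrete, sketched under the SGD "noise scale" heuristic used by Smith et al. in the paper cited below (a modelling approximation, not a rule): with learning rate ε, training-set size N and batch size B, the noise scale g of SGD is approximately εN/B. Accumulating k = gradient_accumulation_steps mini-batches of size B gives an effective batch size kB:

```latex
g \;=\; \epsilon\left(\frac{N}{B} - 1\right) \;\approx\; \frac{\epsilon N}{B} \qquad (B \ll N)

g_{\text{acc}} \;\approx\; \frac{\epsilon N}{kB} \;=\; \frac{g}{k}

\epsilon' = k\,\epsilon \;\Longrightarrow\; g'_{\text{acc}} \;\approx\; \frac{(k\epsilon)\,N}{kB} \;=\; g
```

Under this model, accumulating k batches lowers the noise as much as dividing ε by k would, and multiplying ε by k restores the original noise scale; that is the "scale the learning rate with the batch size" heuristic the question asks about. Note also that with plain SGD, accumulating undivided mean losses (the first implementation without the division) already yields gradients k times larger, which acts like multiplying ε by k, while adaptive optimizers such as Adam largely normalize the gradient scale away. Either way, treat the scaled value as a starting point for tuning.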

See, for example:

Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le.

Votes: 2

Original page content provided by Stack Overflow; translation provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/70461130
