In section 5.3 of the paper Attention Is All You Need, the authors suggest increasing the learning rate linearly during warmup and then decreasing it proportionally to the inverse square root of the step number.

How can we implement this in PyTorch with the Adam optimizer? Preferably without additional packages.
Posted on 2020-12-18 00:07:59
PyTorch provides learning rate schedulers that implement various ways of adjusting the learning rate during training. Several simple LR schedulers are already implemented and can be found here: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
In your particular case you can, just like the other LR schedulers do, subclass _LRScheduler to implement a schedule that varies with the epoch number. For a bare-bones approach you only need to implement the __init__() and get_lr() methods.
Just be aware that many of these schedulers expect you to call .step() once per epoch. But you can also update them more frequently, or even pass a custom argument, as is done in the cosine annealing LR scheduler: https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#CosineAnnealingLR
Posted on 2021-02-16 01:31:57
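As a sketch of that approach (not from the original answers): the Noam schedule can also be written as a step-to-multiplier function and handed to torch.optim.lr_scheduler.LambdaLR with the optimizer's base lr set to 1.0. The values d_model=512 and warmup=4000 below are the paper's defaults; the multiplier function itself is plain Python:

```python
def noam_lambda(d_model=512, warmup=4000):
    """Return a step -> lr-multiplier function for LambdaLR.

    With the optimizer's base lr set to 1.0, this reproduces
    lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5).
    """
    def lr_lambda(step):
        step = max(step, 1)  # LambdaLR starts at step 0; avoid 0 ** -0.5
        return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
    return lr_lambda

# Usage with PyTorch (sketch):
#   optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
#                                betas=(0.9, 0.98), eps=1e-9)
#   scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, noam_lambda())
#   ... then call scheduler.step() after each optimizer.step()
```

Note that the multiplier peaks at step == warmup, where the two terms inside min() are equal; before that point the learning rate grows linearly, after it it decays as the inverse square root of the step.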
import torch

class NoamOpt:
    "Optim wrapper that implements rate."
    def __init__(self, model_size, factor, warmup, optimizer):
        self.optimizer = optimizer
        self._step = 0
        self.warmup = warmup
        self.factor = factor
        self.model_size = model_size
        self._rate = 0

    def step(self):
        "Update parameters and rate"
        self._step += 1
        rate = self.rate()
        for p in self.optimizer.param_groups:
            p['lr'] = rate
        self._rate = rate
        self.optimizer.step()

    def rate(self, step=None):
        "Implement `lrate` above"
        if step is None:
            step = self._step
        return self.factor * \
            (self.model_size ** (-0.5) *
             min(step ** (-0.5), step * self.warmup ** (-1.5)))

def get_std_opt(model):
    return NoamOpt(model.src_embed[0].d_model, 2, 4000,
                   torch.optim.Adam(model.parameters(), lr=0,
                                    betas=(0.9, 0.98), eps=1e-9))
From: https://nlp.seas.harvard.edu/2018/04/03/attention.html#optimizer
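A standalone sanity check of the formula (not part of the original answer): with the paper's defaults, the two terms inside min() cross exactly at step == warmup, so the learning rate peaks there.

```python
def rate(step, model_size=512, factor=2, warmup=4000):
    # Same formula as NoamOpt.rate() above, written as a free function.
    return factor * model_size ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

assert rate(2000) < rate(4000) > rate(8000)  # warmup, peak, decay
```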
Posted on 2021-03-19 01:41:03
As suggested in the previous answer, we can use the class introduced at https://nlp.seas.harvard.edu/2018/04/03/attention.html#optimizer. However, that answer will raise an error unless we also define methods to handle the state_dict, so that the scheduler can be saved and restored along with a checkpoint.
Here is the complete scheduler:
class NoamOpt:
    "Optim wrapper that implements rate."
    def __init__(self, model_size, warmup, optimizer):
        self.optimizer = optimizer
        self._step = 0
        self.warmup = warmup
        self.model_size = model_size
        self._rate = 0

    def state_dict(self):
        """Returns the state of the warmup scheduler as a :class:`dict`.
        It contains an entry for every variable in self.__dict__ which
        is not the optimizer.
        """
        return {key: value for key, value in self.__dict__.items() if key != 'optimizer'}

    def load_state_dict(self, state_dict):
        """Loads the warmup scheduler's state.
        Arguments:
            state_dict (dict): warmup scheduler state. Should be an object returned
                from a call to :meth:`state_dict`.
        """
        self.__dict__.update(state_dict)

    def step(self):
        "Update parameters and rate"
        self._step += 1
        rate = self.rate()
        for p in self.optimizer.param_groups:
            p['lr'] = rate
        self._rate = rate
        self.optimizer.step()

    def rate(self, step=None):
        "Implement `lrate` above"
        if step is None:
            step = self._step
        return (self.model_size ** (-0.5) *
                min(step ** (-0.5), step * self.warmup ** (-1.5)))
Later, to use it in the training loop:
optimizer = NoamOpt(input_opts['d_model'], 500,
                    torch.optim.Adam(model.parameters(), lr=0,
                                     betas=(0.9, 0.98), eps=1e-9))
...
optimizer.step()
From: https://stackoverflow.com/questions/65343377
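To see why state_dict() and load_state_dict() matter for resuming training, here is a self-contained round-trip check. _Stub is a hypothetical stand-in for the torch optimizer so the demo runs without PyTorch, and the class body is a condensed copy of the scheduler above:

```python
class _Stub:
    # Hypothetical stand-in for torch.optim.Adam, for this demo only.
    def __init__(self):
        self.param_groups = [{'lr': 0.0}]
    def step(self):
        pass

class NoamOpt:
    # Condensed copy of the scheduler above, for a self-contained demo.
    def __init__(self, model_size, warmup, optimizer):
        self.optimizer = optimizer
        self._step = 0
        self.warmup = warmup
        self.model_size = model_size
        self._rate = 0
    def state_dict(self):
        return {k: v for k, v in self.__dict__.items() if k != 'optimizer'}
    def load_state_dict(self, state_dict):
        self.__dict__.update(state_dict)
    def rate(self, step=None):
        step = self._step if step is None else step
        return (self.model_size ** (-0.5) *
                min(step ** (-0.5), step * self.warmup ** (-1.5)))
    def step(self):
        self._step += 1
        self._rate = self.rate()
        for p in self.optimizer.param_groups:
            p['lr'] = self._rate
        self.optimizer.step()

sched = NoamOpt(512, 4000, _Stub())
for _ in range(100):
    sched.step()

# Save the scheduler state with a checkpoint, then restore it later:
resumed = NoamOpt(512, 4000, _Stub())
resumed.load_state_dict(sched.state_dict())
assert resumed._step == 100  # resumes exactly where training left off
```

Without these two methods, restarting training would reset _step to 0 and re-run the warmup phase at the wrong learning rate.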