The DAIN paper describes how a network can learn to normalize time-series data by itself, and here is how the authors implemented it. This code makes me think the normalization is performed across rows rather than across columns. Can anyone explain why it was implemented this way? I had always assumed that a time series should only be normalized across columns, so that the true information of each feature is preserved.
Here is the part that performs the normalization:
```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAIN_Layer(nn.Module):
    def __init__(self, mode='adaptive_avg', mean_lr=0.00001, gate_lr=0.001, scale_lr=0.00001, input_dim=144):
        super(DAIN_Layer, self).__init__()
        print("Mode = ", mode)

        self.mode = mode
        self.mean_lr = mean_lr
        self.gate_lr = gate_lr
        self.scale_lr = scale_lr

        # Parameters for adaptive average
        self.mean_layer = nn.Linear(input_dim, input_dim, bias=False)
        self.mean_layer.weight.data = torch.FloatTensor(data=np.eye(input_dim, input_dim))

        # Parameters for adaptive std
        self.scaling_layer = nn.Linear(input_dim, input_dim, bias=False)
        self.scaling_layer.weight.data = torch.FloatTensor(data=np.eye(input_dim, input_dim))

        # Parameters for adaptive scaling
        self.gating_layer = nn.Linear(input_dim, input_dim)

        self.eps = 1e-8

    def forward(self, x):
        # Expecting (n_samples, dim, n_feature_vectors)

        # Nothing to normalize
        if self.mode == None:
            pass

        # Do simple average normalization
        elif self.mode == 'avg':
            avg = torch.mean(x, 2)
            avg = avg.resize(avg.size(0), avg.size(1), 1)
            x = x - avg

        # Perform only the first step (adaptive averaging)
        elif self.mode == 'adaptive_avg':
            avg = torch.mean(x, 2)
            adaptive_avg = self.mean_layer(avg)
            adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
            x = x - adaptive_avg

        # Perform the first + second step (adaptive averaging + adaptive scaling)
        elif self.mode == 'adaptive_scale':

            # Step 1:
            avg = torch.mean(x, 2)
            adaptive_avg = self.mean_layer(avg)
            adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
            x = x - adaptive_avg

            # Step 2:
            std = torch.mean(x ** 2, 2)
            std = torch.sqrt(std + self.eps)
            adaptive_std = self.scaling_layer(std)
            adaptive_std[adaptive_std <= self.eps] = 1
            adaptive_std = adaptive_std.resize(adaptive_std.size(0), adaptive_std.size(1), 1)
            x = x / adaptive_std

        elif self.mode == 'full':

            # Step 1:
            avg = torch.mean(x, 2)
            adaptive_avg = self.mean_layer(avg)
            adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
            x = x - adaptive_avg

            # Step 2:
            std = torch.mean(x ** 2, 2)
            std = torch.sqrt(std + self.eps)
            adaptive_std = self.scaling_layer(std)
            adaptive_std[adaptive_std <= self.eps] = 1
            adaptive_std = adaptive_std.resize(adaptive_std.size(0), adaptive_std.size(1), 1)
            x = x / adaptive_std

            # Step 3:
            avg = torch.mean(x, 2)
            gate = F.sigmoid(self.gating_layer(avg))
            gate = gate.resize(gate.size(0), gate.size(1), 1)
            x = x * gate

        else:
            assert False

        return x
```
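For example, if I feed in a made-up tensor of the shape the comment mentions, `(n_samples, dim, n_feature_vectors)`, then `torch.mean(x, 2)` reduces the last axis, so the statistics appear to be computed along each row rather than down each feature column (the shape here is arbitrary, just to check the axis):

```python
import torch

# Made-up shape purely to check the axis: 4 samples, 3 "dim" rows, 5 entries per row
x = torch.randn(4, 3, 5)
avg = torch.mean(x, 2)  # reduces the LAST axis -> one mean per row
print(avg.shape)        # torch.Size([4, 3])
```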
I'm not sure either, but they do transpose the input in the forward function of the MLP class: x = x.transpose(1, 2). So, as far as I can tell, they normalize each feature over time.
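Here is a minimal sketch of what I mean. The shapes are made up; only the transpose and the `dim=2` reduction mirror their code. If x starts as `(n_samples, n_time_steps, n_features)`, then after the transpose the last axis is time, so `torch.mean(x, 2)` yields one mean per feature, i.e. per-feature normalization over time:

```python
import torch

# Made-up toy batch: 2 samples, 5 time steps, 3 features
x = torch.randn(2, 5, 3)   # (n_samples, n_time_steps, n_features)
x = x.transpose(1, 2)      # (n_samples, n_features, n_time_steps), as in their MLP's forward

# Mirrors the 'avg' branch of DAIN_Layer: dim 2 is now the time axis,
# so each feature is centered by its own mean over time
avg = torch.mean(x, 2)     # (n_samples, n_features)
x = x - avg.unsqueeze(2)   # broadcast the per-feature mean over the time axis

print(x.mean(dim=2))       # ~0 for every (sample, feature) pair
```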
https://stackoverflow.com/questions/64936553