I want the batch normalization running statistics (mean and variance) to converge by the end of training, which requires increasing the batch norm momentum from some initial value up to 1.0. I managed to change the momentum with a custom Callback, but it only works when my model is compiled to run eagerly. Toy example (it sets momentum=1.0 after epoch 0, so moving_mean should stop updating):
import tensorflow as tf # version 2.3.1
import tensorflow_datasets as tfds
ds_train, ds_test = tfds.load("mnist", split=["train", "test"], shuffle_files=True, as_supervised=True)
ds_train = ds_train.batch(128)
ds_test = ds_test.batch(128)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(10),
    ]
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
    # run_eagerly=True,
)
class BatchNormMomentumCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        last_bn_layer = None
        for layer in self.model.layers:
            if isinstance(layer, tf.keras.layers.BatchNormalization):
                if epoch == 0:
                    layer.momentum = 0.99
                else:
                    layer.momentum = 1.0
                last_bn_layer = layer
        if last_bn_layer:
            # Prints the last element of moving_mean (despite the "Momentum=" label);
            # it should stop changing after epoch 1.
            tf.print("Momentum=" + str(last_bn_layer.moving_mean[-1].numpy()))

batchnorm_decay = BatchNormMomentumCallback()
model.fit(ds_train, epochs=6, validation_data=ds_test, callbacks=[batchnorm_decay], verbose=0)

Output (obtained with run_eagerly=False):
Momentum=0.0
Momentum=-102.20184
Momentum=-106.04614
Momentum=-116.36204
Momentum=-129.995
Momentum=-123.70443

Expected output (obtained with run_eagerly=True):
Momentum=0.0
Momentum=-5.9038606
Momentum=-5.9038606
Momentum=-5.9038606
Momentum=-5.9038606
Momentum=-5.9038606

I suspect this happens because in graph mode TF compiles the model into a graph with the momentum baked in as 0.99 and keeps using that value inside the graph (so BatchNormMomentumCallback never actually updates momentum).
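This capture behavior is easy to reproduce outside of Keras. A minimal sketch (the Box class and read_box function below are just illustrations, not Keras internals):

import tensorflow as tf

class Box:
    def __init__(self):
        self.value = 0.99  # a plain Python float, like layer.momentum

box = Box()

@tf.function
def read_box():
    # The Python float is baked into the graph as a constant
    # the first time this function is traced.
    return tf.constant(box.value)

print(read_box().numpy())  # 0.99
box.value = 1.0  # rebinding the Python attribute does not trigger a retrace
print(read_box().numpy())  # still 0.99

If value were held in a tf.Variable and read inside the function, an assignment to it would be visible to the compiled graph.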
Q: Is there a way to update the momentum that was compiled into the graph while training? I want to update momentum without resorting to eager mode (i.e. keeping run_eagerly=False), because training efficiency matters.
Posted on 2021-11-15 07:19:28
For your use case, I suggest simply using a custom training loop. You will have all the flexibility you need:
import tensorflow as tf # version 2.3.1
import tensorflow_datasets as tfds
ds_train, ds_test = tfds.load("mnist", split=["train", "test"], shuffle_files=True, as_supervised=True)
ds_train = ds_train.batch(128)
ds_test = ds_test.batch(128)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(10),
    ]
)
optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
batch_norm_layer = model.layers[2]
@tf.function
def train_step(epoch, model, batch):
    # epoch is a plain Python int: each new value triggers a retrace,
    # so the momentum set below is captured in the fresh trace.
    if epoch == 0:
        batch_norm_layer.momentum = 0.99
    else:
        batch_norm_layer.momentum = 1.0
    with tf.GradientTape() as tape:
        x_batch_train, y_batch_train = batch
        logits = model(x_batch_train, training=True)
        loss_value = loss_fn(y_batch_train, logits)
    train_acc_metric.update_state(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
epochs = 6
for epoch in range(epochs):
    tf.print("\nStart of epoch %d" % (epoch,))
    tf.print("Momentum = ", batch_norm_layer.moving_mean[-1], summarize=-1)
    for batch in ds_train:
        train_step(epoch, model, batch)
    train_acc = train_acc_metric.result()
    tf.print("Training acc over epoch: %.4f" % (float(train_acc),))
    train_acc_metric.reset_states()

Start of epoch 0
Momentum = 0
Training acc over epoch: 0.9158
Start of epoch 1
Momentum = -20.2749767
Training acc over epoch: 0.9634
Start of epoch 2
Momentum = -20.2749767
Training acc over epoch: 0.9755
Start of epoch 3
Momentum = -20.2749767
Training acc over epoch: 0.9826
Start of epoch 4
Momentum = -20.2749767
Training acc over epoch: 0.9876
Start of epoch 5
Momentum = -20.2749767
Training acc over epoch: 0.9915

Note that epoch is passed to train_step as a plain Python integer: tf.function retraces once for every new Python argument value it sees, so each epoch gets a fresh trace that picks up the updated momentum.
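A minimal sketch of this retracing behavior (the function f below is just an illustration):

import tensorflow as tf

@tf.function
def f(epoch):
    print("tracing f for epoch", epoch)  # runs only while tracing
    return tf.constant(epoch)

f(0)  # prints "tracing f for epoch 0"
f(0)  # cached concrete function, no tracing
f(1)  # prints "tracing f for epoch 1" (new Python value => new trace)

A simple test shows that, even with the per-epoch retrace, the function decorated with tf.function still performs much better than its eager counterpart: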
import tensorflow as tf # version 2.3.1
import tensorflow_datasets as tfds
import timeit
ds_train, ds_test = tfds.load("mnist", split=["train", "test"], shuffle_files=True, as_supervised=True)
ds_train = ds_train.batch(128)
ds_test = ds_test.batch(128)
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(10),
    ]
)
optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
batch_norm_layer = model.layers[2]
@tf.function
def train_step(epoch, model, batch):
    if epoch == 0:
        batch_norm_layer.momentum = 0.99
    else:
        batch_norm_layer.momentum = 1.0
    with tf.GradientTape() as tape:
        x_batch_train, y_batch_train = batch
        logits = model(x_batch_train, training=True)
        loss_value = loss_fn(y_batch_train, logits)
    train_acc_metric.update_state(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
def train_step_without_tffunction(epoch, model, batch):
    if epoch == 0:
        batch_norm_layer.momentum = 0.99
    else:
        batch_norm_layer.momentum = 1.0
    with tf.GradientTape() as tape:
        x_batch_train, y_batch_train = batch
        logits = model(x_batch_train, training=True)
        loss_value = loss_fn(y_batch_train, logits)
    train_acc_metric.update_state(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
epochs = 6
for epoch in range(epochs):
    tf.print("\nStart of epoch %d" % (epoch,))
    tf.print("Momentum = ", batch_norm_layer.moving_mean[-1], summarize=-1)
    test = True
    for batch in ds_train:
        train_step(epoch, model, batch)
        if test:
            tf.print("TF function:", timeit.timeit(lambda: train_step(epoch, model, batch), number=10))
            tf.print("Eager function:", timeit.timeit(lambda: train_step_without_tffunction(epoch, model, batch), number=10))
            test = False
    train_acc = train_acc_metric.result()
    tf.print("Training acc over epoch: %.4f" % (float(train_acc),))
    train_acc_metric.reset_states()

Start of epoch 0
Momentum = 0
TF function: 0.02285163299893611
Eager function: 0.11109527599910507
Training acc over epoch: 0.9229
Start of epoch 1
Momentum = -88.1852188
TF function: 0.024091466999379918
Eager function: 0.1109461480009486
Training acc over epoch: 0.9639
Start of epoch 2
Momentum = -88.1852188
TF function: 0.02331122400210006
Eager function: 0.11751473100230214
Training acc over epoch: 0.9756
Start of epoch 3
Momentum = -88.1852188
TF function: 0.02656845700039412
Eager function: 0.1121610670015798
Training acc over epoch: 0.9830
Start of epoch 4
Momentum = -88.1852188
TF function: 0.02821972700257902
Eager function: 0.15709391699783737
Training acc over epoch: 0.9877
Start of epoch 5
Momentum = -88.1852188
TF function: 0.02441513300072984
Eager function: 0.10921925399816246
Training acc over epoch: 0.9917https://stackoverflow.com/questions/69964540
Source: https://stackoverflow.com/questions/69964540