Good evening,
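I want to implement a toy example of a simple regression problem using TF2 and tf.GradientTape. With Model.fit the model learns correctly, but with GradientTape, although something happens, the loss barely moves compared to model.fit(). Here is my example code and the results. I can't find the problem.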
import tensorflow as tf

model_opt = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()
with tf.GradientTape() as tape:
    y = model(X, training=True)
    loss_value = loss_fn(y_true, y)
grads = tape.gradient(loss_value, model.trainable_variables)
model_opt.apply_gradients(zip(grads, model.trainable_variables))
#Results:
42.47433806265809
42.63973672226078
36.687397360178586
38.744844324717526
36.59080452300609
...
And here is the regular model.fit() case:
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.MSE, metrics=["mse"])
...
model.fit(X,y_true,verbose=0)
#Results
[40.97759069299212]
[28.04145720307729]
[17.643483147375473]
[7.575242056454791]
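[5.83682193867299]
Both versions should end up with roughly the same loss, but the GradientTape version hardly seems to learn anything. The input X is a tensor, and so is y_true.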
Edit: full code for testing
import pathlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values="?", comment='\t',
                      sep=" ", skipinitialspace=True)
dataset = dataset.dropna()
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)
train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')
def norm(x):
    return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
def build_model_fit():
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse', optimizer=optimizer)
    return model

def build_model_tape():
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)])
    opt = tf.keras.optimizers.RMSprop(0.001)
    return model, opt
model_f = build_model_fit()
model_g, opt_g = build_model_tape()
EPOCHS = 20
#Model.fit() - Test
history = model_f.fit(normed_train_data, train_labels, epochs=EPOCHS, verbose=2)
X = tf.convert_to_tensor(normed_train_data.to_numpy())
y_true = tf.convert_to_tensor(train_labels.to_numpy())
#GradientTape - Test
loss_fn = tf.keras.losses.MeanSquaredError()
for i in range(0, EPOCHS):
    with tf.GradientTape() as tape:
        y = model_g(X, training=True)
        loss_value = loss_fn(y_true, y)
    grads = tape.gradient(loss_value, model_g.trainable_variables)
    opt_g.apply_gradients(zip(grads, model_g.trainable_variables))
    print(loss_value)

Posted on 2020-08-23 21:22:19
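The difference in loss values comes from the different batch sizes used by model.fit and by the tf.GradientTape training loop. If no batch_size keyword argument is passed to model.fit, a batch size of 32 is used. In the tf.GradientTape training loop, the batch size equals the number of samples in the training set (i.e. 314), so the optimizer performs only one update per epoch.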
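To make the difference concrete, here is the arithmetic of how many optimizer updates each approach performs over the whole run, as a small Python sketch. The figures 314 (training samples) and 20 (EPOCHS) come from the test script above; the helper name updates is hypothetical, and the ceiling assumes the final partial batch still produces an update, as it does in model.fit.

```python
import math

N_TRAIN = 314   # training-set size given in the answer
EPOCHS = 20     # EPOCHS from the question's test script
FIT_BATCH = 32  # model.fit default when batch_size is not passed


def updates(n_samples, batch_size, epochs):
    """Hypothetical helper: optimizer steps = batches per epoch * epochs.

    math.ceil counts the last, smaller batch as a full update step.
    """
    return math.ceil(n_samples / batch_size) * epochs


fit_steps = updates(N_TRAIN, FIT_BATCH, EPOCHS)   # 10 batches per epoch
tape_steps = updates(N_TRAIN, N_TRAIN, EPOCHS)    # 1 full batch per epoch
print(fit_steps, tape_steps)  # 200 20
```

With the same 20 epochs, model.fit therefore takes ten times as many optimization steps, which is consistent with its loss falling much faster.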
To fix this, implement batching in the training loop. One way is to use the tf.data API, as shown below.
loss_fn = tf.keras.losses.MeanSquaredError()
for i in range(0, EPOCHS):
    epoch_losses = []
    for x_batch, y_batch in tf.data.Dataset.from_tensor_slices((X, y_true)).batch(32):
        with tf.GradientTape() as tape:
            y = model_g(x_batch, training=True)
            loss_value = loss_fn(y_batch, y)
        epoch_losses.append(loss_value.numpy())
        grads = tape.gradient(loss_value, model_g.trainable_variables)
        opt_g.apply_gradients(zip(grads, model_g.trainable_variables))
    # Mean loss over all batches of this epoch, comparable to the per-epoch loss model.fit reports
    print(np.mean(epoch_losses))

Also note that model.fit shuffles the training data before each epoch by default (shuffle=True), whereas a custom training loop does not; the developer has to implement that.
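The same effect can be reproduced without TensorFlow. The sketch below is a NumPy stand-in, not the code from the answer: it fits a linear model with plain gradient descent (swapped in for RMSprop/Adam to keep it short) on hypothetical synthetic data shaped like the training set (314 rows, 9 feature columns), once with full-batch updates and once with shuffled mini-batches of 32. The minibatches helper draws a fresh permutation every epoch, which is the part a custom loop has to add itself; in a tf.data pipeline the analogous step would be calling .shuffle(buffer_size) on the dataset before .batch(32).

```python
import numpy as np


def minibatches(n_samples, batch_size, rng):
    """Yield index arrays for one epoch, using a fresh random permutation."""
    perm = rng.permutation(n_samples)  # new order on every call, i.e. every epoch
    for start in range(0, n_samples, batch_size):
        yield perm[start:start + batch_size]  # the last batch may be smaller


def mse(w, b, X, y):
    """Mean squared error of the linear model X @ w + b."""
    err = X @ w + b - y
    return float(np.mean(err ** 2))


def train(X, y, batch_size, epochs, lr, seed=0):
    """Plain gradient descent on MSE; returns weights, bias and number of updates."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    steps = 0
    for _ in range(epochs):
        for idx in minibatches(len(X), batch_size, rng):
            xb, yb = X[idx], y[idx]
            err = xb @ w + b - yb                  # residuals for this batch
            grad_w = 2.0 * xb.T @ err / len(idx)   # d(batch MSE)/dw
            grad_b = 2.0 * float(err.mean())       # d(batch MSE)/db
            w -= lr * grad_w
            b -= lr * grad_b
            steps += 1
    return w, b, steps


# Hypothetical synthetic data: 314 standardized rows, 9 features, target centred near 23.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((314, 9))
true_w = data_rng.standard_normal(9)
y = X @ true_w + 23.0 + 0.5 * data_rng.standard_normal(314)

w_full, b_full, steps_full = train(X, y, batch_size=len(X), epochs=20, lr=0.01)
w_mini, b_mini, steps_mini = train(X, y, batch_size=32, epochs=20, lr=0.01)

print("full batch:", steps_full, "updates, MSE", round(mse(w_full, b_full, X, y), 3))
print("mini-batch:", steps_mini, "updates, MSE", round(mse(w_mini, b_mini, X, y), 3))
```

Both runs see every sample 20 times, but the mini-batch run makes 200 updates instead of 20, so at the same learning rate it ends with a much lower MSE.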
https://stackoverflow.com/questions/63550752