
Can a step function be used as the loss function to train a neural network?

Stack Overflow user
Asked 2020-01-31 07:53:10
Answers 1, views 478, followers 0, votes 1

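As the title says,

I am trying to build a model that predicts PM2.5.

This is possible with gradient descent when the loss function is something like MSE, RMSE, MAE, etc.

But when I use a custom loss function built on a step function, the weights do not seem to update.

The last layer of my model outputs the PM2.5 prediction,

and I tried to use a step function to compute the loss.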

def custom_loss(y_true,y_pred):
  z_true = step_function(y_true)
  z_pred = step_function(y_pred)
  return K.abs(z_true -z_pred)

My step function tries to convert PM2.5 into an AQI level.

def step_function(x):
  step1 = ((K.tanh(x-15.45))+1)/2  # approximately 0 when PM2.5 < 15.45, approximately 1 when PM2.5 > 15.45
  step2 = ((K.tanh(x-35.45))+1)/2  # approximately 0 when PM2.5 < 35.45, approximately 1 when PM2.5 > 35.45
  return (step1+step2)  # e.g. x (PM2.5) = 50 returns approximately 2

When y_true and y_pred are both 0 and the step function returns 0, is it not differentiable, and is that why the weights are not updated?


1 Answer

Stack Overflow user

Accepted answer

Posted 2020-06-04 14:51:28

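As you rightly mentioned, you have to handle the case where the loss gives the optimizer nothing to minimize; when that happens, the model's weights are not updated either. The ideal approach in such cases is a custom training loop, which lets you track what happens step by step.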
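One way to see why the optimizer may have nothing to work with is to look at the slope of the tanh-based step function from the question. The sketch below is an illustration added here, not part of the original answer: it assumes plain Python `math.tanh` in place of `K.tanh`, and the helper name `step_gradient` is hypothetical. It evaluates the analytic derivative at a few PM2.5 values.

```python
import math

THRESHOLDS = (15.45, 35.45)  # PM2.5 breakpoints taken from the question


def step_function(x):
    """Smoothed step from the question, using math.tanh instead of K.tanh."""
    return sum((math.tanh(x - t) + 1) / 2 for t in THRESHOLDS)


def step_gradient(x):
    """Analytic derivative: d/dx (tanh(x - t) + 1) / 2 = (1 - tanh(x - t)**2) / 2."""
    return sum((1 - math.tanh(x - t) ** 2) / 2 for t in THRESHOLDS)


# Print the level and its slope below, at, and above each threshold.
for pm25 in (5.0, 15.45, 25.0, 35.45, 50.0):
    print(f"PM2.5={pm25:6.2f}  level={step_function(pm25):.4f}  "
          f"gradient={step_gradient(pm25):.3e}")
```

The slope is about 0.5 exactly at a threshold and practically zero a few units away from it. Because the gradient of `K.abs(z_true - z_pred)` with respect to `y_pred` is this slope up to a sign, predictions that are not close to 15.45 or 35.45 would produce almost no weight update.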

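You get more control with custom training. If you want lower-level training and evaluation loops than what fit() and evaluate() provide, you should write your own training loop. It is actually quite simple, but you should be prepared to do more of the debugging yourself.

Calling a model inside a GradientTape scope lets you retrieve the gradients of the layer's trainable weights with respect to a loss value. Using an optimizer instance, you can then use these gradients to update those variables (which you can retrieve with model.trainable_weights).

TensorFlow provides the tf.GradientTape API for automatic differentiation, that is, computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside a tf.GradientTape context onto a "tape". It then uses that tape, together with the gradients associated with each recorded operation, to compute the gradients of the recorded computation using reverse-mode differentiation.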
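As a minimal sketch of the recording behaviour described above, assuming TensorFlow 2.x is installed, the snippet below differentiates a small expression. The variable `x` and the function y = x^2 + 2x are made up for illustration and are not from the original answer.

```python
import tensorflow as tf

x = tf.Variable(3.0)  # trainable variables are watched by the tape automatically

with tf.GradientTape() as tape:
    # Every operation inside this block is recorded on the tape.
    y = x * x + 2.0 * x

# Reverse-mode differentiation over the recorded operations: dy/dx = 2x + 2.
dy_dx = tape.gradient(y, x)
print(float(dy_dx))
```

At x = 3 the analytic derivative is 8, which is what the tape should report.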

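If you want to process the gradients before applying them, you can use the optimizer in three steps:

  1. Compute the gradients with tf.GradientTape.
  2. Process the gradients as you wish.
  3. Apply the processed gradients with apply_gradients().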
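The three steps above can be sketched as follows, assuming TensorFlow 2.x. The one-weight Dense layer, the input values, and the choice of norm clipping as the processing step are all assumptions made for illustration; the original answer does not prescribe any particular processing.

```python
import tensorflow as tf

# Hypothetical one-weight layer, initialised to 2.0 so the numbers are easy to follow.
layer = tf.keras.layers.Dense(
    1, use_bias=False,
    kernel_initializer=tf.keras.initializers.Constant(2.0))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

x = tf.constant([[3.0]])
y_true = tf.constant([[0.0]])

# Step 1: compute the gradients with tf.GradientTape.
with tf.GradientTape() as tape:
    y_pred = layer(x)                                   # 2.0 * 3.0 = 6.0
    loss = tf.reduce_mean(tf.square(y_pred - y_true))   # 36.0
grads = tape.gradient(loss, layer.trainable_weights)    # d(loss)/dw = 2 * 6 * 3 = 36

# Step 2: process the gradients (here: clip each one to an L2 norm of at most 1.0).
clipped = [tf.clip_by_norm(g, 1.0) for g in grads]

# Step 3: apply the processed gradients.
optimizer.apply_gradients(zip(clipped, layer.trainable_weights))

new_w = float(layer.kernel.numpy()[0, 0])
print(new_w)  # roughly 1.9, i.e. 2.0 - 0.1 * 1.0
```

Without the clipping step, the same update would move the weight by 0.1 * 36 instead of 0.1 * 1, which is the kind of difference the processing step gives you control over.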

Below is a simple example on the MNIST data. Comments in the code explain each step.

Code:

import tensorflow as tf
print(tf.__version__)
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

epochs = 3
for epoch in range(epochs):
  print('Start of epoch %d' % (epoch,))

  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables autodifferentiation.
    with tf.GradientTape() as tape:

      # Run the forward pass of the layer.
      # The operations that the layer applies
      # to its inputs are going to be recorded
      # on the GradientTape.
      logits = model(x_batch_train, training=True)  # Logits for this minibatch

      # Compute the loss value for this minibatch.
      loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    # Log every 200 batches.
    if step % 200 == 0:
      print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
      print('Seen so far: %s samples' % ((step + 1) * batch_size))

Output:

2.2.0
Start of epoch 0
Training loss (for one batch) at step 0: 2.323657512664795
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3156163692474365
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.2302279472351074
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.131979465484619
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.00234317779541
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.7992427349090576
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.8583933115005493
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.6005337238311768
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.6701987981796265
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.6237502098083496
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.3603084087371826
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.246948480606079
Seen so far: 38464 samples

You can find more information about tf.GradientTape here. The example used above is taken from here.

Hope this answers your question. Happy learning.

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/59999902
