
Q: F. Chollet's tf.keras for researchers, "hypernetworks" tape.gradient: crash course

Stack Overflow user
Asked on 2019-03-13 03:34:11
1 answer · 81 views · 0 followers · 0 votes

Here is the URL of the original colab notebook:

https://colab.research.google.com/drive/17u-pRZJnKN0gO5XZmq8n5A2bKGrfKEUg#scrollTo=xEuWqzjlPobA

Scroll down to the last cell, "Now, a quick research example: hypernetworks":

input_dim = 784
classes = 10

# The model we'll actually use (the hypernetwork).
outer_model = Linear(classes)

# It doesn't need to create its own weights, so let's mark it as already built.
# That way, calling `outer_model` won't create new variables.
outer_model.built = True

# The model that generates the weights of the model above.
inner_model = Linear(input_dim * classes + classes)

# Loss and optimizer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))

# We'll use a batch size of 1 for this experiment.
dataset = dataset.shuffle(buffer_size=1024).batch(1)

losses = []  # Keep track of the losses over time.
for step, (x, y) in enumerate(dataset):
  with tf.GradientTape() as tape:

    # Predict weights for the outer model.
    weights_pred = inner_model(x)

    # Reshape them to the expected shapes for w and b for the outer model.
    w_pred = tf.reshape(weights_pred[:, :-classes], (input_dim, classes))
    b_pred = tf.reshape(weights_pred[:, -classes:], (classes,))

    # Set the weight predictions as the weight variables on the outer model.
    outer_model.w = w_pred
    outer_model.b = b_pred

    # Inference on the outer model.
    preds = outer_model(x)
    loss = loss_fn(y, preds)

  # Train only inner model.
  grads = tape.gradient(loss, inner_model.trainable_weights)
  optimizer.apply_gradients(zip(grads, inner_model.trainable_weights))

  # Logging.
  losses.append(float(loss))
  if step % 100 == 0:
    print(step, sum(losses) / len(losses))

  # Stop after 1000 steps.
  if step >= 1000:
    break
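For context, the `Linear` layer used above is defined in an earlier cell of the notebook; a minimal sketch consistent with the crash course's layer-subclassing pattern (the initializer choices here are assumptions):

```python
import tensorflow as tf

class Linear(tf.keras.layers.Layer):
  """A simple y = x.W + b layer whose weights are built lazily on first call."""

  def __init__(self, units=32):
    super().__init__()
    self.units = units

  def build(self, input_shape):
    # Normally called automatically on the first call. Setting
    # `outer_model.built = True` by hand, as the hypernetwork code does,
    # skips this step, so no variables are ever created there.
    self.w = self.add_weight(shape=(input_shape[-1], self.units),
                             initializer='random_normal',  # assumed
                             trainable=True)
    self.b = self.add_weight(shape=(self.units,),
                             initializer='zeros',  # assumed
                             trainable=True)

  def call(self, inputs):
    return tf.matmul(inputs, self.w) + self.b
```

Because `outer_model.built` is set to `True`, calling `outer_model(x)` never runs `build`, so `w` and `b` must be assigned by hand before each call, which is exactly what the training loop does with the inner model's predictions.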

In the training loop, note that:

grads = tape.gradient(loss, inner_model.trainable_weights)

is outside of:

with tf.GradientTape() as tape:

I thought this was supposed to go inside? It would be great if someone could confirm that this is correct and, at the same time, explain what is going on with the gradient tape.

If you run the notebook, the code does seem to work regardless, since you can see the loss decreasing over time.


1 Answer

Stack Overflow user
Answered on 2019-03-14 05:28:45

All the examples I have seen have it outside the with statement. Note that the tape does not cease to exist outside the with statement; only its "exit" function is called.
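This behavior can be demonstrated with a minimal sketch: exiting the with block only stops recording, while the tape object itself survives and can still compute gradients.

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x * x  # operations are recorded only inside the context

# The `with` block has exited, so recording has stopped, but `tape`
# still exists; calling tape.gradient out here is the standard pattern.
grad = tape.gradient(y, x)
print(float(grad))  # dy/dx = 2*x = 6.0
```

This is why the notebook's training loop calls `tape.gradient(loss, inner_model.trainable_weights)` after the indented block: the forward pass must be recorded inside, but the gradient computation belongs outside.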

Votes: 0
The original content of this page was provided by Stack Overflow; translation supported by Tencent Cloud's IT-domain engine.
Original link:

https://stackoverflow.com/questions/55129419
