
Conceptual understanding of GradientTape.gradient

Stack Overflow user
Asked on 2020-03-13 04:50:01
1 answer · 2.3K views · 0 followers · 4 votes

Background

In TensorFlow 2, there is a class called GradientTape that records operations on tensors; the result can then be differentiated and fed to some minimization algorithm. For example, from the documentation we have this example:

x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x) # Will compute to 6.0

The docstring for the gradient method implies that its first argument can be not just a single tensor, but a list of tensors:

 def gradient(self,
               target,
               sources,
               output_gradients=None,
               unconnected_gradients=UnconnectedGradients.NONE):
    """Computes the gradient using operations recorded in context of this tape.

    Args:
      target: a list or nested structure of Tensors or Variables to be
        differentiated.
      sources: a list or nested structure of Tensors or Variables. `target`
        will be differentiated against elements in `sources`.
      output_gradients: a list of gradients, one for each element of
        target. Defaults to None.
      unconnected_gradients: a value which can either hold 'none' or 'zero' and
        alters the value which will be returned if the target and sources are
        unconnected. The possible values and effects are detailed in
        'UnconnectedGradients' and it defaults to 'none'.

    Returns:
      a list or nested structure of Tensors (or IndexedSlices, or None),
      one for each element in `sources`. Returned structure is the same as
      the structure of `sources`.

    Raises:
      RuntimeError: if called inside the context of the tape, or if called more
       than once on a non-persistent tape.
      ValueError: if the target is a variable or if unconnected gradients is
       called with an unknown value.
    """

In the example above, it is easy to see that y, the target, is the function to be differentiated, and x is the variable with respect to which the "gradient" is taken.

From my limited experience, the gradient method seems to return a list of tensors, one per element of sources, and each of those gradients is a tensor of the same shape as the corresponding member of sources.
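This shape behavior is easy to check directly. A minimal sketch (the variable names x1, x2 are made up for illustration, not from the post):

```python
import tensorflow as tf

# Two sources with different shapes.
x1 = tf.Variable(tf.ones((2, 3)))
x2 = tf.Variable(tf.ones((3,)))

with tf.GradientTape() as g:
    # A scalar target depending on both sources.
    y = tf.reduce_sum(x1) + tf.reduce_sum(x2 * x2)

# One gradient per source, each shaped like its source.
g1, g2 = g.gradient(y, [x1, x2])
print(g1.shape)  # (2, 3) -- same shape as x1
print(g2.shape)  # (3,)   -- same shape as x2
```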

Question

The description of target's behavior above makes sense if target contains a single scalar "tensor" to be differentiated, because mathematically a gradient vector should have the same dimension as the domain of the function.

However, if target is a list of tensors, the output of gradient still has the same shape. Why is this the case? If target is thought of as a list of functions, shouldn't the output resemble something like a Jacobian? How am I to interpret this behavior conceptually?


1 Answer

Stack Overflow user

Answer accepted

Posted on 2020-03-18 17:14:37

tf.GradientTape().gradient() is simply defined that way. It has the same functionality as tf.gradients(), except that the latter can't be used in eager mode. From the docs of tf.gradients():

It returns a list of Tensors of length len(xs) where each tensor is the sum(dy/dx) for y in ys

where xs are the sources and y is the target.

Example 1

Suppose target = [y1, y2] and sources = [x1, x2]. The result will be:

[dy1/dx1 + dy2/dx1, dy1/dx2 + dy2/dx2]
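This summing-over-targets behavior can be verified with concrete numbers. A small sketch (the specific values are made up for illustration):

```python
import tensorflow as tf

x1 = tf.Variable(2.0)
x2 = tf.Variable(3.0)

with tf.GradientTape() as g:
    y1 = x1 * x2  # dy1/dx1 = x2 = 3, dy1/dx2 = x1 = 2
    y2 = x1 * x1  # dy2/dx1 = 2*x1 = 4, dy2/dx2 = 0

# Passing a list as `target` sums the gradients over its elements.
grads = g.gradient([y1, y2], [x1, x2])
print([float(t) for t in grads])  # [7.0, 2.0] = [3 + 4, 2 + 0]
```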

Example 2

Computing the gradient of a per-sample loss (a tensor) versus the reduced loss (a scalar):

Let w, b be two variables. 
xentropy = [y1, y2] # tensor
reduced_xentropy = 0.5 * (y1 + y2) # scalar
grads = [dy1/dw + dy2/dw, dy1/db + dy2/db]
reduced_grads = [d(reduced_xentropy)/dw, d(reduced_xentropy)/db]
              = [d(0.5 * (y1 + y2))/dw, d(0.5 * (y1 + y2))/db] 
              == 0.5 * grads

A TensorFlow example of the snippet above:

import tensorflow as tf

print(tf.__version__) # 2.1.0

inputs = tf.convert_to_tensor([[0.1, 0], [0.5, 0.51]]) # two two-dimensional samples
w = tf.Variable(initial_value=inputs)
b = tf.Variable(tf.zeros((2,)))
labels = tf.convert_to_tensor([0, 1])

def forward(inputs, labels, var_list):
    w, b = var_list
    logits = tf.matmul(inputs, w) + b
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    return xentropy

# `xentropy` has two elements: one loss per sample in the batch,
# so its gradient is summed over the samples
with tf.GradientTape() as g:
    xentropy = forward(inputs, labels, [w, b])
    reduced_xentropy = tf.reduce_mean(xentropy)
grads = g.gradient(xentropy, [w, b])
print(xentropy.numpy()) # [0.6881597  0.71584916]
print(grads[0].numpy()) # [[ 0.20586157 -0.20586154]
                        #  [ 0.2607238  -0.26072377]]

# `reduced_xentropy` is a scalar (gradient of a scalar target)
with tf.GradientTape() as g:
    xentropy = forward(inputs, labels, [w, b])
    reduced_xentropy = tf.reduce_mean(xentropy)
grads_reduced = g.gradient(reduced_xentropy, [w, b])
print(reduced_xentropy.numpy()) # 0.70200443 <-- scalar
print(grads_reduced[0].numpy()) # [[ 0.10293078 -0.10293077]
                                #  [ 0.1303619  -0.13036188]]

If you compute the loss (xentropy) for each element in the batch, the final gradient with respect to each variable is the sum of the gradients over all samples in the batch (which makes sense).
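As a side note on the Jacobian question (not part of the original answer): if you want one gradient per element of target rather than the sum, TF 2's GradientTape also provides a jacobian method. A minimal sketch:

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0])

with tf.GradientTape() as g:
    y = x * x  # y has two elements: [x0^2, x1^2]

# `jacobian` keeps the per-element structure instead of summing:
# result shape is y.shape + x.shape = (2, 2).
jac = g.jacobian(y, x)
print(jac.numpy())  # [[2. 0.]
                    #  [0. 4.]]
```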

Votes: 5
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/60665006