Background
In TensorFlow 2 there is a class called GradientTape, which records operations on tensors; the results can then be differentiated and fed into some minimization algorithm. For example, from the documentation we have this example:
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0

The docstring for the gradient method implies that the first argument can be not just a tensor, but a list of tensors:
def gradient(self,
             target,
             sources,
             output_gradients=None,
             unconnected_gradients=UnconnectedGradients.NONE):
  """Computes the gradient using operations recorded in context of this tape.

  Args:
    target: a list or nested structure of Tensors or Variables to be
      differentiated.
    sources: a list or nested structure of Tensors or Variables. `target`
      will be differentiated against elements in `sources`.
    output_gradients: a list of gradients, one for each element of
      target. Defaults to None.
    unconnected_gradients: a value which can either hold 'none' or 'zero' and
      alters the value which will be returned if the target and sources are
      unconnected. The possible values and effects are detailed in
      'UnconnectedGradients' and it defaults to 'none'.

  Returns:
    a list or nested structure of Tensors (or IndexedSlices, or None),
    one for each element in `sources`. Returned structure is the same as
    the structure of `sources`.

  Raises:
    RuntimeError: if called inside the context of the tape, or if called more
      than once on a non-persistent tape.
    ValueError: if the target is a variable or if unconnected gradients is
      called with an unknown value.
  """

In the example above it is easy to see that y, the target, is the function to be differentiated, and x is the variable the gradient is taken with respect to, i.e. the source.
In my limited experience, the gradient method seems to return a list of tensors, one per element of sources, where each gradient is a tensor with the same shape as the corresponding member of sources.
Question
The description of gradient's behavior above makes sense if target contains a single 1x1 "tensor" to be differentiated, because mathematically a gradient vector should live in the same space as the domain of the function.
However, if target is a list of tensors, the output of gradient still has the same shape. Why is this the case? If target is thought of as a list of functions, shouldn't the output resemble something like a Jacobian? How should I interpret this behavior conceptually?
Posted on 2020-03-18 17:14:37
tf.GradientTape().gradient() is simply defined that way. It has the same functionality as tf.gradients(), except that the latter cannot be used in eager mode. From the docs of tf.gradients():
It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys

where xs is sources and ys is target.
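This summation rule can be checked numerically without TensorFlow. Below is a minimal pure-Python sketch; the two target functions y1 = x1 * x2 and y2 = x1 ** 2 and the finite-difference helper are illustrative assumptions, not taken from the answer:

```python
# Finite-difference check that differentiating a *list* of targets behaves
# like differentiating the sum of the targets: the result for each source x
# is sum(dy/dx for y in ys), exactly as the tf.gradients() docs state.

def y1(x1, x2):
    return x1 * x2   # illustrative target function

def y2(x1, x2):
    return x1 ** 2   # illustrative target function

def fd(f, x1, x2, wrt, h=1e-6):
    """Central finite difference of f w.r.t. x1 (wrt=0) or x2 (wrt=1)."""
    if wrt == 0:
        return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

x1, x2 = 3.0, 5.0
# What gradient(target=[y1, y2], sources=[x1, x2]) would return:
result = [fd(y1, x1, x2, 0) + fd(y2, x1, x2, 0),   # dy1/dx1 + dy2/dx1
          fd(y1, x1, x2, 1) + fd(y2, x1, x2, 1)]   # dy1/dx2 + dy2/dx2
# Analytically: dy1/dx1 + dy2/dx1 = x2 + 2*x1 = 11, dy1/dx2 + dy2/dx2 = x1 = 3
print([round(v, 4) for v in result])
```

Replacing the two separate derivatives with the single derivative of y1 + y2 gives the same numbers, which is the conceptual answer to the question: the list of targets is implicitly summed, not stacked into a Jacobian.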
Example 1
Suppose target = [y1, y2] and sources = [x1, x2]. The result will be:
[dy1/dx1 + dy2/dx1, dy1/dx2 + dy2/dx2]

Example 2
Computing the gradient of a per-sample loss (a tensor) versus a reduced loss (a scalar).
Let w, b be two variables.

xentropy = [y1, y2]                 # tensor
reduced_xentropy = 0.5 * (y1 + y2)  # scalar
grads = [dy1/dw + dy2/dw, dy1/db + dy2/db]
reduced_grads = [d(reduced_xentropy)/dw, d(reduced_xentropy)/db]
              = [d(0.5 * (y1 + y2))/dw, d(0.5 * (y1 + y2))/db]
              = 0.5 * grads

A TensorFlow example of the snippet above:
import tensorflow as tf
print(tf.__version__) # 2.1.0
inputs = tf.convert_to_tensor([[0.1, 0], [0.5, 0.51]]) # two two-dimensional samples
w = tf.Variable(initial_value=inputs)
b = tf.Variable(tf.zeros((2,)))
labels = tf.convert_to_tensor([0, 1])
def forward(inputs, labels, var_list):
    w, b = var_list
    logits = tf.matmul(inputs, w) + b
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    return xentropy
# `xentropy` has two elements (one loss per sample in the batch),
# so this computes the gradient of a tensor target
with tf.GradientTape() as g:
    xentropy = forward(inputs, labels, [w, b])
    reduced_xentropy = tf.reduce_mean(xentropy)
grads = g.gradient(xentropy, [w, b])
print(xentropy.numpy()) # [0.6881597 0.71584916]
print(grads[0].numpy()) # [[ 0.20586157 -0.20586154]
# [ 0.2607238 -0.26072377]]
# `reduced_xentropy` is a scalar, so this computes the gradient of a scalar target
with tf.GradientTape() as g:
    xentropy = forward(inputs, labels, [w, b])
    reduced_xentropy = tf.reduce_mean(xentropy)
grads_reduced = g.gradient(reduced_xentropy, [w, b])
print(reduced_xentropy.numpy())  # 0.70200443 <-- scalar
print(grads_reduced[0].numpy())  # [[ 0.10293078 -0.10293077]
                                 #  [ 0.1303619  -0.13036188]]

If you compute the loss (xentropy) of each element in a batch, the final gradient of each variable will be the sum of the gradients of all samples in the batch (which makes sense).
https://stackoverflow.com/questions/60665006