Cannot get GradientTape to give non-null results

Asked by a Stack Overflow user on 2019-09-09 23:34:17 · 1 answer · 77 views · 0 followers · 0 upvotes

I am trying to implement a very simple RNN by hand with TensorFlow 2. I modeled my code on the example to manually make models on the TensorFlow website. For that purpose the code was stripped down to the bare essentials shown below.

Code language: python
import numpy as np
import tensorflow as tf

class ModelSimple(object):
    def __init__(self):
        # Initialize the weight and bias to random values
        self.W = tf.Variable(tf.random.normal([]))
        self.b = tf.Variable(tf.random.normal([]))

    def __call__(self, x):
        return self.W * x + self.b

def loss(predicted_y, target_y):
    return tf.reduce_mean(tf.square(predicted_y - target_y))


NUM_EXAMPLES = 1000

inputs  = tf.random.normal(shape=[NUM_EXAMPLES])
outputs = tf.zeros(NUM_EXAMPLES)
model = ModelSimple()

with tf.GradientTape() as t:
    t.watch([model.W,model.b])
    current_loss = loss(model(inputs), outputs)
dW, db = t.gradient(current_loss, [model.W, model.b])
print(dW,db)
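As a quick sanity check (my own addition, not part of the original question), the tape's output for this linear model can be compared against the hand-derived gradients of the mean-squared loss:

```python
import tensorflow as tf

W = tf.Variable(3.0)
b = tf.Variable(1.0)
x = tf.constant([1.0, 2.0, 3.0])
target = tf.zeros(3)

with tf.GradientTape() as tape:
    loss_val = tf.reduce_mean(tf.square(W * x + b - target))
dW, db = tape.gradient(loss_val, [W, b])

# Analytic gradients of mean((W*x + b - target)^2):
#   dL/dW = mean(2*(W*x + b - target)*x)
#   dL/db = mean(2*(W*x + b - target))
err = W * x + b - target
assert abs(float(dW) - float(tf.reduce_mean(2.0 * err * x))) < 1e-4
assert abs(float(db) - float(tf.reduce_mean(2.0 * err))) < 1e-4
```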

The snippet above gives proper tensors for dW and db. I then tried to do the same for the RNN described above.

Code language: python
class ModelRNN(object):
    def __init__(self, n_inputs, n_neurons):
        self.n_inputs = n_inputs
        self.n_neurons = n_neurons

        # weights for new input
        self.Wx = tf.Variable(tf.random.normal(shape=[self.n_inputs, self.n_neurons], dtype=tf.float32))

        # weights for previous output
        self.Wy = tf.Variable(tf.random.normal(shape=[self.n_neurons, self.n_neurons], dtype=tf.float32))

        # bias weights
        self.b = tf.Variable(tf.zeros([1, self.n_neurons], dtype=tf.float32))

    def __call__(self, X_batch):
        # get shape of input
        batch_size, num_time_steps, _ = X_batch.get_shape()

        # we will loop through the time steps and the output of the previous computation feeds into
        # the next one.
        # this variable keeps track of it and is initialized to zero
        y_last = tf.Variable(tf.zeros([batch_size, self.n_neurons], dtype=tf.float32))

        # the outputs will be stored in this tensor
        Ys = tf.Variable(tf.zeros([batch_size, num_time_steps, self.n_neurons], dtype=tf.float32))

        for t in range(num_time_steps):
            Xt = X_batch[:, t, :]
            yt = tf.tanh(tf.matmul(y_last, self.Wy) +
                         tf.matmul(Xt, self.Wx) +
                         self.b)
            y_last.assign(yt)
            Ys[:, t, :].assign(yt)

        return Ys




inputs = tf.convert_to_tensor(np.array([
        # t = 0      t = 1
        [[0, 1, 2], [9, 8, 7]], # instance 1
        [[3, 4, 5], [0, 0, 0]], # instance 2
        [[6, 7, 8], [6, 5, 4]], # instance 3
        [[9, 0, 1], [3, 2, 1]], # instance 4
    ],dtype=np.float32))
outputs = tf.Variable(tf.zeros((4, 2, 5), dtype=np.float32))

model = ModelRNN(3, 5)

with tf.GradientTape() as t:
    t.watch([model.Wx,model.Wy,model.b])
    current_loss = loss(model(inputs), outputs)

dWx,dWy,db = t.gradient(current_loss, [model.Wx, model.Wy,model.b])
print(dWx,dWy,db)

The result is that dWx, dWy and db are all None. I have tried several things (including watching them with the GradientTape even though they are Variables), but I keep getting None. What am I doing wrong?
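The root cause can be reproduced in a few lines (a stripped-down sketch of my own, not from the post): `tf.Variable.assign` is a stateful update, so reading the variable afterwards gives the tape no differentiable path back to the source of the assigned value:

```python
import tensorflow as tf

w = tf.Variable(2.0)
holder = tf.Variable(0.0)  # intermediate result written via assign

with tf.GradientTape() as tape:
    y = w * 3.0
    holder.assign(y)           # stateful write: the tape cannot trace through it
    loss_val = holder * holder

print(tape.gradient(loss_val, w))  # -> None: no differentiable path back to w
```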

1 Answer

Answered by a Stack Overflow user (accepted) on 2019-09-10 18:15:03

It looks like this is related to this question: Tensorflow cannot get gradient wrt a Variable, but can wrt a Tensor

Replacing the assigns with a Python list and tf.stack results in gradients being returned:

Code language: python
    Ys = []
    for t in range(num_time_steps):
        Xt = X_batch[:, t, :]
        yt = tf.tanh(tf.matmul(y_last, self.Wy) +
                     tf.matmul(Xt, self.Wx) +
                     self.b)
        y_last.assign(yt)
        Ys.append(yt)

    return tf.stack(Ys,axis=1)
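Putting it together, here is an end-to-end sketch of a fixed model (my own variant, not verbatim from the answer: besides the list/tf.stack change, it rebinds y_last as a plain tensor instead of calling assign, which would otherwise cut the tape between time steps):

```python
import numpy as np
import tensorflow as tf

class ModelRNNFixed(object):
    def __init__(self, n_inputs, n_neurons):
        self.n_neurons = n_neurons
        self.Wx = tf.Variable(tf.random.normal([n_inputs, n_neurons]))
        self.Wy = tf.Variable(tf.random.normal([n_neurons, n_neurons]))
        self.b = tf.Variable(tf.zeros([1, n_neurons]))

    def __call__(self, X_batch):
        batch_size, num_time_steps, _ = X_batch.get_shape()
        # plain tensor, rebound each step, so the whole unroll stays on the tape
        y_last = tf.zeros([batch_size, self.n_neurons])
        Ys = []
        for t in range(num_time_steps):
            Xt = X_batch[:, t, :]
            y_last = tf.tanh(tf.matmul(y_last, self.Wy) +
                             tf.matmul(Xt, self.Wx) +
                             self.b)
            Ys.append(y_last)
        return tf.stack(Ys, axis=1)

inputs = tf.constant(np.arange(24, dtype=np.float32).reshape(4, 2, 3))
targets = tf.zeros((4, 2, 5))
model = ModelRNNFixed(3, 5)

with tf.GradientTape() as tape:
    current_loss = tf.reduce_mean(tf.square(model(inputs) - targets))
grads = tape.gradient(current_loss, [model.Wx, model.Wy, model.b])
print([g is not None for g in grads])  # -> [True, True, True]
```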
Upvotes: 1
The original content of this page was provided by Stack Overflow; translation was supported by Tencent Cloud's IT-domain engine.
Original link: https://stackoverflow.com/questions/57857109