So I'm trying to implement the DQN algorithm in TensorFlow, and I've defined my loss function as shown below, but whenever I perform a weight update with the Adam optimizer, all my variables become NaN after 2-3 updates. Any idea where the problem is? My actions can take integer values in (0, 10). Do you know what I could do?
def Q_Values_of_Given_State_Action(self, actions_, y_targets):
    # Output of the online network, which gives the Q values of all actions in the current state
    self.dense_output = self.dense_output
    # Actions that were taken by the online network
    actions_ = tf.reshape(tf.cast(actions_, tf.int32), shape=(Mini_batch, 1))
    z = tf.reshape(tf.range(tf.shape(self.dense_output)[0]), shape=(Mini_batch, 1))
    index_ = tf.concat((z, actions_), axis=-1)  # [row, action] index pairs
    self.Q_Values_Select_Actions = tf.gather_nd(self.dense_output, index_)
    # 0.5 * sum of squared TD errors
    self.loss_ = tf.divide(tf.reduce_sum(tf.square(self.Q_Values_Select_Actions - y_targets)), 2)
    return self.loss_

https://stackoverflow.com/questions/49544277
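The row-plus-action indexing done with tf.gather_nd above can be checked in isolation; a minimal NumPy sketch of the same selection and loss (all names and shapes here are illustrative, not from the question) shows the indexing itself produces finite values, which suggests the NaNs more likely come from the targets, the learning rate, or exploding gradients than from this function:

```python
import numpy as np

# Illustrative stand-ins for the question's tensors (assumed shapes).
batch = 4
n_actions = 10
rng = np.random.default_rng(0)
q_values = rng.standard_normal((batch, n_actions))  # plays the role of self.dense_output
actions = np.array([3, 0, 7, 9])                    # actions taken, integers in [0, 10)
y_targets = np.zeros(batch)                         # placeholder TD targets

# Equivalent of tf.gather_nd with [row, action] index pairs:
selected = q_values[np.arange(batch), actions]

# Same loss as in the question: 0.5 * sum of squared TD errors.
loss = 0.5 * np.sum((selected - y_targets) ** 2)
```

If the selection checks out like this, common remedies for NaN in DQN training are a smaller Adam learning rate, gradient clipping, or replacing the squared error with a Huber loss, which bounds the gradient of large TD errors.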