I am confused about the terminology around batch size, epochs, and weight updates in neural networks.
I want to verify that my understanding of the process, in the order below, is correct:
Consider that one training/data point has 8 features (8 input nodes).
I have 20 training/data points.
I choose a batch size of 2.
Now I want to make the model learn.

Executing the first epoch

Executing the first batch
Data point-1: the 8 feature values go through the 8 input nodes.
Random weights are initialised
Forward Propagation happens
Backward Propagation happens
As a result of backward propagation, all the weights are updated.
Data point-2: the 8 feature values go through the 8 input nodes.
Forward propagation happens with the updated weights found from the previous (i.e. Data point-1) backpropagation result.
Backward propagation happens and all the weights are again updated.
Executing second batch
Data point-3: the 8 feature values go through the 8 input nodes.
Forward propagation happens with the updated weights found from the previous (i.e. Data point-2) backpropagation result.
Backward propagation happens and all the weights are again updated.
This process continues... until the first epoch ends.

Executing the second epoch

Executing the first batch
Data point-1: the 8 feature values go through the 8 input nodes.
No random weights this time. Forward propagation happens with the last back-propagated weights found (from the last batch executed in the first epoch).
Backward propagation happens and all the weights are again updated.
This process continues... until the second epoch ends. And this goes on until the desired number of epochs is reached.
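The per-data-point loop described above can be sketched in plain Python/NumPy. This is a minimal illustration only: the one-layer linear model, squared-error loss, and the name `learn_rate` are my assumptions, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))   # 20 data points, 8 features each
y = rng.normal(size=(20, 1))
W = rng.normal(size=(8, 1))    # random weight initialisation
learn_rate = 0.01

n_updates = 0
for epoch in range(2):
    for i in range(len(X)):                      # one data point at a time
        x_i = X[i:i+1]                           # shape (1, 8): 8 features -> 8 input nodes
        out = x_i @ W                            # forward propagation
        grad = 2 * x_i.T @ (out - y[i:i+1])      # backward propagation (squared error)
        W -= learn_rate * grad                   # weights updated after EVERY data point
        n_updates += 1

print(n_updates)  # 40: one update per data point per epoch, for 2 epochs
```

Note that in this schedule the batch size never appears: there is one weight update per data point, which is exactly what the answers below point out.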
Posted on 2018-05-18 12:39:49
Your mini-batch description is wrong: for one batch, we compute the gradients of the whole batch at once, sum them over the batch, and then update the weights once per batch.
The code below illustrates the gradient computation d(loss)/d(W) for y = W * x, for a mini-batch and for single inputs:
import tensorflow as tf
import numpy as np

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])
W1 = tf.constant([[0.2]], dtype=tf.float32)
out = tf.matmul(X, W1)
loss = tf.square(out - Y)
# Calculate error gradient with respect to weights.
gradients = tf.gradients(loss, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Giving individual inputs
    print(sess.run([gradients], {X: np.array([[0.1]]), Y: [[0.05]]}))
    # [[-0.006]]
    print(sess.run([gradients], {X: np.array([[0.2]]), Y: [[0.1]]}))
    # [[-0.024]]
    # Give a batch combining the above inputs
    print(sess.run([gradients], {X: np.array([[0.1], [0.2]]), Y: [[0.05], [0.1]]}))
    # [[-0.03]], which is the sum of the above gradients.

Posted on 2018-05-18 09:24:33
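The same arithmetic can be checked without TensorFlow. For loss = (W*x - y)^2, the gradient is d(loss)/dW = 2*x*(W*x - y); a plain-Python sketch of the numbers above:

```python
W = 0.2

def grad(x, y):
    # d/dW of (W*x - y)^2 = 2 * x * (W*x - y)
    return 2 * x * (W * x - y)

g1 = grad(0.1, 0.05)    # approx -0.006
g2 = grad(0.2, 0.1)     # approx -0.024
print(g1, g2, g1 + g2)  # the batch gradient is the sum, approx -0.03
```

This confirms the point of the answer: the gradient for a batch equals the sum of the per-sample gradients, so one update per batch uses all samples at once.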
The steps you mentioned are for stochastic gradient descent, where the batch size plays no role: the weights are updated after each data point and used for evaluating the next data point.
For a mini-batch scenario such as batch size = 2, it should compute the gradients of the batch together (via backpropagation), update the weights once, use them for the next batch (of size 2), and continue like this until all batches are done. Everything else you mentioned is correct.
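The mini-batch schedule this answer describes (one weight update per batch of 2, so 10 updates per epoch for 20 points) can be sketched as follows. The linear model, loss, and learning rate are illustrative assumptions, not taken from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))   # 20 data points, 8 features each
y = rng.normal(size=(20, 1))
W = rng.normal(size=(8, 1))
batch_size, learn_rate = 2, 0.01

n_updates = 0
for start in range(0, len(X), batch_size):
    xb = X[start:start + batch_size]   # one batch of 2 data points
    yb = y[start:start + batch_size]
    out = xb @ W                       # forward pass for the whole batch
    grad = 2 * xb.T @ (out - yb)       # gradient summed over the batch
    W -= learn_rate * grad             # ONE update per batch
    n_updates += 1

print(n_updates)  # 10 updates for one epoch of 20 points with batch size 2
```

Contrast with the question's version: there, the inner loop updates W once per data point (20 times per epoch); here, W changes only 10 times per epoch, once per batch.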
Posted on 2018-05-18 09:33:48
Almost everything is correct, except the backpropagation weight update. The error is computed for each sample in the mini-batch, but the weights are updated only after all samples in the mini-batch have gone through forward propagation. You can read more about it here:
https://stackoverflow.com/questions/50403971