I am trying to train a convolutional neural network (inspired by VGG-16; the model is listed at the end of the question) with AdamOptimizer. The network produces image embeddings (it turns an image into a list of 128 values).
Until now I have used a learning rate of 0.0001 for all my experiments, and it gave me reasonable values for the loss and accuracy.
When I use higher learning rates such as 0.1 or 0.01, everything blows up.
These are the results I get:
epoch 0 loss 0.19993 acc 57.42
nr_test_examples 512
total_batch_test 1
TEST epoch 0 loss 5313544259158016.00000 acc 58.20
Test Aitor nr poze in plus 751
epoch 1 loss 20684906328883200.00000 acc 0.00
nr_test_examples 512
total_batch_test 1
TEST epoch 1 loss 1135694416986112.00000 acc 51.56
Test Aitor nr poze in plus 1963
epoch 2 loss 2697752092246016.00000 acc 0.00
nr_test_examples 512
total_batch_test 1
TEST epoch 2 loss 53017830782.00000 acc 52.73
Test Aitor nr poze in plus 1977
epoch 3 loss 128667078418.00000 acc 0.00
nr_test_examples 512
total_batch_test 1
TEST epoch 3 loss 757709846097920.00000 acc 52.34

The magnitude of the embeddings returned by the model grows as the loss grows.
With learning rate 0.1:
[[ 1.29028062e+22 2.76679972e+22 -1.60350428e+22 -2.59803047e+22
-7.18799158e+21 3.79426737e+22 6.16485875e+21 5.25694511e+22
1.88533167e+22 2.83884797e+21 8.02921163e+21 -9.36909501e+21
-1.44595632e+22 -2.42238243e+22 2.02972577e+21 1.05234577e+22
-1.80612585e+22 -4.78811634e+22 1.49373501e+22 5.06000855e+22
3.70631387e+22 1.84049113e+22 -3.99712842e+22 3.87442379e+22
1.75347753e+22 5.92351884e+22 -3.53815667e+22 -1.82951788e+22
-6.43566909e+22 2.47560282e+22 5.30715552e+21 1.83587696e+22
-7.92202990e+21 1.67361902e+22 8.59540559e+20 -3.81585403e+22
-1.21638398e+22 4.17503997e+22 -1.22125473e+22 2.79304332e+22
-4.56848209e+22 1.57062125e+22 -2.50028311e+21 -2.62136002e+22
4.54086438e+21 -1.56374639e+22 -9.88864603e+21 -4.41802088e+22
-1.34634863e+22 5.70279618e+21 2.03487718e+22 -2.43145786e+22
3.17775273e+22 -1.20715622e+22 2.58878188e+22 5.10632087e+22
4.19953009e+22 3.96467818e+22 -1.04965802e+22 3.02379628e+22
-5.25661860e+22 3.07441015e+21 -5.18819518e+21 2.95340929e+22
1.14506092e+22 1.15907500e+22 6.69119500e+21 3.77412660e+22
-3.94501085e+21 1.33659958e+22 -1.60639323e+22 4.13619597e+22
2.68251817e+21 6.45229424e+21 -2.73042746e+21 4.42164447e+22
2.80798401e+22 -1.88889266e+22 4.13956748e+21 3.89647612e+21
-3.97987648e+22 3.42041704e+22 -7.92604683e+20 6.57421467e+22
-8.36352284e+21 -3.10638036e+22 4.72475508e+21 -1.85049497e+22
-2.01018620e+22 -4.16415747e+22 -1.26361030e+22 3.21139147e+22
9.59236321e+21 1.88358765e+22 -1.30287966e+22 -7.88201598e+21
3.74658596e+22 -1.73451794e+22 3.64240847e+22 3.83275750e+21
3.18538926e+22 -2.88709469e+22 -3.58837879e+22 -8.98292556e+20
1.61682176e+22 -4.03502305e+22 1.66714803e+22 -1.75002721e+22
1.72512196e+22 1.00159954e+22 1.31722408e+22 -6.84561825e+22
1.55648918e+22 1.01815039e+22 2.80281495e+21 2.46405536e+22
-3.38236179e+21 -4.50928036e+21 -3.56030898e+22 3.63372148e+22
-2.91085715e+21 1.96335417e+22 -9.57801362e+21 4.60519886e+21
2.86536550e+22 3.00846580e+22 8.66609606e+21 8.57120803e+21]]

With learning rate 0.01:
[[ 135379.078125 427807.0625 -211165.5 -270527.875
263263.46875 61203.9765625 243880.703125 -134595.53125
65044.28125 -133903.921875 -326986.875 -346536.375 349003.
-138743.328125 440702.1875 -108623.6484375 73725.84375
-140035.90625 -357855.75 338021.65625 247224.15625
-85308.8515625 -511153.90625 206612.296875 -317970.0625
-95346.1796875 -24617.36523438 -369452.21875 -477215.0625
-154431.234375 281639.625 -387593.4375 96041.2109375
-184906.59375 107803.296875 74392.546875 463264.78125
239308.84375 743635.375 -40640.921875 6956.1953125
284925.75 -649819.3125 -295953.34375 38507.95703125
35773.08984375 214856.546875 -289618.78125 381939.90625
-68496.5546875 418068.46875 627032.625 182973.40625
119805.296875 14911.890625 475292.40625 -265693.125
-416467.28125 -354252.125 -162428.90625 336221.15625
41771.5625 -395673.09375 149899.5 -86771.7421875
-84667.2890625 -299950.8125 537230.5625 -138381.921875
294517.21875 92734.6015625 26118.45898438 380978.34375
-524781.9375 -150150.921875 563931.875 212278.8125
-156267.859375 -7298.81445312 -546963.125 155122.828125
-41295.8359375 46307.93359375 -128129.0546875 36079.36328125
-460227.65625 123968.7734375 728651.4375 252526.984375
-126041.7734375 265436. -74924.703125 244991.8125
38667.71875 -29434.65429688 374994.15625 -146754.859375
180715.015625 95923.5078125 479208.21875 333908.5
132672.703125 -402727.09375 -425125.03125 -68114.1640625
122268.4375 -308014.96875 473961.40625 370820.125
-502812.3125 201727.015625 156381.46875 337941.125
-291394.9375 273098.71875 -91102.7421875 64342.390625
-316238.625 291803.21875 -413403.4375 207456.203125
106696.90625 -274239.90625 266393.65625 50893.91015625
149943.265625 -100018.765625 -283917.65625 ]]

With learning rate 0.0001 I get small values.
I am accumulating gradients, because I don't have enough resources to push a large batch through at once:
tvs = tf.trainable_variables()  # retrieve all trainable variables defined in the graph
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.01
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 25600, 0.96, staircase=True)
opt = tf.train.AdamOptimizer(learning_rate)

# Create non-trainable accumulator variables with the same shapes as the
# trainable ones, initialized with zeros
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [av.assign(tf.zeros_like(av)) for av in accum_vars]

# compute_gradients returns the list of (gradient, variable) pairs
gvs = opt.compute_gradients(cost, tvs)

# Add each gradient to its accumulator (works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)], global_step=global_step)

Why is this happening?
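For context, this is roughly how such accumulation ops are driven from a session loop. This is a minimal sketch; sess, get_batch, total_steps and accum_steps are illustrative names that do not appear in the original code:

# Hypothetical driver loop for the accumulation ops above;
# sess, get_batch, total_steps and accum_steps are illustrative names.
accum_steps = 8      # mini-batches whose gradients are summed per update
total_steps = 1000   # number of parameter updates

for step in range(total_steps):
    sess.run(zero_ops)                      # reset the accumulators to zero
    for _ in range(accum_steps):
        x_batch, y_batch = get_batch()      # one small mini-batch
        sess.run(accum_ops, feed_dict={x: x_batch, y: y_batch})
    sess.run(train_step)                    # apply the summed gradients once

Note that accum_ops sums the gradients rather than averaging them, so the update applied by train_step is effectively accum_steps times a single-batch step; dividing each accumulator by accum_steps before apply_gradients (or scaling the learning rate down accordingly) keeps the step size comparable to a single large batch.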
This is my model:
def siamese_convnet(x):

    w_conv1_1 = tf.get_variable(name='w_conv1_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 1, 64])
    w_conv1_2 = tf.get_variable(name='w_conv1_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 64])
    w_conv2_1 = tf.get_variable(name='w_conv2_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 128])
    w_conv2_2 = tf.get_variable(name='w_conv2_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 128])
    w_conv3_1 = tf.get_variable(name='w_conv3_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 256])
    w_conv3_2 = tf.get_variable(name='w_conv3_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])
    w_conv3_3 = tf.get_variable(name='w_conv3_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])
    w_conv4_1 = tf.get_variable(name='w_conv4_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 512])
    w_conv4_2 = tf.get_variable(name='w_conv4_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv4_3 = tf.get_variable(name='w_conv4_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])
    w_conv5_1 = tf.get_variable(name='w_conv5_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_2 = tf.get_variable(name='w_conv5_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_3 = tf.get_variable(name='w_conv5_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])
    w_fc_1 = tf.get_variable(name='w_fc_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[5*5*512, 2048])
    w_fc_2 = tf.get_variable(name='w_fc_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[2048, 1024])
    w_out = tf.get_variable(name='w_out', initializer=tf.contrib.layers.xavier_initializer(), shape=[1024, 128])

    bias_conv1_1 = tf.get_variable(name='bias_conv1_1', initializer=tf.constant(0.01, shape=[64]))
    bias_conv1_2 = tf.get_variable(name='bias_conv1_2', initializer=tf.constant(0.01, shape=[64]))
    bias_conv2_1 = tf.get_variable(name='bias_conv2_1', initializer=tf.constant(0.01, shape=[128]))
    bias_conv2_2 = tf.get_variable(name='bias_conv2_2', initializer=tf.constant(0.01, shape=[128]))
    bias_conv3_1 = tf.get_variable(name='bias_conv3_1', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_2 = tf.get_variable(name='bias_conv3_2', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_3 = tf.get_variable(name='bias_conv3_3', initializer=tf.constant(0.01, shape=[256]))
    bias_conv4_1 = tf.get_variable(name='bias_conv4_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_2 = tf.get_variable(name='bias_conv4_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_3 = tf.get_variable(name='bias_conv4_3', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_1 = tf.get_variable(name='bias_conv5_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_2 = tf.get_variable(name='bias_conv5_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_3 = tf.get_variable(name='bias_conv5_3', initializer=tf.constant(0.01, shape=[512]))
    bias_fc_1 = tf.get_variable(name='bias_fc_1', initializer=tf.constant(0.01, shape=[2048]))
    bias_fc_2 = tf.get_variable(name='bias_fc_2', initializer=tf.constant(0.01, shape=[1024]))
    # bias_fc = tf.get_variable(name='bias_fc', initializer=tf.zeros([1024]))
    out = tf.get_variable(name='out', initializer=tf.constant(0.01, shape=[128]))

    x = tf.reshape(x, [-1, 160, 160, 1])

    conv1_1 = tf.nn.relu(conv2d(x, w_conv1_1) + bias_conv1_1)
    conv1_2 = tf.nn.relu(conv2d(conv1_1, w_conv1_2) + bias_conv1_2)
    max_pool1 = max_pool(conv1_2)
    # max_pool1 = tf.nn.dropout(max_pool1, keep_rate)

    conv2_1 = tf.nn.relu(conv2d(max_pool1, w_conv2_1) + bias_conv2_1)
    conv2_2 = tf.nn.relu(conv2d(conv2_1, w_conv2_2) + bias_conv2_2)
    max_pool2 = max_pool(conv2_2)
    # max_pool2 = tf.nn.dropout(max_pool2, keep_rate)

    conv3_1 = tf.nn.relu(conv2d(max_pool2, w_conv3_1) + bias_conv3_1)
    conv3_2 = tf.nn.relu(conv2d(conv3_1, w_conv3_2) + bias_conv3_2)
    conv3_3 = tf.nn.relu(conv2d(conv3_2, w_conv3_3) + bias_conv3_3)
    max_pool3 = max_pool(conv3_3)
    # max_pool3 = tf.nn.dropout(max_pool3, keep_rate)

    conv4_1 = tf.nn.relu(conv2d(max_pool3, w_conv4_1) + bias_conv4_1)
    conv4_2 = tf.nn.relu(conv2d(conv4_1, w_conv4_2) + bias_conv4_2)
    conv4_3 = tf.nn.relu(conv2d(conv4_2, w_conv4_3) + bias_conv4_3)
    max_pool4 = max_pool(conv4_3)
    # max_pool4 = tf.nn.dropout(max_pool4, keep_rate)

    conv5_1 = tf.nn.relu(conv2d(max_pool4, w_conv5_1) + bias_conv5_1)
    conv5_2 = tf.nn.relu(conv2d(conv5_1, w_conv5_2) + bias_conv5_2)
    conv5_3 = tf.nn.relu(conv2d(conv5_2, w_conv5_3) + bias_conv5_3)
    max_pool5 = max_pool(conv5_3)
    # max_pool5 = tf.nn.dropout(max_pool5, keep_rate)

    fc_helper = tf.reshape(max_pool5, [-1, 5*5*512])
    fc_1 = tf.nn.relu(tf.matmul(fc_helper, w_fc_1) + bias_fc_1)
    # fc_1 = tf.nn.dropout(fc_1, keep_rate)
    fc_2 = tf.nn.relu(tf.matmul(fc_1, w_fc_2) + bias_fc_2)
    # fc_2 = tf.nn.dropout(fc_2, 0.7)

    # fc = tf.nn.relu(tf.matmul(fc_1, fc_layer) + bias_fc)
    # fc = tf.nn.dropout(fc, keep_rate)
    # output = tf.matmul(fc, w_out) + out

    output = tf.matmul(fc_2, w_out) + out
    # output = tf.nn.l2_normalize(output, 0)
    return output

Posted on 2017-12-12 08:17:18
Backpropagation uses gradient descent to decide how much each weight in the model should change, and those changes are multiplied by the learning rate. When the learning rate is large, the weight updates are large; if it is too large, an update can overshoot and end up far from where it started, and the cost function can get dramatically worse. The learning rate can also be too small, in which case the model takes a very long time to improve.
In general, if your cost is blowing up, your learning rate is probably too high.
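As a minimal sketch of that effect (the toy quadratic cost f(w) = w**2 is chosen here for illustration and is not taken from the model above), plain gradient descent shrinks w when the step size is small enough and makes it blow up when it is too large:

# Gradient descent on f(w) = w**2, whose gradient is 2*w.
# One step maps w to w - lr*2*w = (1 - 2*lr)*w, so the iterates
# shrink when |1 - 2*lr| < 1 (lr < 1) and grow without bound otherwise.
def descend(lr, w=1.0, steps=10):
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(descend(0.0001))  # ~0.998 -- tiny steps, very slow improvement
print(descend(0.1))     # ~0.107 -- steadily approaching the minimum at 0
print(descend(1.5))     # 1024.0 -- each step overshoots; |w| explodes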
https://stackoverflow.com/questions/47714193