
The higher the learning rate, the larger the weights.

Stack Overflow user
Asked on 2017-12-08 12:13:07
1 answer · 70 views · 0 followers · 0 votes

I am trying to train a convolutional neural network with AdamOptimizer (the model is inspired by VGG-16 and is listed at the end of the question). The network produces image embeddings (it turns an image into a list of 128 values).

Until now I have used 0.0001 as the learning rate in all my experiments (which gave me normal values for loss and accuracy).

When I use a higher learning rate such as 0.1 or 0.01, everything goes crazy.

These are the results I get:

epoch 0  loss 0.19993 acc 57.42 
nr_test_examples 512
total_batch_test 1
TEST epoch 0  loss 5313544259158016.00000 acc 58.20 
Test Aitor nr poze in plus 751
epoch 1  loss 20684906328883200.00000 acc 0.00 
nr_test_examples 512
total_batch_test 1
TEST epoch 1  loss 1135694416986112.00000 acc 51.56 
Test Aitor nr poze in plus 1963
epoch 2  loss 2697752092246016.00000 acc 0.00 
nr_test_examples 512
total_batch_test 1
TEST epoch 2  loss 53017830782.00000 acc 52.73 
Test Aitor nr poze in plus 1977
epoch 3  loss 128667078418.00000 acc 0.00 
nr_test_examples 512
total_batch_test 1
TEST epoch 3  loss 757709846097920.00000 acc 52.34 

The magnitude of the embedding values returned by the model grows as the loss grows.

For learning rate 0.1:

[[  1.29028062e+22   2.76679972e+22  -1.60350428e+22  -2.59803047e+22
   -7.18799158e+21   3.79426737e+22   6.16485875e+21   5.25694511e+22
    1.88533167e+22   2.83884797e+21   8.02921163e+21  -9.36909501e+21
   -1.44595632e+22  -2.42238243e+22   2.02972577e+21   1.05234577e+22
   -1.80612585e+22  -4.78811634e+22   1.49373501e+22   5.06000855e+22
    3.70631387e+22   1.84049113e+22  -3.99712842e+22   3.87442379e+22
    1.75347753e+22   5.92351884e+22  -3.53815667e+22  -1.82951788e+22
   -6.43566909e+22   2.47560282e+22   5.30715552e+21   1.83587696e+22
   -7.92202990e+21   1.67361902e+22   8.59540559e+20  -3.81585403e+22
   -1.21638398e+22   4.17503997e+22  -1.22125473e+22   2.79304332e+22
   -4.56848209e+22   1.57062125e+22  -2.50028311e+21  -2.62136002e+22
    4.54086438e+21  -1.56374639e+22  -9.88864603e+21  -4.41802088e+22
   -1.34634863e+22   5.70279618e+21   2.03487718e+22  -2.43145786e+22
    3.17775273e+22  -1.20715622e+22   2.58878188e+22   5.10632087e+22
    4.19953009e+22   3.96467818e+22  -1.04965802e+22   3.02379628e+22
   -5.25661860e+22   3.07441015e+21  -5.18819518e+21   2.95340929e+22
    1.14506092e+22   1.15907500e+22   6.69119500e+21   3.77412660e+22
   -3.94501085e+21   1.33659958e+22  -1.60639323e+22   4.13619597e+22
    2.68251817e+21   6.45229424e+21  -2.73042746e+21   4.42164447e+22
    2.80798401e+22  -1.88889266e+22   4.13956748e+21   3.89647612e+21
   -3.97987648e+22   3.42041704e+22  -7.92604683e+20   6.57421467e+22
   -8.36352284e+21  -3.10638036e+22   4.72475508e+21  -1.85049497e+22
   -2.01018620e+22  -4.16415747e+22  -1.26361030e+22   3.21139147e+22
    9.59236321e+21   1.88358765e+22  -1.30287966e+22  -7.88201598e+21
    3.74658596e+22  -1.73451794e+22   3.64240847e+22   3.83275750e+21
    3.18538926e+22  -2.88709469e+22  -3.58837879e+22  -8.98292556e+20
    1.61682176e+22  -4.03502305e+22   1.66714803e+22  -1.75002721e+22
    1.72512196e+22   1.00159954e+22   1.31722408e+22  -6.84561825e+22
    1.55648918e+22   1.01815039e+22   2.80281495e+21   2.46405536e+22
   -3.38236179e+21  -4.50928036e+21  -3.56030898e+22   3.63372148e+22
   -2.91085715e+21   1.96335417e+22  -9.57801362e+21   4.60519886e+21
    2.86536550e+22   3.00846580e+22   8.66609606e+21   8.57120803e+21]]

For learning rate 0.01:

[[ 135379.078125    427807.0625     -211165.5        -270527.875
   263263.46875      61203.9765625   243880.703125   -134595.53125
    65044.28125    -133903.921875   -326986.875      -346536.375       349003.
  -138743.328125    440702.1875     -108623.6484375    73725.84375
  -140035.90625    -357855.75        338021.65625     247224.15625
   -85308.8515625  -511153.90625     206612.296875   -317970.0625
   -95346.1796875   -24617.36523438 -369452.21875    -477215.0625
  -154431.234375    281639.625      -387593.4375       96041.2109375
  -184906.59375     107803.296875     74392.546875    463264.78125
   239308.84375     743635.375       -40640.921875      6956.1953125
   284925.75       -649819.3125     -295953.34375      38507.95703125
    35773.08984375  214856.546875   -289618.78125     381939.90625
   -68496.5546875   418068.46875     627032.625       182973.40625
   119805.296875     14911.890625    475292.40625    -265693.125
  -416467.28125    -354252.125      -162428.90625     336221.15625
    41771.5625     -395673.09375     149899.5         -86771.7421875
   -84667.2890625  -299950.8125      537230.5625     -138381.921875
   294517.21875      92734.6015625    26118.45898438  380978.34375
  -524781.9375     -150150.921875    563931.875       212278.8125
  -156267.859375     -7298.81445312 -546963.125       155122.828125
   -41295.8359375    46307.93359375 -128129.0546875    36079.36328125
  -460227.65625     123968.7734375   728651.4375      252526.984375
  -126041.7734375   265436.          -74924.703125    244991.8125
    38667.71875     -29434.65429688  374994.15625    -146754.859375
   180715.015625     95923.5078125   479208.21875     333908.5
   132672.703125   -402727.09375    -425125.03125     -68114.1640625
   122268.4375     -308014.96875     473961.40625     370820.125
  -502812.3125      201727.015625    156381.46875     337941.125
  -291394.9375      273098.71875     -91102.7421875    64342.390625
  -316238.625       291803.21875    -413403.4375      207456.203125
   106696.90625    -274239.90625     266393.65625      50893.91015625
   149943.265625   -100018.765625   -283917.65625   ]]

For learning rate 0.0001 I get very small values.

I am accumulating gradients (because I don't have the resources to push a large batch through at once):

    tvs = tf.trainable_variables() ## Retrieve all trainable variables you defined in your graph

    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 0.01
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 25600, 0.96, staircase=True)

    opt = tf.train.AdamOptimizer(learning_rate)

    ## Creation of a list of variables with the same shape as the trainable ones
    # initialized with 0s
    accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]                                        
    zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]

    ## Calls the compute_gradients function of the optimizer to obtain... the list of gradients
    gvs = opt.compute_gradients(cost, tvs);

    ## Adds to each element from the list you initialized earlier with zeros its gradient (works because accum_vars and gvs are in the same order)
    accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]

    train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)], global_step=global_step)
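For context, a minimal sketch of how these ops are typically driven each update (assumed, not shown in the question; sess, the placeholders x and y_true, and get_batch are hypothetical stand-ins):

ACCUM_STEPS = 8  # hypothetical number of mini-batches accumulated per update

sess.run(zero_ops)                    # reset the accumulators to zero
for _ in range(ACCUM_STEPS):
    batch_x, batch_y = get_batch()    # hypothetical data source
    sess.run(accum_ops, feed_dict={x: batch_x, y_true: batch_y})
sess.run(train_step)                  # one Adam step using the summed gradients

Note that accum_ops sums the gradients rather than averaging them, so accumulating N mini-batches scales the applied gradient by roughly N; dividing the accumulated values by ACCUM_STEPS before apply_gradients keeps the step size comparable to a single large batch.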

Why is this happening?

This is my model:

def siamese_convnet(x):


    w_conv1_1 = tf.get_variable(name='w_conv1_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 1, 64])
    w_conv1_2 = tf.get_variable(name='w_conv1_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 64])

    w_conv2_1 = tf.get_variable(name='w_conv2_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 128])
    w_conv2_2 = tf.get_variable(name='w_conv2_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 128])

    w_conv3_1 = tf.get_variable(name='w_conv3_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 256])
    w_conv3_2 = tf.get_variable(name='w_conv3_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])
    w_conv3_3 = tf.get_variable(name='w_conv3_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])

    w_conv4_1 = tf.get_variable(name='w_conv4_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 512])
    w_conv4_2 = tf.get_variable(name='w_conv4_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv4_3 = tf.get_variable(name='w_conv4_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])

    w_conv5_1 = tf.get_variable(name='w_conv5_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_2 = tf.get_variable(name='w_conv5_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_3 = tf.get_variable(name='w_conv5_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])

    w_fc_1 = tf.get_variable(name='w_fc_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[5*5*512, 2048])
    w_fc_2 = tf.get_variable(name='w_fc_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[2048, 1024])


    w_out = tf.get_variable(name='w_out', initializer=tf.contrib.layers.xavier_initializer(), shape=[1024, 128])

    bias_conv1_1 = tf.get_variable(name='bias_conv1_1', initializer=tf.constant(0.01, shape=[64]))
    bias_conv1_2 = tf.get_variable(name='bias_conv1_2', initializer=tf.constant(0.01, shape=[64]))

    bias_conv2_1 = tf.get_variable(name='bias_conv2_1', initializer=tf.constant(0.01, shape=[128]))
    bias_conv2_2 = tf.get_variable(name='bias_conv2_2', initializer=tf.constant(0.01, shape=[128]))

    bias_conv3_1 = tf.get_variable(name='bias_conv3_1', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_2 = tf.get_variable(name='bias_conv3_2', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_3 = tf.get_variable(name='bias_conv3_3', initializer=tf.constant(0.01, shape=[256]))

    bias_conv4_1 = tf.get_variable(name='bias_conv4_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_2 = tf.get_variable(name='bias_conv4_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_3 = tf.get_variable(name='bias_conv4_3', initializer=tf.constant(0.01, shape=[512]))

    bias_conv5_1 = tf.get_variable(name='bias_conv5_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_2 = tf.get_variable(name='bias_conv5_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_3 = tf.get_variable(name='bias_conv5_3', initializer=tf.constant(0.01, shape=[512]))

    bias_fc_1 = tf.get_variable(name='bias_fc_1', initializer=tf.constant(0.01, shape=[2048]))
    bias_fc_2 = tf.get_variable(name='bias_fc_2', initializer=tf.constant(0.01, shape=[1024]))

    '''bias_fc = tf.get_variable(name='bias_fc', initializer=tf.zeros([1024]))'''
    out = tf.get_variable(name='out', initializer=tf.constant(0.01, shape=[128]))

    x = tf.reshape(x , [-1, 160, 160, 1]);

    conv1_1 = tf.nn.relu(conv2d(x, w_conv1_1) + bias_conv1_1);
    conv1_2= tf.nn.relu(conv2d(conv1_1, w_conv1_2) + bias_conv1_2);

    max_pool1 = max_pool(conv1_2);
    #max_pool1 = tf.nn.dropout(max_pool1, keep_rate)

    conv2_1 = tf.nn.relu( conv2d(max_pool1, w_conv2_1) + bias_conv2_1 );
    conv2_2 = tf.nn.relu( conv2d(conv2_1, w_conv2_2) + bias_conv2_2 );

    max_pool2 = max_pool(conv2_2)
    #max_pool2 = tf.nn.dropout(max_pool2, keep_rate)

    conv3_1 = tf.nn.relu( conv2d(max_pool2, w_conv3_1) + bias_conv3_1 );
    conv3_2 = tf.nn.relu( conv2d(conv3_1, w_conv3_2) + bias_conv3_2 );
    conv3_3 = tf.nn.relu( conv2d(conv3_2, w_conv3_3) + bias_conv3_3 );

    max_pool3 = max_pool(conv3_3)
    #max_pool3 = tf.nn.dropout(max_pool3, keep_rate)

    conv4_1 = tf.nn.relu( conv2d(max_pool3, w_conv4_1) + bias_conv4_1 );
    conv4_2 = tf.nn.relu( conv2d(conv4_1, w_conv4_2) + bias_conv4_2 );
    conv4_3 = tf.nn.relu( conv2d(conv4_2, w_conv4_3) + bias_conv4_3 );

    max_pool4 = max_pool(conv4_3)
    #max_pool4 = tf.nn.dropout(max_pool4, keep_rate)

    conv5_1 = tf.nn.relu( conv2d(max_pool4, w_conv5_1) + bias_conv5_1 );
    conv5_2 = tf.nn.relu( conv2d(conv5_1, w_conv5_2) + bias_conv5_2 );
    conv5_3 = tf.nn.relu( conv2d(conv5_2, w_conv5_3) + bias_conv5_3 );

    max_pool5 = max_pool(conv5_3)
    #max_pool5 = tf.nn.dropout(max_pool5, keep_rate)

    fc_helper = tf.reshape(max_pool5, [-1, 5*5*512]);
    fc_1 = tf.nn.relu( tf.matmul(fc_helper, w_fc_1) + bias_fc_1 );
    #fc_1 = tf.nn.dropout(fc_1, keep_rate)

    fc_2 = tf.nn.relu( tf.matmul(fc_1, w_fc_2) + bias_fc_2 );
    #fc_2 = tf.nn.dropout(fc_2, 0.7)

    '''fc = tf.nn.relu( tf.matmul(fc_1, fc_layer) + bias_fc );
    fc = tf.nn.dropout(fc, keep_rate)
    output = tf.matmul(fc, w_out) + out'''

    output = tf.matmul(fc_2, w_out) + out
    #output = tf.nn.l2_normalize(output, 0)

    return output
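The model calls conv2d and max_pool helpers that are not shown in the question. A minimal sketch of what they presumably look like, chosen to match the shapes above (stride-1 SAME convolutions and 2x2 pooling, so five pooling layers reduce the 160x160 input to the 5x5x512 block that w_fc_1 expects):

import tensorflow as tf

def conv2d(x, w):
    # stride-1 convolution with SAME padding preserves the spatial size
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

def max_pool(x):
    # 2x2 max pooling with stride 2 halves each spatial dimension
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')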

1 answer

Stack Overflow user

Accepted answer

Answered on 2017-12-12 08:17:18

Backpropagation uses gradient descent to decide how much to change each weight in the model, and those changes are scaled by the learning rate. When the learning rate is large, the weight updates are large; if it is too large, a single update can overshoot and land far from where it started, and the cost can get dramatically worse. A learning rate can also be too small, in which case the model takes a very long time to improve.

In general, if your cost is blowing up, your learning rate is probably too high.
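As a minimal illustration of that overshooting (a sketch, not part of the original answer): plain gradient descent on f(w) = w**2 updates w to (1 - 2*lr)*w each step, so it converges when |1 - 2*lr| < 1 and blows up geometrically otherwise:

def descend(lr, steps=10, w=1.0):
    # gradient descent on f(w) = w**2, whose gradient is 2*w
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(descend(0.1))  # ~0.107: each step multiplies w by 0.8, shrinking toward 0
print(descend(1.5))  # 1024.0: each step multiplies w by -2, exploding like the loss above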

2 votes
Original content on this page provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/47714193
