问如何实现基于动量的随机梯度下降(SGD)
EN

Stack Overflow用户

提问于 2016-10-04 22:40:32

回答 1查看 6.6K关注 0票数 4

我使用python代码network3.py (http://neuralnetworksanddeeplearning.com/chap6.html)来开发卷积神经网络。现在我想稍微修改一下代码，添加一个动量学习规则，如下所示：

velocity = momentum_constant * velocity - learning_rate * gradient
params = params + velocity

有没有人知道怎么做？特别是，如何设置或初始化速度？我把SGD的代码贴在下面：

def __init__(self, layers, mini_batch_size):
    """Takes a list of `layers`, describing the network architecture, and
    a value for the `mini_batch_size` to be used during training
    by stochastic gradient descent.

    """
    self.layers = layers
    self.mini_batch_size = mini_batch_size
    self.params = [param for layer in self.layers for param in layer.params]
    self.x = T.matrix("x")
    self.y = T.ivector("y")
    init_layer = self.layers[0]
    init_layer.set_inpt(self.x, self.x, self.mini_batch_size)
    for j in xrange(1, len(self.layers)):
        prev_layer, layer  = self.layers[j-1], self.layers[j]
        layer.set_inpt(
            prev_layer.output, prev_layer.output_dropout, self.mini_batch_size)
    self.output = self.layers[-1].output
    self.output_dropout = self.layers[-1].output_dropout


def SGD(self, training_data, epochs, mini_batch_size, eta,
        validation_data, test_data, lmbda=0.0):
    """Train the network using mini-batch stochastic gradient descent."""
    training_x, training_y = training_data
    validation_x, validation_y = validation_data
    test_x, test_y = test_data

    # compute number of minibatches for training, validation and testing
    num_training_batches = size(training_data)/mini_batch_size
    num_validation_batches = size(validation_data)/mini_batch_size
    num_test_batches = size(test_data)/mini_batch_size

    # define the (regularized) cost function, symbolic gradients, and updates
    l2_norm_squared = sum([(layer.w**2).sum() for layer in self.layers])
    cost = self.layers[-1].cost(self)+\
           0.5*lmbda*l2_norm_squared/num_training_batches
    grads = T.grad(cost, self.params)
    updates = [(param, param-eta*grad)
               for param, grad in zip(self.params, grads)]

    # define functions to train a mini-batch, and to compute the
    # accuracy in validation and test mini-batches.
    i = T.lscalar() # mini-batch index
    train_mb = theano.function(
        [i], cost, updates=updates,
        givens={
            self.x:
            training_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
            self.y:
            training_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    validate_mb_accuracy = theano.function(
        [i], self.layers[-1].accuracy(self.y),
        givens={
            self.x:
            validation_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
            self.y:
            validation_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    test_mb_accuracy = theano.function(
        [i], self.layers[-1].accuracy(self.y),
        givens={
            self.x:
            test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
            self.y:
            test_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    self.test_mb_predictions = theano.function(
        [i], self.layers[-1].y_out,
        givens={
            self.x:
            test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    # Do the actual training
    best_validation_accuracy = 0.0
    for epoch in xrange(epochs):
        for minibatch_index in xrange(num_training_batches):
            iteration = num_training_batches*epoch+minibatch_index
            if iteration % 1000 == 0:
                print("Training mini-batch number {0}".format(iteration))
            cost_ij = train_mb(minibatch_index)
            if (iteration+1) % num_training_batches == 0:
                validation_accuracy = np.mean(
                    [validate_mb_accuracy(j) for j in xrange(num_validation_batches)])
                print("Epoch {0}: validation accuracy {1:.2%}".format(
                    epoch, validation_accuracy))
                if validation_accuracy >= best_validation_accuracy:
                    print("This is the best validation accuracy to date.")
                    best_validation_accuracy = validation_accuracy
                    best_iteration = iteration
                    if test_data:
                        test_accuracy = np.mean(
                            [test_mb_accuracy(j) for j in xrange(num_test_batches)])
                        print('The corresponding test accuracy is {0:.2%}'.format(
                            test_accuracy))

python

momentum

回答 1

Stack Overflow用户

发布于 2016-10-04 23:15:15

我只是从头开始编写SDG (不是使用theano)，但从您的代码判断，您需要

1)用一串零(每个梯度一个)来初始化velocities，

2)在你的更新中包含速度；例如

updates = [(param, param-eta*grad +momentum_constant*vel)
           for param, grad, vel in zip(self.params, grads, velocities)]

3)修改训练函数以返回每次迭代的梯度，以便更新velocities。

票数 6

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/39855184

复制

相似问题

问如何实现基于动量的随机梯度下降(SGD)
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何实现基于动量的随机梯度下降(SGD)EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何实现基于动量的随机梯度下降(SGD)
EN