Question: CS231n: How do I compute the gradient of the Softmax loss function?

Stack Overflow user

Asked 2017-01-16 01:08:31

3 answers, 33.8K views, 0 followers, 30 votes

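I am watching some of the videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but I do not quite understand how to compute the analytic gradient of the softmax loss function using numpy.

According to this stackexchange answer, the softmax gradient is computed as:

$$\nabla_{w_j} L = \frac{1}{N} \sum_{i=1}^{N} x_i \left( p(y_i = j) - \mathbb{1}\{y_i = j\} \right)$$

where $N$ is the number of training examples (num_train in the code) and $\mathbb{1}\{\cdot\}$ is the indicator function.

The Python implementation of the above is: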

num_classes = W.shape[0]
num_train = X.shape[1]
for i in range(num_train):
  for j in range(num_classes):
    p = np.exp(f_i[j])/sum_i
    dW[j, :] += (p-(j == y[i])) * X[:, i]

Can someone explain how the code snippet above works? The full softmax implementation is included below as well.

import numpy as np

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)
  Inputs:
  - W: C x D array of weights
  - X: D x N array of data. Data are D-dimensional columns
  - y: 1-dimensional array of length N with labels 0...K-1, for K classes
  - reg: (float) regularization strength
  Returns:
  a tuple of:
  - loss as single float
  - gradient with respect to weights W, an array of same size as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # Compute the softmax loss and its gradient using explicit loops.           #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################

  # Get shapes
  num_classes = W.shape[0]
  num_train = X.shape[1]

  for i in range(num_train):
    # Compute vector of scores
    f_i = W.dot(X[:, i]) # in R^{num_classes}

    # Normalization trick to avoid numerical instability, per http://cs231n.github.io/linear-classify/#softmax
    log_c = np.max(f_i)
    f_i -= log_c

    # Compute loss (and add to it, divided later)
    # L_i = - f(x_i)_{y_i} + log \sum_j e^{f(x_i)_j}
    sum_i = 0.0
    for f_i_j in f_i:
      sum_i += np.exp(f_i_j)
    loss += -f_i[y[i]] + np.log(sum_i)

    # Compute gradient
    # dw_j = 1/num_train * \sum_i[x_i * (p(y_i = j)-Ind{y_i = j} )]
    # Here we are computing the contribution to the inner sum for a given i.
    for j in range(num_classes):
      p = np.exp(f_i[j])/sum_i
      dW[j, :] += (p-(j == y[i])) * X[:, i]

  # Compute average
  loss /= num_train
  dW /= num_train

  # Regularization
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg*W

  return loss, dW

Tags: python, numpy, softmax

3 Answers

Stack Overflow user

Accepted answer

Answered 2017-01-19 15:05:02

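Not sure if this helps, but:

The term written as Ind{y_i = j} in the code comment really is the indicator function

$$\mathbb{1}\{y_i = j\} = \begin{cases} 1 & \text{if } y_i = j \\ 0 & \text{otherwise} \end{cases}$$

as described here. This is what becomes the expression (j == y[i]) in the code.

Also, with the score $f_j = w_j^T x_i$, the gradient of the loss with respect to the weights is:

$$\frac{\partial L_i}{\partial w_j} = \frac{\partial L_i}{\partial f_j} \, \frac{\partial f_j}{\partial w_j}$$

where

$$\frac{\partial f_j}{\partial w_j} = x_i$$

which is where X[:, i] in the code comes from.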
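To make the role of (j == y[i]) and X[:, i] concrete, here is a minimal sketch for a single training example. The sizes, the random data, and the names x_i, y_i, indicator, dW_loop and dW_outer are assumptions made for illustration; only the loop body mirrors the question's code. It suggests that the loop over j is the same as an outer product between (p - indicator) and x_i.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 3, 4                        # hypothetical sizes: 3 classes, 4 features
W = rng.standard_normal((C, D))    # C x D weights, as in the question
x_i = rng.standard_normal(D)       # one data column, i.e. X[:, i]
y_i = 2                            # its (made-up) label

# Scores and softmax probabilities, with the max-shift used in the question
f_i = W.dot(x_i)
f_i -= np.max(f_i)
p = np.exp(f_i) / np.sum(np.exp(f_i))

# Loop form, mirroring the question's inner loop over classes
dW_loop = np.zeros_like(W)
for j in range(C):
    dW_loop[j, :] += (p[j] - (j == y_i)) * x_i

# Same quantity with an explicit one-hot indicator vector
indicator = (np.arange(C) == y_i).astype(float)
dW_outer = np.outer(p - indicator, x_i)

print(np.allclose(dW_loop, dW_outer))
```

Each row j of the result is x_i scaled by (p[j] - indicator[j]), so only the row of the correct class receives the extra -x_i term.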

Votes: 19

Stack Overflow user

Answered 2018-12-30 03:38:27

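I know this is a bit late, but here is my answer:

I assume you are familiar with the cs231n Softmax loss function. We know that:

$$L_i = -f_{y_i} + \log \sum_j e^{f_j}, \qquad f_j = w_j^T x_i$$

So, just as we did for the SVM loss function, the gradients are as follows, writing $p_j = e^{f_j} / \sum_k e^{f_k}$ for the value stored in p in the code:

$$\frac{\partial L_i}{\partial w_{y_i}} = (p_{y_i} - 1)\, x_i, \qquad \frac{\partial L_i}{\partial w_j} = p_j\, x_i \quad (j \neq y_i)$$

Both cases combine into $(p_j - \mathbb{1}\{y_i = j\})\, x_i$, which is the expression (p-(j == y[i])) * X[:, i] in the code.

Hope this helps.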
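One way to gain confidence in a gradient formula like this is a numerical gradient check. The sketch below is an assumption-laden illustration, not part of the original answer: softmax_loss is a compact vectorized stand-in for the question's softmax_loss_naive (same shapes: W is C x D, X is D x N), numerical_gradient is a hypothetical helper using centered finite differences, and the data are random.

```python
import numpy as np

def softmax_loss(W, X, y, reg):
    """Vectorized stand-in for the question's loss; W is C x D, X is D x N."""
    N = X.shape[1]
    F = W.dot(X)                              # C x N scores
    F -= F.max(axis=0, keepdims=True)         # shift for numerical stability
    P = np.exp(F)
    P /= P.sum(axis=0, keepdims=True)         # softmax probabilities per column
    loss = -np.log(P[y, np.arange(N)]).mean() + 0.5 * reg * np.sum(W * W)
    P[y, np.arange(N)] -= 1.0                 # p - indicator
    dW = P.dot(X.T) / N + reg * W
    return loss, dW

def numerical_gradient(f, W, h=1e-5):
    """Centered finite differences of scalar f with respect to each entry of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old                          # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5)) * 0.1         # made-up sizes: 3 classes, 5 features
X = rng.standard_normal((5, 8))               # 8 examples as columns
y = rng.integers(0, 3, size=8)

loss, dW = softmax_loss(W, X, y, reg=0.1)
dW_num = numerical_gradient(lambda w: softmax_loss(w, X, y, 0.1)[0], W)
max_abs_diff = np.max(np.abs(dW - dW_num))
print(max_abs_diff)
```

If the analytic gradient is right, max_abs_diff should be tiny compared with the gradient entries themselves; a large value usually points to a sign or indexing error.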

Votes: 11

Stack Overflow user

Answered 2021-04-11 23:37:39

A supplement to this answer, with a small example.
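As an illustration in the same spirit (this is not the linked example itself), here is a tiny case that can be checked by hand. All values are made up: with all-zero weights every class gets the same score, so the probabilities are uniform and the loss is log(3) for three classes.

```python
import numpy as np

W = np.zeros((3, 2))            # hypothetical: 3 classes, 2 features, zero weights
x = np.array([1.0, 2.0])        # a single example
y = 0                           # its correct class

f = W.dot(x)                    # all scores are 0
p = np.exp(f - f.max())
p /= p.sum()                    # uniform probabilities: 1/3 each
loss = -np.log(p[y])            # equals log(3)

# Gradient rows: (p_j - 1{j == y}) * x
indicator = (np.arange(3) == y).astype(float)
dW = np.outer(p - indicator, x)
# row 0 (correct class):   (1/3 - 1) * x = [-2/3, -4/3]
# rows 1, 2 (other classes): (1/3) * x   = [ 1/3,  2/3]
print(loss, dW)
```

The correct class row is pushed in the direction of x (its gradient is negative along x), while the other rows are pushed away from it, which matches the sign pattern of (p - (j == y[i])) in the question's code.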

Votes: 4

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/41663874
