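I am building a three-layer neural network for image recognition with the following dimensions: 400 features, 40 nodes, 40 nodes, and 10 targets (images of the digits 0 through 9), so these are my weights (theta):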
theta1 = np.random.uniform(low=0.00001, high=0.0001, size=(40,401))
theta2 = np.random.uniform(low=0.00001, high=0.0001, size=(40,41))
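theta3 = np.random.uniform(low=0.00001, high=0.0001, size=(10,41))

I am following Andrew Ng's approach, and I am having some trouble with backpropagation. First, I obtain the delta_4 term as the difference between the predictions and the actual results. Then the remaining delta terms are obtained with the following equation:

$$\delta^{(l)} = \left(\Theta^{(l)}\right)^{T} \delta^{(l+1)} \odot g'\left(z^{(l)}\right)$$

where g' is the derivative of the sigmoid function. I wrote the following function: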
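def get_delta(nodes_current, theta_current, delta_previous):
    # Sigmoid derivative written in terms of the activations: g'(z) = a * (1 - a)
    derivative = np.multiply(nodes_current, 1-nodes_current)
    matmul_term = np.matmul(np.transpose(theta_current), delta_previous)
    delta_current = np.multiply(matmul_term, derivative)
    return delta_current

The whole backpropagation step that produces the gradients then looks like this: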
def backward_prop3(y_vectors, a1, a2, a3, a4, theta1, theta2, theta3):
    # y_vectors is a 10 by m (number of training examples) matrix
    # a1 is the features
    # a2, a3 are the hidden nodes
    # a4 is the output
    m = y_vectors.shape[1]
    delta4 = a4 - y_vectors
    delta3 = get_delta(a3, theta3, delta4)
    triangle3 = np.matmul(delta4, np.transpose(a3))
    delta2 = get_delta(a2, theta2, delta3)
    triangle2 = np.matmul(delta3, np.transpose(a2))
    triangle1 = np.matmul(delta2[1:,:], np.transpose(a1))
    grad3 = (1/m)*triangle1
    grad2 = (1/m)*triangle2
    grad1 = (1/m)*triangle1
    return grad1, grad2, grad3

The problem occurs on the line that computes delta2, specifically on this line inside get_delta:
matmul_term = np.matmul(np.transpose(theta_current), delta_previous)

The error says:

matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 51 is different from 50)

I have checked the dimensions of the theta matrices and reviewed the course notes, but I don't understand why it doesn't work, since I implemented it exactly as shown in the equation.
Posted on 2021-08-01 16:50:56
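If your theta has size 41 while you have only 40 features, that causes the error. To debug this, make sure all matrices have valid dimensions so that they can be multiplied with each other.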
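One way to run that check is to print the shapes just before the failing product. The snippet below is only a sketch under assumptions: it uses the 40-unit layer sizes stated in the question (the posted error message shows 51 and 50, so the actual run may have used different sizes), a made-up batch size `m`, placeholder arrays instead of real activations, and a hypothetical helper `can_matmul` that is not part of NumPy. It also assumes `a3` carries a bias row.

```python
import numpy as np

m = 5  # hypothetical number of training examples

# Weight shapes as stated in the question (each includes a bias column).
theta2 = np.zeros((40, 41))
theta3 = np.zeros((10, 41))

# Assumption: a3 carries a bias row, so it is 41 x m.
a3 = np.full((41, m), 0.5)
delta4 = np.zeros((10, m))


def can_matmul(left, right):
    """Hypothetical helper: True if left @ right is defined for 2-D arrays."""
    return left.shape[1] == right.shape[0]


# Same arithmetic as get_delta: (41 x 10) @ (10 x m) -> 41 x m
delta3 = (theta3.T @ delta4) * (a3 * (1 - a3))
print("delta3:", delta3.shape)

# The failing step: theta2.T is 41 x 40, but delta3 has 41 rows, not 40.
print("theta2.T:", theta2.T.shape, "delta3:", delta3.shape)
print("compatible:", can_matmul(theta2.T, delta3))
```

Under these assumptions the value passed as `delta_previous` at the failing call is `delta3`, which still contains the row that belongs to the bias unit.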
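Your delta2 has dimensions 41x41, while your theta2 has dimensions 40x41. So transpose(theta2)*delta2 tries to multiply 41x40 * 41x41, which causes the dimension mismatch. Your theta2 would need to be 41x... for this to work.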
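A common way to make the shapes line up in this formulation is to drop the bias row from each hidden-layer delta before propagating it further back. The following is a sketch, not a confirmed fix from the original poster: the helpers `sigmoid` and `add_bias`, the random data, and the batch size are all assumptions added for illustration. It also uses `triangle3`-style terms for `grad3`, whereas the posted code assigns `triangle1` to `grad3`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(a):
    # Hypothetical helper: prepend a row of ones (the bias unit) to a units x m matrix.
    return np.vstack([np.ones((1, a.shape[1])), a])


def get_delta(nodes_current, theta_current, delta_next):
    # nodes_current includes the bias row, so the result has a bias row too.
    derivative = nodes_current * (1 - nodes_current)
    return (theta_current.T @ delta_next) * derivative


def backward_prop3(y_vectors, a1, a2, a3, a4, theta1, theta2, theta3):
    m = y_vectors.shape[1]
    delta4 = a4 - y_vectors                          # 10 x m
    delta3 = get_delta(a3, theta3, delta4)[1:, :]    # drop bias row -> 40 x m
    delta2 = get_delta(a2, theta2, delta3)[1:, :]    # drop bias row -> 40 x m
    grad3 = (delta4 @ a3.T) / m                      # 10 x 41
    grad2 = (delta3 @ a2.T) / m                      # 40 x 41
    grad1 = (delta2 @ a1.T) / m                      # 40 x 401
    return grad1, grad2, grad3


# Made-up data with the layer sizes from the question.
rng = np.random.default_rng(0)
m = 5
theta1 = rng.uniform(1e-5, 1e-4, size=(40, 401))
theta2 = rng.uniform(1e-5, 1e-4, size=(40, 41))
theta3 = rng.uniform(1e-5, 1e-4, size=(10, 41))

a1 = add_bias(rng.random((400, m)))      # 401 x m
a2 = add_bias(sigmoid(theta1 @ a1))      # 41 x m
a3 = add_bias(sigmoid(theta2 @ a2))      # 41 x m
a4 = sigmoid(theta3 @ a3)                # 10 x m
y_vectors = np.eye(10)[:, rng.integers(0, 10, size=m)]  # one-hot columns, 10 x m

grad1, grad2, grad3 = backward_prop3(y_vectors, a1, a2, a3, a4,
                                     theta1, theta2, theta3)
print(grad1.shape, grad2.shape, grad3.shape)
```

With the bias rows removed, each gradient has the same shape as the weight matrix it belongs to, which is a quick sanity check before running gradient descent.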
https://stackoverflow.com/questions/68496728