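I am trying to compute the NDCG score of a classifier, but I get the following error:
ValueError: Only ('multilabel-indicator', 'continuous-multioutput', 'multiclass-multioutput') formats are supported. Got multiclass instead
Here is my code: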
# Declare classifier, fit on data and make predictions
from sklearn.ensemble import RandomForestClassifier
rnd_forest = RandomForestClassifier()
rnd_forest.fit(X_train_tr, y_train)
y_pred_prob = rnd_forest.predict_proba(X_train_tr)
# Calculate ndcg score
from sklearn.metrics import ndcg_score
# This is where I get an error
ndcg_score(y_train, y_pred_prob, k=5)
Here are my targets and the predicted probabilities:
# True labels of the first two samples
y_train[:2]
> array([7, 7])
# Predicted probabilities for the first two observations
y_pred_prob[:2]
> array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])
I tried reshaping y_train into a 2-D array, but that did not work. Can anyone tell me how to fix this error?
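Posted on 2021-03-12 19:00:38
Suppose y_train contains N observations. ndcg_score expects y_true to have the same shape as the predicted scores, (n_samples, n_labels), so you have to convert y_train into a matrix with N rows and 12 columns that holds a 1 in the column of each sample's true class and 0 everywhere else.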
import numpy as np
# Create an ndarray of shape (N, 12) filled with zeros
y_train_matrix = np.zeros(y_pred_prob.shape)
# Write a 1 in each row at the column of its true class
# (assumes the labels in y_train are the integers 0..11, matching the column order of y_pred_prob)
y_train_matrix[np.arange(y_pred_prob.shape[0]), y_train] = 1
# You now have this ndarray
y_train_matrix
array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])
Now you can compute the score:
ndcg_score(y_train_matrix, y_pred_prob)
1.0
https://stackoverflow.com/questions/66600401
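The manual indexing above relies on the labels being the integers 0..11 in the same order as the columns of predict_proba. As a rough, self-contained sketch of the same idea for arbitrary labels, the true labels can be one-hot encoded with label_binarize, using the fitted classifier's classes_ attribute to fix the column order. The synthetic dataset, the variable names and k=3 are assumptions made for illustration only and are not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ndcg_score
from sklearn.preprocessing import label_binarize

# Hypothetical stand-in for X_train_tr / y_train: 200 samples, 4 classes
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=6,
    n_classes=4, random_state=0,
)

clf = RandomForestClassifier(random_state=0).fit(X, y)
y_pred_prob = clf.predict_proba(X)  # shape (n_samples, n_classes)

# One-hot encode the true labels; passing clf.classes_ keeps the columns
# in the same order as the columns of predict_proba
y_true_matrix = label_binarize(y, classes=clf.classes_)

# Both arrays are now (n_samples, n_classes), which ndcg_score accepts
score = ndcg_score(y_true_matrix, y_pred_prob, k=3)
print(score)
```

Note that label_binarize returns a single column when there are only two classes, so this sketch applies to problems with three or more classes, such as the 12-class case in the question.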