Sklearn's documentation seems to imply that the neg_log_loss scoring uses log_loss as the scorer. This question attempts to clarify what happens under the hood, and the accepted answer is that neg_log_loss is simply equal to -log_loss. However, the example below suggests that this is not the case.
What is the relationship between scoring="neg_log_loss" and scoring=make_scorer(log_loss)? The apparent discrepancy makes me think neg_log_loss uses probabilities rather than predicted labels in the loss. How can the code below be modified so that each method returns the same result?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss, make_scorer, get_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
X, y = make_classification(random_state=0)
cv = KFold(10)
nll = lambda y, ypred: -1 * log_loss(y, ypred)
param_grid = {'C': 1 / np.logspace(-5, 2, base=np.exp(1))}
model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=10_000)
gscv_scoring = GridSearchCV(model, param_grid=param_grid, cv=cv, scoring='neg_log_loss').fit(X, y)
gscv_make_scoring = GridSearchCV(model, param_grid=param_grid, cv=cv, scoring=make_scorer(nll)).fit(X, y)
fig, ax = plt.subplots(dpi=120)
r1 = pd.DataFrame(gscv_scoring.cv_results_)
r2 = pd.DataFrame(gscv_make_scoring.cv_results_)
ax.plot(r1.param_C, r1.mean_test_score)
ax.plot(r2.param_C, r2.mean_test_score)

Posted on 2021-07-11 23:58:41
If you use:
cross_val_score(model, X_train, y_train, scoring='neg_log_loss', cv=2)
you will get negative values (the negated log loss, not negative probabilities). If you pass metrics.log_loss directly through make_scorer, you need to set greater_is_better=False, needs_proba=True.
To get an equivalent result using metrics.log_loss directly, you need to pass it the output of the model's predict_proba, not the predicted labels:
y_probs = model.predict_proba(X_train)
log_loss(y_train, y_probs)
This gives you positive values. Then, as you mentioned, they differ only by a sign.
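A minimal sketch of the fix on the question's data (an assumption about versioning: older scikit-learn releases spell the probability flag as needs_proba=True, while 1.4+ uses response_method="predict_proba", so the scorer is built conditionally):

```python
import inspect

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, make_scorer
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)
model = LogisticRegression(max_iter=10_000)

# Build a scorer equivalent to scoring='neg_log_loss': log_loss must be fed
# probabilities (not predicted labels), and the result must be negated so
# that greater is better.
if "response_method" in inspect.signature(make_scorer).parameters:
    # scikit-learn >= 1.4
    scorer = make_scorer(log_loss, greater_is_better=False,
                         response_method="predict_proba")
else:
    # older scikit-learn
    scorer = make_scorer(log_loss, greater_is_better=False, needs_proba=True)

built_in = cross_val_score(model, X, y, scoring="neg_log_loss", cv=2)
custom = cross_val_score(model, X, y, scoring=scorer, cv=2)

# Both scorers should now agree fold by fold.
assert np.allclose(built_in, custom)
```

With this scorer, the two GridSearchCV curves in the question should coincide; the original make_scorer(nll) call differed because it scored the predicted labels rather than the predicted probabilities.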
https://stackoverflow.com/questions/61471034