Keras ROC different from Scikit ROC?

Stack Overflow user
Asked 2020-04-15 23:52:03
Answers: 2 · Views: 368 · Followers: 0 · Votes: 2

As the code below shows, evaluating the ROC AUC with Keras and with scikit-learn gives different results. Does anyone know how to explain this?

import tensorflow as tf
from keras.layers import Dense, Input, Dropout
from keras import Sequential
import keras
from keras.constraints import maxnorm
from sklearn.metrics import roc_auc_score

# training data: X_train, y_train
# validation data: X_valid, y_valid

# Define the custom callback we will be using to evaluate roc with scikit
class MyCustomCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self,epoch, logs=None):
        y_pred = model.predict(X_valid)
        print("roc evaluated with scikit = ",roc_auc_score(y_valid, y_pred))
        return

# Define the model.

def model(): 

    METRICS = [ 
          tf.keras.metrics.BinaryAccuracy(name='accuracy'),
          tf.keras.metrics.AUC(name='auc'),
    ]

    optimizer="adam"
    dropout=0.1
    init='uniform'
    nbr_features= vocab_size-1 #2500
    dense_nparams=256

    model = Sequential()
    model.add(Dense(dense_nparams, activation='relu', input_shape=(nbr_features,), kernel_initializer=init,  kernel_constraint=maxnorm(3)))
    model.add(Dropout(dropout))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer,metrics = METRICS)
    return model

# instantiate the model
model = model()

# fit the model
history = model.fit(x=X_train, y=y_train, batch_size = 8, epochs = 8, verbose=1,validation_data = (X_valid,y_valid), callbacks=[MyCustomCallback()], shuffle=True, validation_freq=1, max_queue_size=10, workers=4, use_multiprocessing=True)

Output:

Train on 4000 samples, validate on 1000 samples
Epoch 1/8
4000/4000 [==============================] - 15s 4ms/step - loss: 0.7950 - accuracy: 0.7149 - auc: 0.7213 - val_loss: 0.7551 - val_accuracy: 0.7608 - val_auc: 0.7770
roc evaluated with scikit =  0.78766515781747
Epoch 2/8
4000/4000 [==============================] - 15s 4ms/step - loss: 0.0771 - accuracy: 0.8235 - auc: 0.8571 - val_loss: 1.0803 - val_accuracy: 0.8574 - val_auc: 0.8954
roc evaluated with scikit =  0.7795984218252997
Epoch 3/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0085 - accuracy: 0.8762 - auc: 0.9162 - val_loss: 1.2084 - val_accuracy: 0.8894 - val_auc: 0.9284
roc evaluated with scikit =  0.7705172905961992
Epoch 4/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0025 - accuracy: 0.8982 - auc: 0.9361 - val_loss: 1.1700 - val_accuracy: 0.9054 - val_auc: 0.9424
roc evaluated with scikit =  0.7808804338960933
Epoch 5/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0020 - accuracy: 0.9107 - auc: 0.9469 - val_loss: 1.1887 - val_accuracy: 0.9150 - val_auc: 0.9501
roc evaluated with scikit =  0.7811174659489438
Epoch 6/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0018 - accuracy: 0.9184 - auc: 0.9529 - val_loss: 1.2036 - val_accuracy: 0.9213 - val_auc: 0.9548
roc evaluated with scikit =  0.7822898825544409
Epoch 7/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0017 - accuracy: 0.9238 - auc: 0.9566 - val_loss: 1.2231 - val_accuracy: 0.9258 - val_auc: 0.9579
roc evaluated with scikit =  0.7817036742516923
Epoch 8/8
4000/4000 [==============================] - 14s 4ms/step - loss: 0.0016 - accuracy: 0.9278 - auc: 0.9592 - val_loss: 1.2426 - val_accuracy: 0.9293 - val_auc: 0.9600
roc evaluated with scikit =  0.7817419052279585

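As you can see, from the second epoch onward the validation ROC AUC reported by Keras and the one computed with scikit-learn start to diverge. The same thing happens if I fit the model and then call Keras's model.evaluate(X_valid, y_valid). Any help is much appreciated.

EDIT: Testing the model on a separate test set, I get roc = 0.76, so the scikit-learn figure computed on X_valid appears to be the right one. (By the way, X_train has 4,000 entries, X_valid has 1,000, and the test set has 15,000. This is a very unconventional split, but it was forced by external factors.)

Suggestions on how to improve performance are also welcome.

EDIT2: In response to @arpitrathi's answer, I modified the callback, but unfortunately without success: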

class MyCustomCallback(tf.keras.callbacks.Callback):

    def on_epoch_end(self,epoch, logs=None):
        y_pred = model.predict_proba(X_valid)
        print("roc evaluated with scikit = ",roc_auc_score(y_valid, y_pred))
        return

model = model()

history = model.fit(x=X_train, y=y_train, batch_size = 8, epochs = 3, verbose=1,validation_data = (X_valid,y_valid), callbacks=[MyCustomCallback()], shuffle=True, validation_freq=1, max_queue_size=10, workers=4, use_multiprocessing=True)


Train on 4000 samples, validate on 1000 samples
Epoch 1/3
4000/4000 [==============================] - 20s 5ms/step - loss: 0.8266 - accuracy: 0.7261 - auc: 0.7409 - val_loss: 0.7547 - val_accuracy: 0.7627 - val_auc: 0.7881
roc evaluated with scikit =  0.7921764130168828
Epoch 2/3
4000/4000 [==============================] - 15s 4ms/step - loss: 0.0482 - accuracy: 0.8270 - auc: 0.8657 - val_loss: 1.0831 - val_accuracy: 0.8620 - val_auc: 0.9054
roc evaluated with scikit =  0.78525915504445
Epoch 3/3
4000/4000 [==============================] - 15s 4ms/step - loss: 0.0092 - accuracy: 0.8794 - auc: 0.9224 - val_loss: 1.2226 - val_accuracy: 0.8928 - val_auc: 0.9340
roc evaluated with scikit =  0.7705555215724655

Also, if I plot the training and validation accuracy, I see that both converge to 1 very quickly. Isn't that strange?


2 Answers

Stack Overflow user

Answered 2020-04-16 01:04:59

The problem lies in the argument you pass to sklearn's roc_auc_score() function. You should use model.predict_proba() instead of model.predict().

def on_epoch_end(self,epoch, logs=None):
        y_pred = model.predict_proba(X_valid)
        print("roc evaluated with scikit = ",roc_auc_score(y_valid, y_pred))
        return
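
To illustrate why the argument passed to roc_auc_score() matters, here is a minimal sketch in plain Python. The roc_auc helper is a hypothetical stand-in written for this example (it is not sklearn's implementation), and the labels and scores are made up. It computes AUC as the fraction of positive/negative pairs that are ranked correctly, and shows that hard 0/1 labels give a different value than probabilities. Whether predict() and predict_proba() actually return different values for the model in the question is not something this sketch establishes.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half (hypothetical helper)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.10, 0.40, 0.60, 0.35, 0.80, 0.90]

# Hard class labels obtained by thresholding the probabilities at 0.5.
y_label = [1 if p >= 0.5 else 0 for p in y_prob]

auc_from_probs = roc_auc(y_true, y_prob)    # uses the full ranking
auc_from_labels = roc_auc(y_true, y_label)  # ranking collapsed to two levels

print("AUC from probabilities:", auc_from_probs)
print("AUC from hard labels:  ", auc_from_labels)
```

Thresholding collapses the ranking into two levels, so many pairs become ties and the AUC changes. This is why the scores given to an AUC function should be continuous probabilities or decision values rather than predicted classes.
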
Votes: 2

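Stack Overflow user

Answered 2021-11-09 19:52:42

sklearn and Keras use different default parameters when computing AUC. Increasing the number of thresholds Keras uses to compute AUC (that is, increasing num_thresholds) can help the Keras AUC match the sklearn AUC more closely.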
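
The effect of the threshold count can be sketched in plain Python. This is a simplified illustration rather than the Keras implementation: thresholded_auc and exact_auc are hypothetical helpers, the scores are made up, and the value 200 is used on the assumption that it matches the Keras default for num_thresholds. The idea is that an AUC built from a fixed grid of thresholds cannot distinguish scores that fall between two neighbouring thresholds, which matters when a saturated sigmoid pushes most predictions very close to 0 or 1.

```python
def exact_auc(y_true, y_score):
    """Pairwise (rank-based) AUC: fraction of positive/negative pairs ranked
    correctly, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def thresholded_auc(y_true, y_score, num_thresholds):
    """Trapezoidal area under an ROC curve sampled on an evenly spaced
    threshold grid over [0, 1] (simplified sketch)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    eps = 1e-7
    # One threshold just below 0 and one just above 1 pin the curve's end points.
    inner = [i / (num_thresholds - 1) for i in range(1, num_thresholds - 1)]
    thresholds = [-eps] + inner + [1.0 + eps]
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s > t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s > t)
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores clustered near 0 and near 1, as a saturated sigmoid might produce.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.0010, 0.0020, 0.9991, 0.9993, 0.0015, 0.9992, 0.9994, 0.9995]

exact = exact_auc(y_true, y_score)
coarse = thresholded_auc(y_true, y_score, num_thresholds=200)
fine = thresholded_auc(y_true, y_score, num_thresholds=20001)

print("exact:", exact, "200 thresholds:", coarse, "20001 thresholds:", fine)
```

With the coarse grid, every interior threshold lands between the two clusters, so the curve has a single interior point and the area differs from the pairwise value. With the fine grid the two agree. In Keras the corresponding knob is the num_thresholds argument of tf.keras.metrics.AUC mentioned above.
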

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61233047