
Q: How to create a combined ROC curve for two classifiers and two different datasets

Stack Overflow user

Asked 2019-10-30 03:35:06

Answers: 1, Views: 231, Followers: 0, Votes: 0

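I have a dataset of 1127 patients. My goal is to classify each patient as 0 or 1. I have two different classifiers with the same purpose: classifying patients as 0 or 1. I ran one classifier on 364 patients and the second classifier on 763 patients. For each classifier/group I generated a ROC curve. Now I would like to combine the curves. Can anyone guide me on how to do this? I am thinking of calculating a weighted FPR and TPR, but I am not sure how. The number of FPR/TPR pairs differs between the curves (the first ROC curve is based on 312 pairs, the second on 666 pairs).

Thanks!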

machine-learning

roc

1 Answer

Stack Overflow user

Posted 2019-11-14 09:18:06

Imports

import numpy as np
import pandas as pd  # needed for pd.DataFrame in the data generation step
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

Data generation

# simulate first dataset with 364 obs
df1 = pd.DataFrame(range(364))
df1['predict_proba_1'] = np.random.normal(0, 1, len(df1))  # simulated score, not a true probability
df1['epsilon'] = np.random.normal(0, 1, len(df1))  # noise term
# label is 1 when the score exceeds the scaled noise, so score and label are correlated
df1['true'] = (0.7 * df1['epsilon'] < df1['predict_proba_1']) * 1
df1 = df1.drop(columns=[0, 'epsilon'])

# simulate second dataset with 763 obs
df2 = pd.DataFrame(range(763))
df2['predict_proba_2'] = np.random.normal(0, 1, len(df2))  # simulated score, not a true probability
df2['epsilon'] = np.random.normal(0, 1, len(df2))  # noise term
df2['true'] = (0.7 * df2['epsilon'] < df2['predict_proba_2']) * 1
df2 = df2.drop(columns=[0, 'epsilon'])

A quick look at the generated data

df1
     predict_proba_1  true
0           1.234549     1
1          -0.586544     0
2          -0.229539     1
3           0.132185     1
4          -0.411284     0
..               ...   ...
359        -0.218775     0
360        -0.985565     0
361         0.542790     1
362        -0.463667     0
363         1.119244     1

[364 rows x 2 columns]

df2
     predict_proba_2  true
0           0.278755     1
1           0.653663     0
2          -0.304216     1
3           0.955658     1
4          -1.341669     0
..               ...   ...
758         1.359606     1
759        -0.605894     0
760         0.379738     0
761         1.571615     1
762        -1.102565     0

[763 rows x 2 columns]

Required functions

def show_ROCs(scores_list: list, ys_list: list, labels_list: list = None):
    """
    Plot several ROC curves on one figure. Labels are optional.

    Parameters
    ----------
    scores_list : list of array-likes with scores or predicted probabilities.
    ys_list : list of array-likes with ground-truth labels.
    labels_list : list of labels to be displayed in the plot legend.

    Returns
    ----------
    None

    """
    if len(scores_list) != len(ys_list):
        raise ValueError('len(scores_list) != len(ys_list)')
    for x in range(len(scores_list)):
        fpr, tpr, _ = roc_curve(ys_list[x], scores_list[x])
        # fall back to the curve index when no label is available for this curve
        if labels_list is not None and x < len(labels_list):
            name = str(labels_list[x])
        else:
            name = str(x)
        plot_ROC(fpr, tpr, name + ' AUC:' + str(round(auc(fpr, tpr), 3)))
    plt.show()

def plot_ROC(fpr, tpr, label):
    """
    Plot a single ROC curve on figure 1 with the given legend label.

    Parameters
    ----------
    fpr : array-like of false positive rates.
    tpr : array-like of true positive rates.
    label : label to be displayed in the plot legend.

    Returns
    ----------
    None

    """
    plt.figure(1)
    plt.plot([0, 1], [0, 1], 'k--')
    plt.plot(fpr, tpr, label=label)
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('ROC curve')
    plt.legend(loc='best')

Plotting

show_ROCs(
    [df1['predict_proba_1'], df2['predict_proba_2']],
    [df1['true'], df2['true']],
    ['df1 with {} obs'.format(len(df1)), 'df2 with {} obs'.format(len(df2))]
)
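
The call above draws the two curves on shared axes; it does not merge them into a single curve. If one combined curve is wanted, as the question suggests, one possible approach is vertical averaging: interpolate each curve's TPR onto a common FPR grid (which also handles the differing numbers of FPR/TPR pairs) and take an average weighted by group size. The sketch below is only an illustration under that assumption. The function name `weighted_average_roc`, the `n_grid` parameter, and the simulated arrays are hypothetical and not part of the original answer, and the resulting curve is a summary of the two curves rather than the ROC of any single classifier.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc


def weighted_average_roc(scores_list, ys_list, n_grid=101):
    """Average several ROC curves on a shared FPR grid, weighted by sample count.

    Hypothetical helper for illustration only.
    """
    fpr_grid = np.linspace(0.0, 1.0, n_grid)
    # weight each curve by the number of observations behind it
    weights = np.array([len(y) for y in ys_list], dtype=float)
    weights /= weights.sum()
    mean_tpr = np.zeros_like(fpr_grid)
    for w, scores, y in zip(weights, scores_list, ys_list):
        fpr, tpr, _ = roc_curve(y, scores)
        # roc_curve returns non-decreasing fpr, which np.interp requires
        mean_tpr += w * np.interp(fpr_grid, fpr, tpr)
    mean_tpr[0] = 0.0  # force the curve to start at the origin
    return fpr_grid, mean_tpr


# simulated data with the same group sizes as in the question (assumption)
rng = np.random.default_rng(0)
scores_1 = rng.normal(0, 1, 364)
y_1 = (0.7 * rng.normal(0, 1, 364) < scores_1).astype(int)
scores_2 = rng.normal(0, 1, 763)
y_2 = (0.7 * rng.normal(0, 1, 763) < scores_2).astype(int)

fpr_grid, mean_tpr = weighted_average_roc([scores_1, scores_2], [y_1, y_2])
combined_auc = auc(fpr_grid, mean_tpr)
print('weighted-average AUC:', round(combined_auc, 3))
```

The returned arrays can be passed to `plot_ROC(fpr_grid, mean_tpr, 'weighted average')` to draw the averaged curve alongside the two originals. Pooling all scores into one call to `roc_curve` would be a different technique, and it only makes sense when the two classifiers' scores are on a comparable scale.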

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's 小微 IT-domain engine.

Original link:

https://stackoverflow.com/questions/58614259
