
Area under the precision-recall curve for DecisionTreeClassifier is a square

Stack Overflow user
Asked 2018-04-03 14:38:23
Answers: 1 · Views: 2.2K · Followers: 0 · Votes: 1

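I am using scikit-learn's DecisionTreeClassifier to classify some data. I am also using other algorithms, and to compare them I use the area under the precision-recall curve (AUPRC). The problem is that the curve for DecisionTreeClassifier is a square rather than the usual shape you would expect for this metric.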

I had some trouble computing the AUPRC for the decision tree because DecisionTreeClassifier, unlike other classifiers such as LogisticRegression, has no decision_function().

These are the results I get for SVM, LogisticRegression, and DecisionTreeClassifier.

Here is how I calculate the AUPRC for DecisionTreeClassifier:

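from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.tree import DecisionTreeClassifier


def execute(X_train, y_train, X_test, y_test):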
    tree = DecisionTreeClassifier(class_weight='balanced')
    tree_y_score = tree.fit(X_train, y_train).predict(X_test)

    tree_ap_score = average_precision_score(y_test, tree_y_score)

    precision, recall, _ = precision_recall_curve(y_test, tree_y_score)
    values = {'ap_score': tree_ap_score, 'precision': precision, 'recall': recall}
    return values

Here is how I calculate the AUPRC for the SVM:

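from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.svm import SVC


def execute(X_train, y_train, X_test, y_test):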
    svm = SVC(class_weight='balanced')
    svm.fit(X_train, y_train.values.ravel())
    svm_y_score = svm.decision_function(X_test)

    svm_ap_score = average_precision_score(y_test, svm_y_score)

    precision, recall, _ = precision_recall_curve(y_test, svm_y_score)
    values = {'ap_score': svm_ap_score, 'precision': precision, 'recall': recall}
    return values

Here is how I calculate the AUPRC for LogisticRegression:

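from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve


def execute(X_train, y_train, X_test, y_test):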
    lr = LogisticRegression(class_weight='balanced')
    lr.fit(X_train, y_train.values.ravel())
    lr_y_score = lr.decision_function(X_test)

    lr_ap_score = average_precision_score(y_test, lr_y_score)

    precision, recall, _ = precision_recall_curve(y_test, lr_y_score)
    values = {'ap_score': lr_ap_score, 'precision': precision, 'recall': recall}
    return values

I then call these methods and plot the results as follows:

import LogReg_AP_Harness as lrApTest
import SVM_AP_Harness as svmApTest
import DecTree_AP_Harness as dtApTest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
import matplotlib.pyplot as plt


def do_work(df):
    # .ix was removed from pandas; .loc with a boolean column mask is equivalent here
    X = df.loc[:, df.columns != 'Class']
    y = df.loc[:, df.columns == 'Class']

    y_binarized = label_binarize(y, classes=[0, 1])
    n_classes = y_binarized.shape[1]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=0)

    _, _, y_train_binarized, y_test_binarized = train_test_split(X, y_binarized, test_size=.3, random_state=0)

    print('Executing Logistic Regression')
    lr_values = lrApTest.execute(X_train, y_train, X_test, y_test)
    print('Executing Decision Tree')
    dt_values = dtApTest.execute(X_train, y_train_binarized, X_test, y_test_binarized)
    print('Executing SVM')
    svm_values = svmApTest.execute(X_train, y_train, X_test, y_test)

    plot_aupr_curves(lr_values, svm_values, dt_values)


def plot_aupr_curves(lr_values, svm_values, dt_values):
    lr_ap_score = lr_values['ap_score']
    lr_precision = lr_values['precision']
    lr_recall = lr_values['recall']

    svm_ap_score = svm_values['ap_score']
    svm_precision = svm_values['precision']
    svm_recall = svm_values['recall']

    dt_ap_score = dt_values['ap_score']
    dt_precision = dt_values['precision']
    dt_recall = dt_values['recall']

    plt.step(svm_recall, svm_precision, color='g', alpha=0.2, where='post')
    plt.fill_between(svm_recall, svm_precision, step='post', alpha=0.2, color='g')

    plt.step(lr_recall, lr_precision, color='b', alpha=0.2, where='post')
    plt.fill_between(lr_recall, lr_precision, step='post', alpha=0.2, color='b')

    plt.step(dt_recall, dt_precision, color='r', alpha=0.2, where='post')
    plt.fill_between(dt_recall, dt_precision, step='post', alpha=0.2, color='r')

    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.ylim([0.0, 1.05])
    plt.xlim([0.0, 1.0])
    plt.title('SVM (Green): Precision-Recall curve: AP={0:0.2f}'.format(svm_ap_score) + '\n' +
              'Logistic Regression (Blue): Precision-Recall curve: AP={0:0.2f}'.format(lr_ap_score) + '\n' +
              'Decision Tree (Red): Precision-Recall curve: AP={0:0.2f}'.format(dt_ap_score))
    plt.show()

In do_work() I had to binarize y because DecisionTreeClassifier has no decision_function(). I got this approach from here.

Here is the plot:

I think it comes down to me calculating the AUPRC for DecisionTreeClassifier incorrectly.

1 Answer

Stack Overflow user

Accepted answer

Posted on 2018-04-03 16:57:07

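For DecisionTreeClassifier, replace predict with predict_proba. predict_proba plays the same role as decision_function: it returns a continuous score for each sample, whereas predict returns only hard 0/1 labels. With hard labels, precision_recall_curve has a single usable threshold, which produces the square-shaped curve. Since predict_proba returns one column per class, pass the positive-class column, for example predict_proba(X_test)[:, 1], as the score.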
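The change might look roughly like the sketch below. It assumes binary labels 0 and 1 as in the question. The synthetic data from make_classification and the max_depth=4 setting are illustrative assumptions and are not part of the original post. The last lines rerun the hard-label approach only to show how few curve points it yields.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def execute(X_train, y_train, X_test, y_test):
    # max_depth is an assumption added for illustration: a fully grown tree
    # tends to have pure leaves, so its probabilities are mostly 0.0 or 1.0.
    tree = DecisionTreeClassifier(class_weight='balanced', max_depth=4, random_state=0)
    tree.fit(X_train, y_train)

    # predict_proba returns shape (n_samples, n_classes); columns follow
    # tree.classes_, so look up the column of the positive class (label 1).
    positive_col = list(tree.classes_).index(1)
    tree_y_score = tree.predict_proba(X_test)[:, positive_col]

    tree_ap_score = average_precision_score(y_test, tree_y_score)
    precision, recall, _ = precision_recall_curve(y_test, tree_y_score)
    return {'ap_score': tree_ap_score, 'precision': precision, 'recall': recall}


# Synthetic, imbalanced data standing in for the asker's DataFrame.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=0)
values = execute(X_train, y_train, X_test, y_test)

# For comparison: hard labels contain at most two distinct score values,
# so the resulting curve has at most three points.
hard_labels = (DecisionTreeClassifier(max_depth=4, random_state=0)
               .fit(X_train, y_train)
               .predict(X_test))
hard_precision, hard_recall, _ = precision_recall_curve(y_test, hard_labels)
print(len(hard_recall), len(values['recall']))
```

Note that a tree grown without any depth or leaf-size limit usually ends in pure leaves, so even predict_proba may return mostly 0.0 and 1.0 and the curve can still look blocky. Limiting depth or setting min_samples_leaf is one way to get more varied scores, at the cost of changing the model.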

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/49632828
