First of all, thank you for reading my question - I hope this is the right place.
I am computing sensitivity, specificity, and precision from scratch for a confusion matrix. I have the following confusion matrix for 4 classes.
                   True Class
                 1    2    3    4
            1 [[  0    1    3    0]
Predicted   2  [  0  181   23    0]
Class       3  [  0   17   53   14]
            4  [  0    3   22   77]]

When I use sklearn.metrics.classification_report, I get:
precision    recall  f1-score   support
     0.00      0.00      0.00         4
     0.89      0.89      0.89       204
     0.52      0.63      0.57        84
     0.85      0.75      0.80       102

However, for precision and recall I got the following (i.e. the precision and recall values are flipped):
precision    recall
    0.0       nan
    0.887     0.896
    0.631     0.524
    0.755     0.846

For each class, I calculated the following true positives, false positives, true negatives, and false negatives:
class   Tp   Fp   Tn   Fn
    1    0    4  390    0
    2  181   23  169   21
    3   53   31  262   48
    4   77   25  278   14

The formulas I used (https://en.wikipedia.org/wiki/Confusion_matrix) are:
sensitivity/recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)

Where am I going wrong? Surely the problem can't be sklearn's classification report - am I misreading something?
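As a sanity check on the formulas themselves, plugging my class-2 counts from the table above (Tp=181, Fp=23, Fn=21) into them reproduces my values exactly:

```python
# Class-2 counts from my table above.
tp, fp, fn = 181, 23, 21
print(round(tp / (tp + fn), 3))  # sensitivity/recall -> 0.896
print(round(tp / (tp + fp), 3))  # precision          -> 0.887
```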
Edit: my function for calculating the precision and recall values, given a confusion matrix from sklearn.metrics.confusion_matrix and a list of class numbers, e.g. classes 1, 2, 3 for classes 1-3:
def calc_precision_recall(conf_matrix, class_labels):
    # for each class
    for i in range(len(class_labels)):
        # calculate true positives: the diagonal entry
        true_positives = conf_matrix[i, i]
        # false positives: the rest of row i
        false_positives = conf_matrix[i, :].sum() - true_positives
        # false negatives: the rest of column i
        false_negatives = conf_matrix[:, i].sum() - true_positives
        # and finally true negatives: everything else
        true_negatives = conf_matrix.sum() - false_positives - false_negatives - true_positives
        # print the calculated values
        print(
            "Class label", class_labels[i],
            "T_positive", true_positives,
            "F_positive", false_positives,
            "T_negative", true_negatives,
            "F_negative", false_negatives,
            "\nSensitivity/recall", true_positives / (true_positives + false_negatives),
            "Specificity", true_negatives / (true_negatives + false_positives),
            "Precision", true_positives / (true_positives + false_positives), "\n"
        )
    return

Posted on 2020-04-01 10:59:34
OK, where is your code? Nobody can say for sure without seeing it. Let me take a guess here... maybe your data is imbalanced. Do some classes have many more/fewer records than others? sklearn.utils.resample resamples arrays or sparse matrices in a consistent way.
This should work fine for you. Test it and see.
# Begin by importing all necessary libraries
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn import datasets
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 0:3] # we only take the first three features.
y = iris.target
# Now that we have the features and labels we want, we can split the data into training and testing sets using sklearn's handy feature train_test_split():
# Test size specifies how much of the data you want to set aside for the testing set.
# Random_state parameter is just a random seed we can use.
# You can use it if you'd like to reproduce these specific results.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)
# You may want to print the results to be sure your data is being parsed as you expect:
print(X_train)
print(y_train)
# Now we can instantiate the models. Let's try using two classifiers, a Support Vector Classifier and a K-Nearest Neighbors Classifier:
SVC_model = SVC()
# KNN model requires you to specify n_neighbors,
# the number of points the classifier will look at to determine what class a new point belongs to
KNN_model = KNeighborsClassifier(n_neighbors=5)
# Now let's fit the classifiers:
SVC_model.fit(X_train, y_train)
KNN_model.fit(X_train, y_train)
# The call has trained the model, so now we can predict and store the prediction in a variable:
SVC_prediction = SVC_model.predict(X_test)
KNN_prediction = KNN_model.predict(X_test)
# We should now evaluate how the classifier performed. There are multiple methods of evaluating a classifier's performance, and you can read more about the different methods below.
# In Scikit-Learn you just pass in the predictions against the ground-truth labels that were stored in your test labels:
# Accuracy score is the simplest way to evaluate
print(accuracy_score(y_test, SVC_prediction))
print(accuracy_score(y_test, KNN_prediction))
# But Confusion Matrix and Classification Report give more details about performance
# (note the argument order: ground truth first, predictions second)
print(confusion_matrix(y_test, SVC_prediction))
print(classification_report(y_test, KNN_prediction))

Results:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       0.91      0.91      0.91        11
           2       0.92      0.92      0.92        12

    accuracy                           0.93        30
   macro avg       0.94      0.94      0.94        30
weighted avg       0.93      0.93      0.93        30

See the resources below.
https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html
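For example, if your data does turn out to be imbalanced, sklearn.utils.resample can upsample the minority class. A minimal sketch with made-up data (not your dataset):

```python
import numpy as np
from sklearn.utils import resample

# Made-up imbalanced data: 10 samples of class 0, 3 samples of class 1.
X = np.arange(13).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 3)

# Upsample the minority class (with replacement) to match the majority count.
X_minority = X[y == 1]
y_minority = y[y == 1]
X_up, y_up = resample(X_minority, y_minority,
                      replace=True, n_samples=10, random_state=27)

# Stack back together: the two classes are now balanced.
X_balanced = np.vstack([X[y == 0], X_up])
y_balanced = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_balanced))  # -> [10 10]
```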
Oh, and the X and y variables both have 150 records:
X.shape
y.shape

Results:

X.shape
Out[107]: (150, 3)
y.shape
Out[108]: (150,)

Posted on 2021-04-03 12:01:10
I compared the values returned by each of my commands with hand-computed ones, and they all agree. I think you have assigned the TP, FN, FP, TN values (or some of them) incorrectly. Looking at a diagram may help:

https://stackoverflow.com/questions/60962106
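Concretely, sklearn.metrics.confusion_matrix puts true classes on the rows and predicted classes on the columns, so for class i the off-diagonal of row i gives false negatives and the off-diagonal of column i gives false positives. A minimal sketch using the matrix from the question:

```python
import numpy as np

# The 4-class confusion matrix from the question.
cm = np.array([[  0,   1,   3,   0],
               [  0, 181,  23,   0],
               [  0,  17,  53,  14],
               [  0,   3,  22,  77]])

# sklearn convention: cm[i, j] counts samples whose TRUE class is i and
# whose PREDICTED class is j. So, for class i:
#   row i minus the diagonal    -> false negatives (true i, predicted elsewhere)
#   column i minus the diagonal -> false positives (predicted i, actually elsewhere)
for i in range(cm.shape[0]):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp
    fp = cm[:, i].sum() - tp
    recall = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    print(f"class {i + 1}: precision={precision:.3f} recall={recall:.3f}")

# class 3 prints precision=0.525 recall=0.631, matching classification_report;
# reading rows as predictions instead gives the flipped 0.631 / 0.524.
```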