Plotting a ROC curve with imblearn

Stack Overflow user
Asked on 2018-07-18 22:26:53
Answers: 1, Views: 294, Followers: 0, Votes: 2

I am trying to plot ROC curves using imblearn, but I have run into some problems.

Here is a screenshot of my data

from imblearn.over_sampling import SMOTE, ADASYN
from collections import Counter
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
import sys
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from scipy import interp
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
# Import some data to play with
df = pd.read_csv("E:\\autodesk\\Hourly and weather ml.csv")
# X and y are different columns of the input data. Input X as numpy array
X = df[['TTI','Max TemperatureF','Mean TemperatureF','Min TemperatureF',' Min Humidity']].values
# # Reshape X. Do this if X has only one value per data point. In this case, TTI.

# # Input y as normal list
y = df['TTI_Category'].as_matrix()

X_resampled, y_resampled = SMOTE().fit_sample(X, y)

y_resampled = label_binarize(y_resampled, classes=['Good','Bad','Ok'])
n_classes = y_resampled.shape[1]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(DecisionTreeClassifier(random_state=0))
y_score=classifier.fit(X_resampled, y_resampled).predict_proba(X_test)

# Compute ROC curve and ROC area for each class

fpr = dict()
tpr = dict()

roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())

roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

plt.figure()

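I replaced the original X_train and y_train with X_resampled and y_resampled, because training should happen on the resampled dataset while testing should use the original test dataset. However, I get the following traceback: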

runfile('E:/autodesk/SMOTE with multiclass.py', wdir='E:/autodesk')
Traceback (most recent call last):

  File "<ipython-input-128-efb16ffc92ca>", line 1, in <module>
    runfile('E:/autodesk/SMOTE with multiclass.py', wdir='E:/autodesk')

  File "C:\Users\Think\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\Users\Think\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "E:/autodesk/SMOTE with multiclass.py", line 51, in <module>
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])

IndexError: too many indices for array

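I added another line of code to binarize both y_resampled and the original y, keeping everything else the same, but I am not sure whether I am now fitting on the resampled data and testing on the original data: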

X_resampled, y_resampled = SMOTE().fit_sample(X, y)

y_resampled = label_binarize(y_resampled, classes=['Good','Bad','Ok'])

y = label_binarize(y, classes=['Good','Bad','Ok'])
n_classes = y.shape[1]

Thank you very much for your help.


1 Answer

Stack Overflow user

Accepted answer

Answered on 2018-07-20 20:40:44

First, let's talk about the error. You are doing the following:

y_resampled = label_binarize(y_resampled, classes=['Good','Bad','Ok'])
n_classes = y_resampled.shape[1]

So your n_classes is actually 3, one column per class in the binarized labels.

In the next part, you do this:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                random_state=0)

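Here you are splitting the original y, not y_resampled. As a result, y_test is currently a one-dimensional array of shape (n_samples,), or possibly a column vector of shape (n_samples, 1).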

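In the for loop, you iterate i over range(n_classes), that is 0 to 2, and index y_test[:, i]. That is not possible for this y_test: it has no per-class columns, so the index you are trying to access does not exist, hence the IndexError.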
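The shape mismatch described above can be reproduced in isolation. This is a minimal sketch, assuming a small hypothetical label array (`y_raw`) in place of the TTI_Category column; it only illustrates why a two-axis index fails on unbinarized labels and what `label_binarize` changes.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

classes = ['Good', 'Bad', 'Ok']

# Hypothetical string labels standing in for the TTI_Category column
y_raw = np.array(['Good', 'Bad', 'Ok', 'Good', 'Bad'])
print(y_raw.shape)  # (5,)

# A 1-D array has no column axis, so a two-axis index raises IndexError
try:
    y_raw[:, 0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# After binarizing, there is one indicator column per class
y_bin = label_binarize(y_raw, classes=classes)
print(y_bin.shape)  # (5, 3)
```

Once the test labels have three columns, `y_test[:, i]` is a valid index for each `i` in `range(n_classes)`.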

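Second, you should split the data into training and test sets first, and then resample only the training part. That way the test set contains only original samples and no synthetic ones.

So this code should do what you want: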

# First divide the data into train test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Then only resample the training data
# fit_resample is the current imblearn method name; older releases used fit_sample
X_resampled, y_resampled = SMOTE().fit_resample(X_train, y_train)

# Then label binarize them to be used in multi-class roc
y_resampled = label_binarize(y_resampled, classes=['Good','Bad','Ok'])

# Do the same to the test labels
y_test = label_binarize(y_test, classes=['Good','Bad','Ok'])
n_classes = y_test.shape[1]

# classifier is the OneVsRestClassifier defined in the question
y_score = classifier.fit(X_resampled, y_resampled).predict_proba(X_test)

# Then you can do this and other parts of code
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
Votes: 1
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/51404590
