Q: Using LeaveOneOut to determine k for k-nearest neighbors

Stack Overflow user

Asked 2018-12-17 01:43:45

1 answer, 302 views, 0 followers, 0 votes

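I want to find the best k for k-nearest neighbors. I am using LeaveOneOut to split my data into training and test sets. In the code below I have 150 data entries, so I get 150 different training and test sets. k should range from 1 to 40.

I want to plot the cross-validated mean classification error as a function of k, and also see which k is best for KNN.

Here is my code: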

import scipy.io as sio
import seaborn as sn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut    
error = []
array = np.array(range(1,41))

dataset = pd.read_excel('Data/iris.xls')
X = dataset.iloc[:, :-1].values  
y = dataset.iloc[:, 4].values

loo = LeaveOneOut()
loo.get_n_splits(X)
for train_index, test_index in loo.split(X):
    #print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    #print(X_train, X_test, y_train, y_test)

    for i in range(1, 41):  
        classifier = KNeighborsClassifier(n_neighbors=i)  
        classifier.fit(X_train, y_train)
        y_pred = classifier.predict(X_test)
        error.append(np.mean(y_pred != y_test))

plt.figure(figsize=(12, 6))  
plt.plot(range(1, 41), error, color='red', linestyle='dashed', marker='o', markerfacecolor='blue', markersize=10)
plt.title('Error Rate K Value')  
plt.xlabel('K Value')  
plt.ylabel('Mean Error')

Tags: python, machine-learning, scikit-learn, cross-validation, knn

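1 Answer

Stack Overflow user

Accepted answer

Answered 2018-12-17 15:39:05

You are computing the error for every single prediction, which is why your error array ends up with 6000 points (150 splits × 40 values of k). You need to collect the predictions for all points across the folds for a given 'n_neighbors', and then compute the error for that value.

You can do it like this: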

# Start from an empty list so that error ends up with one value per k.
error = []

# Loop over possible values of "n_neighbors"
for i in range(1, 41):

    # Collect the actual and predicted values for all splits for a single "n_neighbors"
    actual = []
    predicted = []


    for train_index, test_index in loo.split(X):
        #print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        classifier = KNeighborsClassifier(n_neighbors=i)  
        classifier.fit(X_train, y_train)
        y_pred = classifier.predict(X_test)

        # Append the single predictions and actual values here.
        actual.append(y_test[0])
        predicted.append(y_pred[0])

    # Outside the loop, calculate the error.
    error.append(np.mean(np.array(predicted) != np.array(actual)))

The rest of your code is fine.

There is a more concise way to do this if you use cross_val_predict:
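
To make the 6000-point count concrete, here is a minimal sketch (not part of the original answer) that keeps the loop order from the question, with splits outside and k inside, and then recovers one error per k by reshaping. It assumes scikit-learn's bundled load_iris data as a stand-in for Data/iris.xls; the names flat_errors, k_values, and error_per_k are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris  # assumption: stands in for Data/iris.xls
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k_values = range(1, 41)
loo = LeaveOneOut()

# Same loop order as in the question: splits outside, k inside.
flat_errors = []
for train_index, test_index in loo.split(X):
    for k in k_values:
        classifier = KNeighborsClassifier(n_neighbors=k)
        classifier.fit(X[train_index], y[train_index])
        y_pred = classifier.predict(X[test_index])
        flat_errors.append(np.mean(y_pred != y[test_index]))

print(len(flat_errors))  # 150 splits * 40 values of k = 6000

# One row per split, one column per k; averaging over the rows
# gives the leave-one-out error for each k.
n_splits = loo.get_n_splits(X)
error_per_k = np.array(flat_errors).reshape(n_splits, len(k_values)).mean(axis=0)
print(error_per_k.shape)  # (40,)
```

The reshape only works because the inner loop visits the k values in the same order for every split. Putting the k loop on the outside, as in the answer, avoids depending on that ordering.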

from sklearn.model_selection import cross_val_predict

for i in range(1, 41):  

    classifier = KNeighborsClassifier(n_neighbors=i)  
    y_pred = cross_val_predict(classifier, X, y, cv=loo)
    error.append(np.mean(y_pred != y))
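
To answer the "which k is best" part of the question, the per-k errors can be reduced to a single value with np.argmin. The following is a hedged sketch rather than the answerer's code: it swaps in cross_val_score (default accuracy scoring for classifiers) instead of cross_val_predict, and again assumes load_iris in place of Data/iris.xls. The names k_values and best_k are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris  # assumption: stands in for Data/iris.xls
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k_values = np.arange(1, 41)
loo = LeaveOneOut()

error = []
for k in k_values:
    classifier = KNeighborsClassifier(n_neighbors=k)
    # With LeaveOneOut, each score is the accuracy on one held-out sample (0 or 1),
    # so the mean over all splits is the leave-one-out accuracy for this k.
    scores = cross_val_score(classifier, X, y, cv=loo)
    error.append(1.0 - scores.mean())

error = np.array(error)

# np.argmin returns the first minimum, so ties go to the smallest k.
best_k = int(k_values[np.argmin(error)])
print("best k:", best_k, "LOO error:", error[best_k - 1])
```

The resulting error array has 40 entries, so it can be passed directly to the plt.plot(range(1, 41), error, ...) call from the question.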

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/53804854
