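I'm trying to implement the KNN algorithm from scratch, but I'm getting a very strange error: "KeyError: 0".
I think this means I have an empty dictionary somewhere, but I don't see how that is possible. To be clear, feeding the same data into a black-box KNN implementation works fine, so the problem must be in my code.
Here is my code: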
import numpy as np
import pandas as pd
import csv
import scipy.stats as stats
import math
from collections import Counter
import operator
from operator import itemgetter

"""Training features dataset"""
filenametrain_data = 'training_data.csv'
training_feature_set = pd.read_csv(filenametrain_data, header=None, usecols=range(1,13627))

"""Training labels dataset"""
filenametrain_label = 'training_labels.csv'
training_feature_label = pd.read_csv(filenametrain_label, header=None, usecols=[1], names=['Category'])

"""Split into training and testing datasets 90%/10%"""
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(training_feature_set, training_feature_label, test_size = 0.1, random_state=42)

"""KNN Model"""
def distance(X_train, y_train):
    dist = 0.0
    for i in range(len(X_train)):
        dist += pow((X_train[i] - y_train[i]), 2)
    return math.sqrt(dist)

def getNeighbors(X_train, y_train, X_test, k):
    distances = []
    for i in range(len(X_train)):
        dist = distance(X_test, X_train[i])
        distances.append((X_train[i], dist, y_train[i]))
    distances.sort(key=operator.itemgetter(1))
    neighbor = []
    for elem in range(k):
        neighbor.append((distances[elem][0], distances[elem][2]))
    return neighbor

def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = int(neighbors[x][-1])
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse = True)
    return sortedVotes[0][0]

"""Prediction"""
predictions = []
k = 4
for x in range(len(X_test)):
    neighbors = getNeighbors(X_train, y_train, y_test[x], k)
    result = getResponse(neighbors)
    predictions.append(result)

The error returned is:
Traceback (most recent call last):
  File "", line 2, in <module>
    neighbors = getNeighbors(X_train, y_train, y_test[x], k)
  File "C:\ANACONDA\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\ANACONDA\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\ANACONDA\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "C:\ANACONDA\lib\site-packages\pandas\core\internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "C:\ANACONDA\lib\site-packages\pandas\core\index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)
  File "pandas\hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)
  File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)
KeyError: 0
The dataset can be accessed here.
Posted on 2017-05-02 10:38:14
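The traceback passes through DataFrame.__getitem__ and _getitem_column, which suggests that y_test[x] is being treated as a column lookup rather than a row lookup. Here is a minimal sketch of that behaviour, assuming a small made-up label frame (the name labels and its values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the label frame: a single column named 'Category'.
labels = pd.DataFrame({"Category": [3, 1, 2]})

# Square brackets with a scalar select a COLUMN by label; there is no column 0.
try:
    labels[0]
    raised = False
except KeyError:
    raised = True

# Positional row access goes through .iloc instead.
first_label = int(labels.iloc[0]["Category"])

# Converting to a NumPy array also gives plain positional indexing.
label_array = labels["Category"].values

print(raised, first_label, label_array[0])
```

Under that reading no empty dictionary is involved: the lookup fails inside pandas while the arguments to getNeighbors are being evaluated, before getResponse is ever called.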
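Edit: You may have an extra character at the start of your csv file. Try specifying the encoding in the read_csv() call. See "encoding" in csv.html
encoding : str, default None. Encoding to use for UTF when reading/writing (ex. 'utf-8'). List of Python standard encodings: https://docs.python.org/3/library/codecs.html#standard-encodings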
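One way an extra leading character can arise is a UTF-8 byte order mark (BOM). The sketch below uses only the standard library to show how the codec name decides whether that character survives; read_csv() takes the same codec name through its encoding parameter, for example encoding='utf-8-sig'. Whether the files in the question actually start with a BOM is an assumption, and the file name here is hypothetical.

```python
import csv
import os
import tempfile

# Write a tiny CSV that starts with a UTF-8 byte order mark (BOM).
path = os.path.join(tempfile.mkdtemp(), "bom_example.csv")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf1,2\n3,4\n")


def first_cell(encoding):
    """Return the first cell of the file when decoded with `encoding`."""
    with open(path, newline="", encoding=encoding) as f:
        return next(csv.reader(f))[0]


plain = first_cell("utf-8")         # the BOM is kept as a leading '\ufeff'
stripped = first_cell("utf-8-sig")  # the codec removes the BOM

print(repr(plain), repr(stripped))
```

If the first cell read with 'utf-8' differs from the one read with 'utf-8-sig', the file carries a BOM and the second codec is the safer choice.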
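You are using a dot where you don't need one (in two places that I can see right off the bat):
operator.itemgetter(1)
Specifically, you have already imported itemgetter:
from operator import itemgetter
So when you call itemgetter, just call it without the dot notation:
itemgetter(1)
https://stackoverflow.com/questions/43735722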
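A short sketch of the imported name in use, with made-up (point, distance, label) tuples shaped like the ones built in getNeighbors. With both imports present, as in the question, the dotted and undotted spellings refer to the same function, so the choice between them is one of style.

```python
import operator
from operator import itemgetter

# Hypothetical (point, distance, label) tuples, shaped like those in getNeighbors.
distances = [("a", 2.5, 1), ("b", 0.5, 0), ("c", 1.0, 1)]

# Both names are bound to the same callable.
same_function = operator.itemgetter is itemgetter

# Sort by the distance stored at index 1, then keep the labels of the 2 nearest.
by_distance = sorted(distances, key=itemgetter(1))
nearest_labels = [label for _, _, label in by_distance[:2]]

print(same_function, nearest_labels)
```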