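I am currently using the scikit-learn module to work on a crime prediction problem, and I am having trouble running the knn.predict method over an entire DataFrame in one batch.
How can I call knn.predict() on two whole columns of my DataFrame at once, so that the output is stored in another DataFrame?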
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
knn_df = pd.read_csv("/Users/helenapunset/Desktop/knn_dataframe.csv")
# x is the set of features
x = knn_df[['latitude', 'longitude']]
# y is the target variable
y = knn_df['Class']
# train and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5)
# training the data
knn.fit(x_train,y_train)
# test score was approximately 69%
knn.score(x_test,y_test)
# this is predicted to be a safe zone
crime_prediction = knn.predict([[25.787882, -80.358427]])
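print(crime_prediction)
In the knn.predict call above, I passed in a single pair of values for the two features I used from the DataFrame named knn_df, latitude and longitude. However, this single-point usage is the only example I have found in the documentation I have searched, and I cannot find a way to run the KNN prediction over the entire DataFrame. Would it be possible to use a for loop for this?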
Posted on 2022-03-08 03:29:35
Let the new set to predict on be 'knn_df_predict'. Assuming its column names are the same, try the following lines of code:
x_new = knn_df_predict[['latitude', 'longitude']]  # select the feature columns
crime_prediction = knn.predict(x_new)  # predict for every row of the new set
knn_df_predict['prediction'] = crime_prediction  # add the predictions to the DataFrame
https://stackoverflow.com/questions/71389535
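To make the batch approach concrete, here is a minimal self-contained sketch that also compares it with the for-loop idea from the question. The training data, the labelling rule, and the names train_df, features, batch_pred, loop_pred, and result_df are all assumptions made up for illustration; they stand in for the asker's CSV, which is not available here.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training data (hypothetical values, not the real CSV).
rng = np.random.default_rng(0)
train_df = pd.DataFrame({
    "latitude": rng.uniform(25.70, 25.90, size=200),
    "longitude": rng.uniform(-80.45, -80.15, size=200),
})
# Hypothetical labelling rule, used only so the model has something to learn.
train_df["Class"] = np.where(train_df["latitude"] > 25.80, "safe", "unsafe")

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_df[["latitude", "longitude"]], train_df["Class"])

# New rows to score, with the same column names as the training features.
knn_df_predict = pd.DataFrame({
    "latitude": rng.uniform(25.70, 25.90, size=10),
    "longitude": rng.uniform(-80.45, -80.15, size=10),
})
features = knn_df_predict[["latitude", "longitude"]]

# Batch version: a single predict call returns one label per row.
batch_pred = knn.predict(features)

# Loop version: predict one single-row DataFrame at a time.
loop_pred = [knn.predict(features.iloc[[i]])[0] for i in range(len(features))]

# Store the output in a separate DataFrame, leaving knn_df_predict unchanged.
result_df = knn_df_predict.assign(prediction=batch_pred)
print(result_df)
```

Both versions should give the same labels, so a for loop is possible but not needed: predict accepts any 2D input with one row per sample, and the single call is shorter and typically faster on large frames.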