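I am trying to run cross-validation (specifically LOOCV) on a simple linear regression model, but for some reason I get nan for every entry when the scores are computed. Does anyone know why?
Here is the code: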
# use sklearn (auto is the DataFrame shown below)
import numpy as np
from sklearn import model_selection
from sklearn.model_selection import KFold
# now repeat the linear regression with sklearn
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
X = np.array(auto['horsepower']).reshape(-1,1)
y = np.array(auto['mpg']).reshape(-1,1)
# cv=len(X) gives one fold per observation, i.e. LOOCV
cv = model_selection.cross_val_score(lr,X,y,cv=len(X))
Here is the data:
      mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  name
0    18.0          8         307.0         130    3504          12.0    70       1  chevrolet chevelle malibu
1    15.0          8         350.0         165    3693          11.5    70       1  buick skylark 320
2    18.0          8         318.0         150    3436          11.0    70       1  plymouth satellite
3    16.0          8         304.0         150    3433          12.0    70       1  amc rebel sst
4    17.0          8         302.0         140    3449          10.5    70       1  ford torino
...   ...        ...           ...         ...     ...           ...   ...     ...  ...
387  27.0          4         140.0          86    2790          15.6    82       1  ford mustang gl
388  44.0          4          97.0          52    2130          24.6    82       2  vw pickup
389  32.0          4         135.0          84    2295          11.6    82       1  dodge rampage
390  28.0          4         120.0          79    2625          18.6    82       1  ford ranger
391  31.0          4         119.0          82    2720          19.4    82       1  chevy s-10
392 rows × 9 columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 392 entries, 0 to 391
Data columns (total 9 columns):
mpg 392 non-null float64
cylinders 392 non-null int64
displacement 392 non-null float64
horsepower 392 non-null int64
weight 392 non-null int64
acceleration 392 non-null float64
year 392 non-null int64
origin 392 non-null int64
name 392 non-null object
dtypes: float64(3), int64(5), object(1)
memory usage: 27.7+ KB
Posted on 2020-03-28 18:23:42
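The documentation for the scoring parameter says:
scoring: string, callable, list/tuple, or None, default: None. If None, the estimator's score method is used.
For LinearRegression(), that score is the R^2 of the prediction. But R^2 is not meaningful when n=1: with a single held-out observation, the total sum of squares around the test-set mean is zero, so R^2 is undefined and every fold returns nan. Passing a metric that is defined for a single observation, such as the negative mean squared error, avoids this: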
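As a quick check of the point above, here is a minimal sketch on synthetic data. The data, the variable names, and the use of `LeaveOneOut` in place of `cv=len(X)` are assumptions made for illustration and are not part of the original post. It is meant to show that R^2 on a single sample comes back as nan, while the negative mean squared error is finite for every fold.

```python
import warnings

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Synthetic stand-in for horsepower/mpg (an assumption, not the auto dataset).
rng = np.random.default_rng(0)
X = rng.uniform(50, 200, size=(30, 1))
y = 40 - 0.15 * X[:, 0] + rng.normal(scale=2.0, size=30)

with warnings.catch_warnings():
    # scikit-learn warns that R^2 is not well-defined for fewer than two samples.
    warnings.simplefilter("ignore")
    # R^2 computed on one observation is undefined.
    single_r2 = r2_score([18.0], [17.2])
    # Default scoring for LinearRegression is R^2, so every LOOCV fold is nan.
    r2_scores = cross_val_score(LinearRegression(), X, y, cv=LeaveOneOut())

# Negative MSE is defined for a single held-out observation.
mse_scores = cross_val_score(
    LinearRegression(), X, y,
    cv=LeaveOneOut(), scoring="neg_mean_squared_error",
)

# Flip the sign and average the folds to get the LOOCV estimate of test MSE.
loocv_mse = -mse_scores.mean()

print(single_r2, np.isnan(r2_scores).all(), loocv_mse)
```

The scores are negated because scikit-learn scorers follow a "higher is better" convention, so the mean squared error is reported with its sign flipped.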
import numpy as np
import pandas as pd
from sklearn import model_selection
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
# read the raw UCI file; '?' marks missing values
auto = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data",
                   delimiter=r"\s+",header=None,
                   names=["mpg","cylinders","displacement","horsepower","weight",
                          "acceleration","model year","origin","car name"],
                   na_values=['?'])
# assumption: drop rows with missing horsepower, since LinearRegression cannot fit NaN inputs
auto = auto.dropna(subset=['horsepower'])
lr = LinearRegression()
X = np.array(auto['horsepower']).reshape(-1,1)
y = np.array(auto['mpg']).reshape(-1,1)
# one negative MSE per left-out observation
model_selection.cross_val_score(lr,X,y,cv=len(X),scoring='neg_mean_squared_error')
https://stackoverflow.com/questions/60901856