import seaborn as sns
import pandas as pd
import numpy as nm
import matplotlib.pyplot as plt
df=sns.load_dataset('fmri')
x=df[['timepoint','subject']]
y=df['signal']
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_5388\1414350166.py in <module>
----> 1 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
~\Anaconda3\envs\MACHINELEARNING\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(test_size, train_size, random_state, shuffle, stratify, *arrays)
2415 raise ValueError("At least one array required as input")
2416
-> 2417 arrays = indexable(*arrays)
2418
2419 n_samples = _num_samples(arrays[0])
~\Anaconda3\envs\MACHINELEARNING\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
376
377 result = [_make_indexable(X) for X in iterables]
--> 378 check_consistent_length(*result)
379 return result
380
~\Anaconda3\envs\MACHINELEARNING\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
332 raise ValueError(
333 "Found input variables with inconsistent numbers of samples: %r"
--> 334 % [int(l) for l in lengths]
335 )
336
ValueError: Found input variables with inconsistent numbers of samples: [53940, 1064]
Posted on 2023-05-22 11:55:15
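In addition to what Nick ODell pointed out in a comment about X not being defined, the ValueError you see occurs because train_test_split is being called on two arrays of different sizes. The X and y inputs to train_test_split need to have the same number of samples; here they have 53940 and 1064, respectively. In other words, you need one output for every input. hth.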
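The mismatch described above can be reproduced on a small scale. The following is a minimal sketch, not the asker's actual notebook: the arrays are synthetic stand-ins, and the names `X_stale`, `X`, and `message` are hypothetical. It shows train_test_split rejecting inputs of different lengths and then accepting them once X is built from the same rows as y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumption): a leftover X with 10 rows and a y with 4 rows,
# mimicking the 53940 vs 1064 mismatch from the traceback.
X_stale = np.arange(20).reshape(10, 2)
y = np.arange(4)

message = ""
try:
    train_test_split(X_stale, y, test_size=0.3, random_state=42)
except ValueError as err:
    # scikit-learn refuses to split arrays with different sample counts
    message = str(err)
    print(message)

# Rebuild X so that it has exactly one row per entry in y.
X = np.arange(8).reshape(4, 2)
assert len(X) == len(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)
```

Checking `len(X) == len(y)` right before the split is a cheap way to catch a stale variable left over from an earlier notebook cell.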
Posted on 2023-05-22 13:00:10
Yes, please compare the lowercase x in x=df[['timepoint','subject']] with the uppercase X in train_test_split(X...
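Python names are case-sensitive, so `x` and `X` are two different variables. A minimal sketch of the split with one consistent name follows. The DataFrame here is a hypothetical synthetic stand-in that only borrows the column names from the question (`timepoint`, `subject`, `signal`); it is not the real fmri data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the fmri DataFrame (assumption: same column names,
# made-up values), so the example runs without downloading anything.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "timepoint": np.tile(np.arange(10), 3),
    "subject": np.repeat(["s0", "s1", "s2"], 10),
    "signal": rng.normal(size=30),
})

# Use the same capitalisation everywhere: X for features, y for the target.
X = df[["timepoint", "subject"]]
y = df["signal"]
assert len(X) == len(y)  # both come from the same DataFrame

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))
```

Because X and y are taken from the same DataFrame, they always have the same number of rows, and the split keeps their indices aligned.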
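However,
df=sns.load_dataset('fmri')
should be
df=pd.read_csv('fmri') # but I don't know the format for this data set
Just make sure the script runs in the same directory as fmri.
A small sample dataset would be nice. You are using sns and matplotlib for plotting, but sns does not load a dataset; pandas does.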
https://datascience.stackexchange.com/questions/121676