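I haven't been able to find any tutorials, guides, or sample code that perform dataset splitting and balancing as part of an sklearn pipeline. Is this possible?
I have something like this: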
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
### can this be part of the pipeline?
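X = df.drop(columns='target')  # features only, so the label does not leak into X
y = df['target'].values
# test_size=0.7 holds out 70% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.7, random_state=42)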
###:end can this be part of the pipeline?
pipeline = Pipeline([
# is there a splitter or balancer class that can be added to the pipeline here?
('scaler', StandardScaler()),
('K Nearest Neighbor', KNeighborsClassifier(n_neighbors=4))
])
pipeline.fit(X_train, y_train)
Is it possible to have a pipeline like this instead?
pipeline = Pipeline([
('balancer', Balancer()), # is there some magical Balancer() class somewhere?
('splitter', Splitter()), # is there some magical Splitter() class somewhere?
('scaler', StandardScaler()),
('K Nearest Neighbor', KNeighborsClassifier(n_neighbors=4))
])
Thanks for your time.
Posted on 2021-08-12 10:32:40
No.
The purpose of a Pipeline object is to assemble a fixed sequence of data-processing steps followed by a final estimator.
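A quick way to see what that fixed sequence means: fitting the pipeline amounts to calling fit_transform on each intermediate step and fit on the last one, with y handed to those fit calls unchanged. The sketch below uses the iris dataset as stand-in data; the step name knn is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data; any feature matrix X and label vector y would do
X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=4)),
])
pipeline.fit(X, y)

# The same steps chained by hand: fit_transform the scaler, then fit the estimator
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X, y)  # y is accepted but ignored by StandardScaler
knn = KNeighborsClassifier(n_neighbors=4).fit(X_scaled, y)

# predict() on the pipeline calls transform() on each step, then the estimator
manual_pred = knn.predict(scaler.transform(X))
pipe_pred = pipeline.predict(X)
print(np.array_equal(manual_pred, pipe_pred))
```

Only X flows through the transform steps; y is passed straight through to each fit call.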
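However, a Pipeline only transforms the observed data, usually denoted X: each intermediate step's transform() receives and returns only X, while y is passed unchanged to fit. Any step that must also change the target (usually denoted y), such as splitting off a test set or resampling to balance classes, therefore cannot be part of a Pipeline.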
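To illustrate why, here is a minimal sketch with a hypothetical transformer (NaiveUndersampler is an illustrative name, not a scikit-learn class) that tries to "balance" by dropping rows inside transform(). Because it has no way to drop the matching labels, the final estimator receives an X and y of different lengths and fit raises a ValueError. The toy data is made up for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


class NaiveUndersampler(BaseEstimator, TransformerMixin):
    """Hypothetical 'balancer': drops rows of X but cannot drop the matching y."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Keep only the first half of the rows; y is not available here
        return X[: len(X) // 2]


# Toy imbalanced data: 15 samples of class 0, 5 of class 1
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

pipe = Pipeline([
    ("balancer", NaiveUndersampler()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])

error = None
try:
    pipe.fit(X, y)  # the scaler-free KNN gets 10 rows of X but 20 labels
except ValueError as exc:
    error = str(exc)
print(error)
```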
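Regarding the comment about cross-validation: splitting is not meant to be a step inside the Pipeline. Instead, the Pipeline (the data-processing steps together with the estimator) is what gets cross-validated: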
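from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Example data so the snippet runs; use your own features X and labels y
X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
('scaler', StandardScaler()),
('K Nearest Neighbor', KNeighborsClassifier(n_neighbors=4))
])
# cross_validate splits X and y into folds and refits the whole pipeline on each training fold
cv_results = cross_validate(pipeline, X, y, cv=3)
print(cv_results['test_score'])
https://stackoverflow.com/questions/68750374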
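If the goal of the question's train_test_split(..., stratify=...) call was a class-preserving split, that job can be handed to the cross-validator instead. A sketch, assuming synthetic imbalanced data from make_classification as a stand-in for the question's df and an explicit StratifiedKFold. (When cv is an integer and the estimator is a classifier, cross_validate already uses stratified folds.) The balanced_accuracy score is used here only as an imbalance-aware metric; it does not resample anything.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 80% class 0, 20% class 1)
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=42)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=4)),
])

# The split lives in the cross-validator, not in the Pipeline
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
cv_results = cross_validate(pipeline, X, y, cv=cv, scoring="balanced_accuracy")
print(cv_results["test_score"])

# Each test fold keeps roughly the same share of class 1 as the full y
fold_ratios = [y[test_idx].mean() for _, test_idx in cv.split(X, y)]
print(fold_ratios, y.mean())
```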