I trained a TPOT regression model on Google, and the output of the TPOT run is some boilerplate Python code, shown below.
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive
# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
    train_test_split(features, tpot_data['target'], random_state=1)
# Average CV score on the training set was: -4.881434802676966
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=ExtraTreesRegressor(bootstrap=False, max_features=0.9000000000000001, min_samples_leaf=1, min_samples_split=20, n_estimators=100)),
    ExtraTreesRegressor(bootstrap=True, max_features=0.9000000000000001, min_samples_leaf=6, min_samples_split=13, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 1)
exported_pipeline.fit(training_features, training_target)
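results = exported_pipeline.predict(testing_features)

Does anyone know what the sklearn pipeline process actually looks like? When I adapted the boilerplate code and ran it on my dataset in IPython, I could see the pipeline printed as output. What are all of these steps doing?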
Pipeline(steps=[('stackingestimator-1',
                 StackingEstimator(estimator=ExtraTreesRegressor(max_features=0.6500000000000001,
                                                                 min_samples_leaf=19,
                                                                 min_samples_split=14,
                                                                 random_state=1))),
                ('maxabsscaler', MaxAbsScaler()),
                ('stackingestimator-2',
                 StackingEstimator(estimator=ExtraTreesRegressor(max_features=0.4,
                                                                 min_samples_leaf=3,
                                                                 min_samples_split=7,
                                                                 random_state=1))),
                ('adaboostregressor',
                 AdaBoostRegressor(learning_rate=0.001, loss='exponential',
                                   n_estimators=100, random_state=1))])

The results look good; I am just curious how the pipeline processing works. Any tips or links to tutorials are much appreciated. I think this Machine Learning Mastery tutorial is also somewhat helpful for anyone interested in learning more about TPOT.
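
To make the question concrete, here is a minimal pure-Python sketch of the flow I believe a pipeline like the one printed above follows. On `fit`, every step except the last is fitted and then used to transform the data before it is handed to the next step; on `predict`, the data passes through the same transforms and the last step makes the prediction. As far as I understand it, `StackingEstimator` acts as a transformer that fits its inner model and adds that model's predictions to the data as an extra feature column. All class names below (`ToyPipeline`, `ToyStackingEstimator`, `ToyMaxAbsScaler`, `MeanRegressor`) are hypothetical stand-ins written for illustration; they are not the scikit-learn or TPOT implementations.

```python
# Toy illustration only: these classes mimic the fit/transform/predict flow.
# They are NOT scikit-learn's or TPOT's actual code.


class MeanRegressor:
    """Hypothetical stand-in for a regressor: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ToyStackingEstimator:
    """Fits an inner model, then passes its predictions on as an extra column."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        preds = self.estimator.predict(X)
        # Prepend the inner model's prediction to each row of features.
        return [[p] + list(row) for p, row in zip(preds, X)]


class ToyMaxAbsScaler:
    """Divides each column by its largest absolute value seen during fit."""

    def fit(self, X, y=None):
        n_cols = len(X[0])
        # Fall back to 1.0 for an all-zero column to avoid dividing by zero.
        self.scale_ = [max(abs(row[j]) for row in X) or 1.0 for j in range(n_cols)]
        return self

    def transform(self, X):
        return [[v / s for v, s in zip(row, self.scale_)] for row in X]


class ToyPipeline:
    """Chains (name, step) pairs: transformers first, one final estimator last."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # fit, then transform for the next step
        self.steps[-1][1].fit(X, y)          # final estimator sees transformed data
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)            # reuse what was learned during fit
        return self.steps[-1][1].predict(X)


# Made-up data, only to exercise the flow.
X_train = [[1.0, -4.0], [2.0, 2.0], [3.0, 1.0]]
y_train = [10.0, 20.0, 30.0]

pipe = ToyPipeline([
    ("stackingestimator-1", ToyStackingEstimator(MeanRegressor())),
    ("maxabsscaler", ToyMaxAbsScaler()),
    ("finalregressor", MeanRegressor()),
])
pipe.fit(X_train, y_train)
predictions = pipe.predict([[5.0, 5.0]])
print(predictions)
```

If this reading is right, each `StackingEstimator` in the printed pipeline widens the feature matrix by one column, `MaxAbsScaler` rescales all columns (including the stacked prediction), and `AdaBoostRegressor` is the final model that produces the actual predictions.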
Posted on 2021-05-24 15:28:33
https://datascience.stackexchange.com/questions/94842