I am training a simple LASSO model on a CSV file in chunks of 100,000 rows at a time.
How do I combine all the models trained on these different chunks? I want to use all of these trained models to make predictions.
I am familiar with Dask and other alternatives, but I would like to stick with pandas.
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipelines = {
    'lasso': make_pipeline(StandardScaler(), Lasso(random_state=123))
}
for key, value in pipelines.items():
    print(key, type(value))

# Lasso hyperparameters
lasso_hyperparameters = {
    'lasso__alpha': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]
}
hyperparameters = {
    'lasso': lasso_hyperparameters
}

# Create empty dictionary called fitted_models
fitted_models = {}

# Create cross-validation object from pipeline and hyperparameters
name = 'lasso'
model = GridSearchCV(pipelines[name], hyperparameters[name], cv=10, n_jobs=-1)

def train(X_train, y_train):
    # Fit model on X_train, y_train
    model.fit(X_train, y_train)
    # Store model in fitted_models[name]
    fitted_models[name] = model
    # Print '{name} has been fitted'
    print(name, 'has been fitted.')
    print("__________________________________")
    print(model.cv_results_)

for df in pd.read_csv('train_V2.csv', chunksize=100000):
    df = df.dropna()
    df = pd.get_dummies(df, columns=['matchType'])
    df_train = df.drop(['Id', 'groupId', 'matchId'], axis=1)
    y = df_train.winPlacePerc
    X = df_train.drop('winPlacePerc', axis=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=0.2,
                                                        random_state=1234)
    X_train = np.asarray(X_train)
    X_test = np.asarray(X_test)
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)
    train(X_train, y_train)

Posted on 2018-10-20 18:58:07
What you are looking for is called "stochastic optimization". You do not need to fit separate models and then combine them.
Posted on 2022-08-21 23:50:01
Consider using sklearn.linear_model.SGDRegressor with an L1 penalty, which corresponds to Lasso.
It implements .partial_fit, which lets you train a single model incrementally on the chunked dataset instead of training separate models.
For example:

for epoch in range(n_epochs):  # n_epochs: number of passes over the data
    for df in pd.read_csv('train_V2.csv', chunksize=100000):
        # preprocess the chunk as in the question, then:
        model.partial_fit(X, y)

https://datascience.stackexchange.com/questions/39985
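A self-contained sketch of this pattern, using synthetic data in place of the question's CSV; the `alpha`, chunk size, and epoch count here are illustrative choices, not values from the original post:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic linear data standing in for the CSV chunks
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
true_coef = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=10000)

# In practice you would fit the scaler on a first chunk
# (or update it incrementally with StandardScaler.partial_fit)
scaler = StandardScaler().fit(X)

# L1 penalty makes the SGD objective Lasso-like
model = SGDRegressor(penalty='l1', alpha=0.001, random_state=123)

chunk_size = 1000
for epoch in range(5):                       # several passes over the data
    for start in range(0, len(X), chunk_size):
        Xc = scaler.transform(X[start:start + chunk_size])
        yc = y[start:start + chunk_size]
        model.partial_fit(Xc, yc)            # incremental update: one model, many chunks

mse = float(np.mean((model.predict(scaler.transform(X)) - y) ** 2))
```

The key design point is that `partial_fit` keeps updating the same coefficient vector across chunks and epochs, so there is nothing to combine afterwards.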