If both the random forest estimator and the MultiOutputRegressor itself accept `n_jobs`, how should it be used? For example, is it better not to specify `n_jobs` on the estimator and instead set `n_jobs` on the MultiOutputRegressor? Here are a few examples:
# Imports
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
# (1) No parallelization
rf_no_jobs = RandomForestRegressor()
multioutput_no_jobs_alpha = MultiOutputRegressor(estimator=rf_no_jobs)
# (2) RF w/ parallelization, multioutput w/o parallelization
rf_with_jobs = RandomForestRegressor(n_jobs=-1)
multioutput_no_jobs_beta = MultiOutputRegressor(estimator=rf_with_jobs)
# (3) RF w/o parallelization, multioutput w parallelization
multioutput_with_jobs_alpha = MultiOutputRegressor(estimator=rf_no_jobs, n_jobs=-1)
# (4) Both parallelized
multioutput_with_jobs_beta = MultiOutputRegressor(estimator=rf_with_jobs, n_jobs=-1)

Posted on 2022-01-25 02:18:33
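One way to answer this empirically is to time all four configurations. Below is a hypothetical timing harness (not from the original post) using a small synthetic dataset and a small forest so it finishes quickly; the relative ordering on real data and larger models may differ.

```python
# Hypothetical timing harness comparing the four n_jobs configurations
# from the question on small synthetic data.
from timeit import default_timer as timer

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.normal(size=(500, 4))  # 4 output targets

# None = no parallelization (sklearn default), -1 = use all cores
configs = {
    "(1) no parallelization": dict(rf_jobs=None, mo_jobs=None),
    "(2) RF only": dict(rf_jobs=-1, mo_jobs=None),
    "(3) MultiOutput only": dict(rf_jobs=None, mo_jobs=-1),
    "(4) both": dict(rf_jobs=-1, mo_jobs=-1),
}

runtimes = {}
for name, cfg in configs.items():
    model = MultiOutputRegressor(
        RandomForestRegressor(n_estimators=20, n_jobs=cfg["rf_jobs"], random_state=0),
        n_jobs=cfg["mo_jobs"],
    )
    start = timer()
    model.fit(X, y)
    model.predict(X)
    runtimes[name] = timer() - start

for name, t in runtimes.items():
    print(f"{name}: {t:.3f}s")
```

The harness fits and predicts once per configuration; for a fair comparison you would want several repetitions and a dataset/model size representative of your workload.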
Since RandomForestRegressor has "native" multioutput support (no MultiOutputRegressor wrapper needed), I instead looked at KNeighborsRegressor and LightGBM, which both have an internal n_jobs parameter, and I had the same question.
Running on a Ryzen 5950X (Linux) and an Intel 11800H (Windows), both with n_jobs = 8, I found consistent results:
- With KNN, setting n_jobs on the MultiOutputRegressor and keeping the inner KNN at n_jobs=1 was about 10x faster at 160 dims/targets.
- `with joblib.parallel_backend("loky", n_jobs=your_n_jobs):` was just as fast, and conveniently sets n_jobs for everything sklearn does internally. This is the easy choice.
- RegressorChain was fast enough at low dimensions, but extremely slow (500x slower vs. MultiOutputRegressor) with KNeighbors at 160 dims (I'd stick with LightGBM for use with RegressorChain, where performance was better).
- With LightGBM, MultiOutputRegressor with only its own n_jobs set was again faster than setting the inner n_jobs, but the difference was much smaller (3x on the 5950X/Linux, only 1.2x on the 11800H/Windows).

Since the full code is a bit long, here is a partial example that covers most of it:
from timeit import default_timer as timer
import numpy as np
from joblib import parallel_backend
from sklearn.neighbors import KNeighborsRegressor
from sklearn.multioutput import MultiOutputRegressor, RegressorChain
from sklearn.datasets import fetch_california_housing
# adjust n_jobs to the number of physical CPU cores on your machine or pass -1 for auto max
n_jobs = 8
knn_model_param_dict = {} # kwargs if desired
num_y_dims = 160
X, y_one_dim = fetch_california_housing(return_X_y=True)
y_one_dim = y_one_dim.reshape(-1, 1)
# extra multioutput dims generated randomly
dims = [y_one_dim]
for _ in range(num_y_dims - 1):
dims.append(np.random.gamma(y_one_dim.std(), size=y_one_dim.shape))
y = np.concatenate(dims, axis=1)
# INIT
regr = MultiOutputRegressor(
KNeighborsRegressor(**knn_model_param_dict),
n_jobs=n_jobs,
).fit(X, y)
trial = "KNN with all n_jobs=1"
start = timer()
regr = MultiOutputRegressor(
KNeighborsRegressor(**knn_model_param_dict, n_jobs=1),
n_jobs=1,
)
regr.fit(X, y)
regr.predict(X)
end = timer()
print(f"trial: {trial} with runtime: {end - start}")
trial = "KNN inner model with n_jobs"
start = timer()
regr = MultiOutputRegressor(
KNeighborsRegressor(**knn_model_param_dict, n_jobs=n_jobs),
n_jobs=1,
)
regr.fit(X, y)
regr.predict(X)
end = timer()
print(f"trial: {trial} with runtime: {end - start}")
trial = "KNN outer multioutput with n_jobs, inner with 1"
start = timer()
regr = MultiOutputRegressor(
KNeighborsRegressor(**knn_model_param_dict, n_jobs=1),
n_jobs=n_jobs,
)
regr.fit(X, y)
regr.predict(X)
end = timer()
print(f"trial: {trial} with runtime: {end - start}")
trial = "KNN inner and outer both -1"
start = timer()
regr = MultiOutputRegressor(
KNeighborsRegressor(**knn_model_param_dict, n_jobs=-1),
n_jobs=-1,
)
regr.fit(X, y)
regr.predict(X)
end = timer()
print(f"trial: {trial} with runtime: {end - start}")
trial = "joblib backend chooses"
start = timer()
with parallel_backend("loky", n_jobs=n_jobs):
regr = MultiOutputRegressor(
KNeighborsRegressor(**knn_model_param_dict),
)
regr.fit(X, y)
regr.predict(X)
end = timer()
print(f"trial: {trial} with runtime: {end - start}")

Source: https://stackoverflow.com/questions/69019181
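The snippet imports RegressorChain but omits that trial. A minimal sketch of what the RegressorChain comparison could look like (my reconstruction, not the original author's code), with the dimensionality kept small since chained KNN grows very slow as targets increase:

```python
# Sketch: timing RegressorChain with KNN on small synthetic data.
# RegressorChain fits one estimator per target, feeding earlier
# predictions in as extra features, which is why it scales poorly.
from timeit import default_timer as timer

import numpy as np
from sklearn.multioutput import RegressorChain
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = rng.normal(size=(300, 8))  # keep target count small on purpose

start = timer()
chain = RegressorChain(KNeighborsRegressor())  # default order: 0..n_targets-1
chain.fit(X, y)
pred = chain.predict(X)
end = timer()
print(f"RegressorChain runtime: {end - start:.3f}s, prediction shape: {pred.shape}")
```

To reproduce the 500x gap reported above you would raise the number of targets toward 160 and time the equivalent MultiOutputRegressor fit alongside it.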