
Dask grid search gives error KeyError: KeyError

Stack Overflow user
Asked on 2019-09-11 19:13:52
1 answer · 342 views · 0 following · 0 votes

I am trying to do hyperparameter tuning with Dask, but I am getting a KeyError. Can anyone help me? I could not find any solution for this in the Dask docs or related resources. The motivation for trying Dask is that training with pandas and scikit-learn was taking too long, and the kernel would die most of the time.

Here is the code snippet:

import dask
import dask.dataframe as dd                          # needed for dd.from_pandas
from dask.distributed import Client, progress        # task distribution

X_train_dask = dd.from_pandas(X_train, npartitions=10)
X_test_dask = dd.from_pandas(X_test, npartitions=10)
y_train_dask = dd.from_pandas(y_train, npartitions=10)

client = Client()
param_grid = {
    'n_estimators' : [500 , 750 , 1000],
    'max_depth': [15, 25 , 35  , -1],
    'colsample_bytree' : [ 0.7 , 0.9],
    'gamma' : [0.2 , 0.3 , 0.5],
    'subsample' : [ 0.7 , 0.8 , 0.9] ,
    'alpha' : [3, 4, 5],
    'learning_rate' : [0.05, 0.01, 0.008]
}
import dask_xgboost as dxgb
from dask_ml.model_selection import GridSearchCV as GSV
from sklearn.model_selection import GroupKFold
clf = dxgb.XGBClassifier(random_state= 100,missing = -999) 
skf = GroupKFold(n_splits=3)
grid_search = GSV(clf, param_grid, scoring='roc_auc', refit=True ,
                           cv=skf, return_train_score=True)
grid_search.fit(X_train_dask.to_dask_array(), y_train_dask.to_dask_array()) 
print(grid_search.best_params_)
print(grid_search.best_score_ )

Here is the traceback:

distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Restarting worker

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-38-3594cb1cb545> in <module>
      6 grid_search = GSV(clf, param_grid, scoring='roc_auc', refit=True ,
      7                            cv=skf, return_train_score=True)
----> 8 grid_search.fit(X_train_dask.to_dask_array(), y_train_dask.to_dask_array())
      9 print(grid_search.best_params_)
     10 print(grid_search.best_score_ )

/opt/conda/lib/python3.6/site-packages/dask_ml/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
   1255                     else:
   1256                         logger.warning("{} has failed... retrying".format(future.key))
-> 1257                         future.retry()
   1258                         ac.add(future)
   1259 

/opt/conda/lib/python3.6/site-packages/distributed/client.py in retry(self, **kwargs)
    308         Client.retry
    309         """
--> 310         return self.client.retry([self], **kwargs)
    311 
    312     def cancelled(self):

/opt/conda/lib/python3.6/site-packages/distributed/client.py in retry(self, futures, asynchronous)
   2058         futures: list of Futures
   2059         """
-> 2060         return self.sync(self._retry, futures, asynchronous=asynchronous)
   2061 
   2062     @gen.coroutine

/opt/conda/lib/python3.6/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    751         else:
    752             return sync(
--> 753                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    754             )
    755 

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    326             e.wait(10)
    327     if error[0]:
--> 328         six.reraise(*error[0])
    329     else:
    330         return result[0]

/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in f()
    311             if callback_timeout is not None:
    312                 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 313             result[0] = yield future
    314         except Exception as exc:
    315             error[0] = sys.exc_info()

/opt/conda/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1097 
   1098                     try:
-> 1099                         value = future.result()
   1100                     except Exception:
   1101                         self.had_exception = True

/opt/conda/lib/python3.6/site-packages/distributed/client.py in _retry(self, futures)
   2047         response = await self.scheduler.retry(keys=keys, client=self.id)
   2048         for key in response:
-> 2049             st = self.futures[key]
   2050             st.retry()
   2051 

KeyError: 'finalize-c53697b4-1572-4d19-af41-c76b531699b9'

1 Answer

Stack Overflow user

Posted on 2019-09-11 22:33:01

It looks like your cluster is running out of memory:

distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting

Note that XGBoost needs all of the data to fit in memory in order to train efficiently.
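One way to see why the workers blow past their memory budget is to count how many model fits this grid search schedules; the scheduler keeps many candidate boosters (each needing the data in memory) in flight at once. A quick count, using the parameter grid from the question and the `n_splits=3` from `GroupKFold(n_splits=3)`:

```python
# Parameter grid copied from the question
param_grid = {
    'n_estimators': [500, 750, 1000],
    'max_depth': [15, 25, 35, -1],
    'colsample_bytree': [0.7, 0.9],
    'gamma': [0.2, 0.3, 0.5],
    'subsample': [0.7, 0.8, 0.9],
    'alpha': [3, 4, 5],
    'learning_rate': [0.05, 0.01, 0.008],
}

# Number of candidate parameter combinations is the product of the list lengths
n_candidates = 1
for values in param_grid.values():
    n_candidates *= len(values)

n_splits = 3                       # GroupKFold(n_splits=3)
n_fits = n_candidates * n_splits   # total XGBoost trainings scheduled

print(n_candidates)  # 1944
print(n_fits)        # 5832
```

With nearly six thousand fits queued, shrinking the grid, switching to `dask_ml.model_selection.RandomizedSearchCV` with a small `n_iter`, or starting the `Client` with fewer workers and an explicit per-worker `memory_limit` may keep the cluster within budget (all of these are suggestions, not part of the original answer).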

Votes: 0
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's translation engine.
Original link:

https://stackoverflow.com/questions/57895506
