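After recently starting to use ClearML to manage MLOps, I'm facing the following problem: when I run a script from my own computer that trains CatBoost on a binary classification problem using different class weights, it works perfectly and logs the results with no issues at all. Once I try to run it remotely using the ClearML agent, it results in the following error: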
<!-- language: lang-none -->

    Traceback (most recent call last):
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/catboost_bind.py", line 102, in _fit
        return original_fn(obj, *args, **kwargs)
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 5007, in fit
        self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2262, in _fit
        train_params = self._prepare_train_params(
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2194, in _prepare_train_params
        _check_train_params(params)
      File "_catboost.pyx", line 6032, in _catboost._check_train_params
      File "_catboost.pyx", line 6051, in _catboost._check_train_params
    _catboost.CatBoostError: catboost/private/libs/options/catboost_options.cpp:607: if loss-function is Logloss, then class weights should be given for 0 and 1 classes

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/root/.clearml/venvs-builds/3.9/task_repository/RecSys.git/src/cli/model_training_remote.py", line 313, in <module>
        rfs.run(
      File "/root/.clearml/venvs-builds/3.9/task_repository/RecSys.git/src/cli/model_training_remote.py", line 232, in run
        model.fit(
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/__init__.py", line 36, in _inner_patch
        raise ex
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/__init__.py", line 34, in _inner_patch
        ret = patched_fn(original_fn, *args, **kwargs)
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/clearml/binding/frameworks/catboost_bind.py", line 110, in _fit
        return original_fn(obj, *args, **kwargs)
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 5007, in fit
        self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2262, in _fit
        train_params = self._prepare_train_params(
      File "/root/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/catboost/core.py", line 2194, in _prepare_train_params
        _check_train_params(params)
      File "_catboost.pyx", line 6032, in _catboost._check_train_params
      File "_catboost.pyx", line 6051, in _catboost._check_train_params
    _catboost.CatBoostError: catboost/private/libs/options/catboost_options.cpp:607: if loss-function is Logloss, then class weights should be given for 0 and 1 classes

I did connect the dictionary:

    model_params = {
        "loss_function": "Logloss",
        "eval_metric": "AUC",
        "class_weights": {0: 1, 1: 60},
        "learning_rate": 0.1
    }

It is registered in the ClearML task with:

    task.connect(model_params, 'model_params')

and passed to the model as parameters in the following call:

    model = CatBoostClassifier(**model_params)

When I run it from within a container in ClearML interactive mode, it also works fine.

Posted on 2022-08-08 20:52:18

Disclaimer: I'm a ClearML team member.

I think I understand the problem. Basically, I think the issue is this call:
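
    task.connect(model_params, 'model_params')

Since this is a nested dict:

    model_params = {
        "loss_function": "Logloss",
        "eval_metric": "AUC",
        "class_weights": {0: 1, 1: 60},
        "learning_rate": 0.1
    }

the class_weights entries are stored with string keys, but CatBoost expects int keys, hence the failure. That would presumably also explain why the local run works: there the dict is used exactly as written, whereas under the agent the connected values are read back from the stored task. One option is to remove the task.connect(model_params, 'model_params') call.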
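
The key mismatch can be reproduced without ClearML or CatBoost. The sketch below is only an illustration: it uses a JSON round trip as a stand-in for storing and reloading the parameters, which is an assumption and not necessarily what ClearML does internally. The names restored and fixed_weights are made up for the example.

```python
import json

model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {0: 1, 1: 60},
    "learning_rate": 0.1,
}

# Stand-in for saving and reloading the parameters (assumption: any
# string-keyed storage format behaves like this, JSON is just one example).
restored = json.loads(json.dumps(model_params))

print(restored["class_weights"])  # {'0': 1, '1': 60}

# The int keys 0 and 1 are gone, which matches the CatBoost complaint that
# weights must be given for classes 0 and 1.
has_int_keys = 0 in restored["class_weights"]
print(has_int_keys)  # False

# Converting the keys back to int restores the mapping CatBoost expects.
fixed_weights = {int(k): v for k, v in restored["class_weights"].items()}
print(fixed_weights)  # {0: 1, 1: 60}
```

The dict comprehension at the end works for any number of classes, as long as every key can be parsed as an integer.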
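
Another workaround (until we fix it) would be to restore the int keys right after connecting:

    task.connect(model_params, 'model_params')
    # Accept either string or int keys, whichever form task.connect left in the dict
    model_params["class_weights"] = {
        0: model_params["class_weights"].get("0", model_params["class_weights"].get(0)),
        1: model_params["class_weights"].get("1", model_params["class_weights"].get(1)),
    }

https://stackoverflow.com/questions/73279794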