
ValueError with CatBoostRegressor and StratifiedKFold

Stack Overflow user
Asked on 2019-10-24 19:13:03
2 answers · 970 views · 0 followers · Score 1

I have just started learning CatBoost and tried to use CatBoostRegressor together with StratifiedKFold, but I ran into an error:

This is the edited post, with the full code block and the error included for clarification. I also tried for i, (train_index, test_index) in enumerate(fold.split(X, y)): instead of the plain loop; that did not work either.

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.metrics import mean_squared_log_error
from sklearn.preprocessing import LabelEncoder
from catboost import Pool, CatBoostRegressor
# X, y, test_df, cate (list of categorical features) and rmsle() are defined earlier in the notebook
fold=StratifiedKFold(n_splits=5,shuffle=True,random_state=42)

err = []
y_pred = []
for train_index, test_index in fold.split(X,y):
#for i, (train_index, test_index) in enumerate(fold.split(X,y)):
    X_train, X_val = X.iloc[train_index], X.iloc[test_index]
    y_train, y_val = y[train_index], y[test_index]
    _train = Pool(X_train, label = y_train)
    _valid = Pool(X_val, label = y_val)

    cb = CatBoostRegressor(n_estimators = 20000, 
                     reg_lambda = 1.0,
                     eval_metric = 'RMSE',
                     random_seed = 42,
                     learning_rate = 0.01,
                     od_type = "Iter",
                     early_stopping_rounds = 2000,
                     depth = 7,
                     cat_features = cate,
                     bagging_temperature = 1.0)
    cb.fit(_train, cat_features=cate, eval_set=_valid, early_stopping_rounds=2000,
           use_best_model=True, verbose_eval=100)

    p = cb.predict(X_val)
    print("err: ",rmsle(y_val,p))
    err.append(rmsle(y_val,p))
    pred = cb.predict(test_df)
    y_pred.append(pred)
predictions = np.mean(y_pred,0)
ValueError                                Traceback (most recent call last)
<ipython-input-21-3a0df0c7b8d6> in <module>()
      7 err = []
      8 y_pred = []
----> 9 for train_index, test_index in fold.split(X,y):
     10 #for i, (train_index, test_index) in enumerate(fold.split(X,y)):
     11     X_train, X_val = X.iloc[train_index], X.iloc[test_index]

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-    packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
    333                 .format(self.n_splits, n_samples))
    334 
--> 335         for train, test in super().split(X, y, groups):
    336             yield train, test
    337 

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-   packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
     87         X, y, groups = indexable(X, y, groups)
     88         indices = np.arange(_num_samples(X))
---> 89         for test_index in self._iter_test_masks(X, y, groups):
     90             train_index = indices[np.logical_not(test_index)]
     91             test_index = indices[test_index]

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sklearn/model_selection/_split.py in _iter_test_masks(self, X, y, groups)
    684 
    685     def _iter_test_masks(self, X, y=None, groups=None):
--> 686         test_folds = self._make_test_folds(X, y)
    687         for i in range(self.n_splits):
    688             yield test_folds == i

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sklearn/model_selection/_split.py in _make_test_folds(self, X, y)
    639             raise ValueError(
    640                 'Supported target types are: {}. Got {!r} instead.'.format(
--> 641                     allowed_target_types, type_of_target_y))
    642 
    643         y = column_or_1d(y)

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

2 Answers

Stack Overflow user

Accepted answer

Answered on 2019-10-24 23:02:55

The reason you get this comes from very basic ML theory: stratification is defined only for classification, to ensure an equal representation of all classes across the splits; it is meaningless for regression. Reading the error message carefully, you should be able to convince yourself that it says exactly this: 'continuous' targets (i.e. regression) are not supported, only 'binary' and 'multiclass' ones (i.e. classification). This is not some quirk of scikit-learn but a fundamental issue.
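scikit-learn decides what kind of target it is looking at via its type_of_target utility, and StratifiedKFold raises the ValueError whenever that check reports anything other than 'binary' or 'multiclass'. A quick check (a minimal sketch, not taken from the original post) makes the error's origin concrete:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_clf = np.array([0, 1, 0, 1])           # discrete labels -> classification
y_reg = np.array([0.1, 0.5, -1.1, 1.2])  # float targets   -> regression

print(type_of_target(y_clf))  # 'binary'
print(type_of_target(y_reg))  # 'continuous'
```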

The documentation also includes a relevant hint (emphasis added):

Stratified K-Folds cross-validator. Provides train/test indices to split data in train/test sets. This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.

Here is a short demonstration, adapting the example from the documentation but changing the target y to continuous (regression) instead of discrete (classification):

import numpy as np
from sklearn.model_selection import StratifiedKFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0.1, 0.5, -1.1, 1.2]) # continuous targets, i.e. regression problem
skf = StratifiedKFold(n_splits=2)

for train_index, test_index in skf.split(X,y):
    print("something")
[...]
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

So, simply put, you cannot actually use StratifiedKFold in your (regression) setting; change it to a plain KFold and continue from there.
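With StratifiedKFold swapped for a plain KFold, the same toy data splits without error, because KFold partitions on row indices only and never inspects y; a minimal sketch:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0.1, 0.5, -1.1, 1.2])  # continuous targets, i.e. regression problem

kf = KFold(n_splits=2, shuffle=True, random_state=42)
for train_index, test_index in kf.split(X, y):
    # y plays no role in how KFold forms the folds
    print(train_index, test_index)
```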

Votes: 1

Stack Overflow user

Answered on 2022-08-18 07:12:22

This can be done now; search for StratifiedKFoldReg.

Have a look at this: ?usp=sharing
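StratifiedKFoldReg is not part of scikit-learn itself; community implementations of the idea typically bin the continuous target (for example into quantiles) and then stratify on the bin labels. A minimal sketch of that approach, with the data and the bin count chosen arbitrarily for illustration:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(42)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)  # continuous target

# Bin y into quantile-based classes so every bin has roughly equal membership
n_bins = 5
edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
y_binned = np.digitize(y, edges[1:-1])  # labels 0 .. n_bins-1

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_index, test_index in skf.split(X, y_binned):
    pass  # fit the regressor on X[train_index], y[train_index] as usual
```

Each fold then preserves the distribution of y across its quantile bins, which is as close to "stratified regression" as StratifiedKFold can get.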

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/58547797