
What types of datasets can CatBoostRegressor accept?

Stack Overflow user
Asked on 2021-06-24 18:26:44
1 answer, 30 views, 0 followers, 1 vote

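I have been working on a code project that interests me. My dataset comes from the forex market, with 10 features and more than 70,000 rows, and it has already been split into a training set and a test set, but my CatBoostRegressor keeps raising this error. What do I need to do to my dataset to make the regressor work? Or is it something else?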

from catboost import CatBoostRegressor, Pool
train_data = Pool(zar_train, label=['bidclose', 'askclose'])

test_data = Pool(zar_test, label=['bidclose', 'askclose'])
eval_data = zar_val
eval_dataset = Pool(eval_data, label=['bidclose', 'askclose'])

model = CatBoostRegressor(learning_rate=0.03,
                           custom_metric=['Logloss',
                                          'AUC:hints=skip_train~false'], score_function='Accuracy')

model.fit(train_data, test_data)

print(model.get_best_score())

Error:

---------------------------------------------------------------------------
CatBoostError                             Traceback (most recent call last)
<ipython-input-8-e0aa9e711bf9> in <module>
      1 from catboost import CatBoostRegressor, Pool
----> 2 train_data = Pool(zar_train, label=['bidclose', 'askclose'])
      3 test_data = Pool(zar_test, label=['bidclose', 'askclose'])
      4 eval_data = zar_val
      5 eval_dataset = Pool(eval_data, label=['bidclose', 'askclose'])

~\anaconda3\lib\site-packages\catboost\core.py in __init__(self, data, label, cat_features, text_features, embedding_features, column_description, pairs, delimiter, has_header, ignore_csv_quoting, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count, log_cout, log_cerr)
    615                     )
    616 
--> 617                 self._init(data, label, cat_features, text_features, embedding_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
    618         super(Pool, self).__init__()
    619 

~\anaconda3\lib\site-packages\catboost\core.py in _init(self, data, label, cat_features, text_features, embedding_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
   1085             if len(np.shape(label)) == 1:
   1086                 label = np.expand_dims(label, 1)
-> 1087             self._check_label_shape(label, samples_count)
   1088         if feature_names is not None:
   1089             self._check_feature_names(feature_names, features_count)

~\anaconda3\lib\site-packages\catboost\core.py in _check_label_shape(self, label, samples_count)
    730         """
    731         if len(label) != samples_count:
--> 732             raise CatBoostError("Length of label={} and length of data={} is different.".format(len(label), samples_count))
    733 
    734     def _check_baseline_type(self, baseline):

CatBoostError: Length of label=2 and length of data=44908 is different.

1 Answer

Stack Overflow user

Answered on 2021-06-24 18:42:47

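According to the documentation, data must be a 2D array and label must be a 1D array of the same length, that is, one label value per row of data. See the CatBoost Pool documentation.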

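Because you passed a list containing 2 values (the column names 'bidclose' and 'askclose') as label, Pool sees a label of length 2 against 44908 rows of data, which is why the error is raised.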
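To make the mismatch concrete, here is a minimal sketch in plain Python that mimics the length check visible in the traceback. The function `check_label_shape` is a simplified stand-in written for illustration, not CatBoost's actual code, and it raises `ValueError` rather than `CatBoostError`.

```python
def check_label_shape(label, samples_count):
    """Simplified stand-in for the length check shown in the traceback."""
    if len(label) != samples_count:
        raise ValueError(
            "Length of label={} and length of data={} is different.".format(
                len(label), samples_count
            )
        )


# What the question passed: two strings (column names), not the column values.
label = ["bidclose", "askclose"]
samples_count = 44908  # number of rows reported in the error message

try:
    check_label_shape(label, samples_count)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

The list of column names has length 2 no matter how many rows the dataset has, so the check can only pass if label holds one entry per row.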

Instead of the column names, pass the actual y_train and y_test values; each must have the same length as its corresponding dataset.

train_data = Pool(zar_train, label=y_train)  # len(zar_train) must equal len(y_train)

test_dataset = Pool(zar_test, label=y_test)  # len(zar_test) must equal len(y_test)

eval_dataset = Pool(eval_data, label=y_eval)  # len(eval_data) must equal len(y_eval)
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68113993
