I have a DataFrame of cars. Its structure is described below:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 91313 entries, 0 to 93099
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Manufacturer 91313 non-null string
1 Model 91313 non-null string
2 Year 91313 non-null Int64
3 Category 91313 non-null string
4 Mileage 91313 non-null Int64
5 FuelType 91313 non-null string
6 EngineVolume 91313 non-null float64
7 DriveWheels 91313 non-null string
8 GearBox 91313 non-null string
9 Doors 91313 non-null string
10 Wheel 91313 non-null string
11 Color 91313 non-null string
12 InteriorColor 91313 non-null string
13 LeatherInterior 91313 non-null boolean
14 Price 91313 non-null Int64
15 Clearance 91313 non-null boolean
dtypes: Int64(3), boolean(2), float64(1), string(10)
memory usage: 11.1 MB

I want to build a model that predicts car prices using CatBoostRegressor. I tried the following:
train_dataset = cb.Pool(X_train, y_train)
test_dataset = cb.Pool(X_test, y_test)
cat_features = ['Manufacturer','Model','Category','FuelType','DriveWheels','GearBox','Doors','Wheel','Color','InteriorColor','LeatherInterior','Clearance']
model = cb.CatBoostRegressor(loss_function = 'RMSE',eval_metric = 'R2',cat_features = cat_features)
grid = {'iterations': [250, 300, 400],
        'learning_rate': [0.1, 0.2],
        'depth': [2, 4, 6, 8],
        'l2_leaf_reg': [0.2, 0.5, 1, 3],
        'cat_features': cat_features
        }
model.grid_search(grid, train_dataset)

I also tried putting cat_features in the model and in the grid, but neither helped.
TypeError Traceback (most recent call last)
<ipython-input-34-2cb43214da9d> in <module>
----> 1 train_dataset = cb.Pool(X_train, y_train)
2 test_dataset = cb.Pool(X_test, y_test)
3 cat_features = ['Manufacturer','Model','Category','FuelType','DriveWheels','GearBox','Doors','Wheel','Color','InteriorColor','LeatherInterior','Clearance']
4 model = cb.CatBoostRegressor(loss_function = 'RMSE',eval_metric = 'R2',cat_features = cat_features)
5 grid = {'iterations': [250, 300, 400],
~\anaconda3\lib\site-packages\catboost\core.py in __init__(self, data, label, cat_features, text_features, embedding_features, column_description, pairs, delimiter, has_header, ignore_csv_quoting, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
586 )
587
--> 588 self._init(data, label, cat_features, text_features, embedding_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
589 super(Pool, self).__init__()
590
~\anaconda3\lib\site-packages\catboost\core.py in _init(self, data, label, cat_features, text_features, embedding_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
1100 baseline = np.reshape(baseline, (samples_count, -1))
1101 self._check_baseline_shape(baseline, samples_count)
-> 1102 self._init_pool(data, label, cat_features, text_features, embedding_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
1103
1104
_catboost.pyx in _catboost._PoolBase._init_pool()
_catboost.pyx in _catboost._PoolBase._init_pool()
_catboost.pyx in _catboost._PoolBase._init_features_order_layout_pool()
_catboost.pyx in _catboost._set_features_order_data_pd_data_frame()
TypeError: Cannot convert StringArray to numpy.ndarray

How can I handle this error?
Posted on 2021-05-05 02:34:55
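If you use feature names in cat_features, you must also supply them in the feature_names parameter. Otherwise, it is enough to provide the indices of the categorical features in cat_features.
In your case, that would be: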
cat_features = [0, 1, 3, 5, 7, 8, 9, 10, 11, 12, 13, 15]

https://stackoverflow.com/questions/67352110
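The index list in the answer follows the column order of the full 16-column frame. Rather than counting positions by hand, they can be derived from the column names. The sketch below uses only the column list from the df.info() output in the question; the helper name categorical_indices is hypothetical, and the idea that X_train is the frame without the Price column is an assumption, since the question does not show how X_train was built. If that assumption holds, Clearance sits at position 14 rather than 15.

```python
# Column order as reported by df.info() in the question.
columns = [
    "Manufacturer", "Model", "Year", "Category", "Mileage", "FuelType",
    "EngineVolume", "DriveWheels", "GearBox", "Doors", "Wheel", "Color",
    "InteriorColor", "LeatherInterior", "Price", "Clearance",
]

# Categorical feature names, copied from the question.
cat_feature_names = [
    "Manufacturer", "Model", "Category", "FuelType", "DriveWheels", "GearBox",
    "Doors", "Wheel", "Color", "InteriorColor", "LeatherInterior", "Clearance",
]


def categorical_indices(feature_columns, cat_names):
    """Return the positions of cat_names within feature_columns (hypothetical helper)."""
    wanted = set(cat_names)
    missing = wanted - set(feature_columns)
    if missing:
        raise ValueError(f"unknown categorical columns: {sorted(missing)}")
    return [i for i, name in enumerate(feature_columns) if name in wanted]


# Positions in the full 16-column frame; this reproduces the list in the answer.
full_frame_idx = categorical_indices(columns, cat_feature_names)

# Assumption: X_train is the frame with the target column "Price" dropped,
# so every column after "Price" shifts down by one position.
feature_columns = [c for c in columns if c != "Price"]
feature_idx = categorical_indices(feature_columns, cat_feature_names)

print(full_frame_idx)  # [0, 1, 3, 5, 7, 8, 9, 10, 11, 12, 13, 15]
print(feature_idx)     # [0, 1, 3, 5, 7, 8, 9, 10, 11, 12, 13, 14]
```

With a real DataFrame, list(X_train.columns) would take the place of feature_columns, and the resulting list could be passed as the cat_features argument when constructing cb.Pool. Whether that alone resolves the StringArray error is not confirmed by the answer.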