
How do I apply SMOTENC to a DataFrame with object and numeric columns?

Stack Overflow user
Asked 2020-06-22 01:08:51
Answers: 1 · Views: 502 · Followers: 0 · Votes: 2
> In: data.dtypes

Out: Organization Name                                 object
Money Raised Currency (in USD)                   float64
Announced Date                            datetime64[ns]
Total Funding Amount Currency (in USD)           float64
Organization Description                          object
Organization Location                             object
Raised Series A                                    int64
Primary Industry                                  object
Sub_Ind                                           object
Sub_Ind2                                          object
Sub_Ind3                                          object
Sub_Ind4                                          object
Sub_Ind5                                          object
Sub_Ind6                                          object
Sub_Ind7                                          object
Investor1                                         object
Investor2                                         object
Investor3                                         object
Investor4                                         object
Investor5                                         object
Investor6                                         object
Investor7                                         object
Investor8                                         object
Investor9                                         object
Investor10                                        object
Investor11                                        object

> In: x = data.drop(columns=['Raised Series A', 'Announced Date'])

> In: y = data['Raised Series A']

> In: from imblearn.over_sampling import SMOTENC

> In: smote_nc = SMOTENC(categorical_features=[0,1,3,4,5,7,8,9,10,11,12,13,14,15,16,17,
18,19,20,21,22,23,24], random_state=0)

> In: x_resampled, y_resampled = smote_nc.fit_resample(x, y)

  ---------------------------------------------------------------------------
Out: ValueError                                Traceback (most recent call last)
 in 
----> 1 x_resampled, y_resampled = smote_nc.fit_resample(x, y)

~/opt/anaconda3/envs/unit2/lib/python3.7/site-packages/imblearn/base.py in fit_resample(self, X, y)
     81         )
     82 
---> 83         output = self._fit_resample(X, y)
     84 
     85         y_ = (label_binarize(output[1], np.unique(y))

~/opt/anaconda3/envs/unit2/lib/python3.7/site-packages/imblearn/over_sampling/_smote.py in _fit_resample(self, X, y)
    936     def _fit_resample(self, X, y):
    937         self.n_features_ = X.shape[1]
--> 938         self._validate_estimator()
    939 
    940         # compute the median of the standard deviation of the minority class

~/opt/anaconda3/envs/unit2/lib/python3.7/site-packages/imblearn/over_sampling/_smote.py in _validate_estimator(self)
    921                 raise ValueError(
    922                     "Some of the categorical indices are out of range. Indices"
--> 923                     " should be between 0 and {}".format(self.n_features_)
    924                 )
    925             self.categorical_features_ = categorical_features

ValueError: Some of the categorical indices are out of range. Indices should be between 0 and 24

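I have tried various combinations of columns in the categorical_features parameter, but none of them worked. There are also no null values in my DataFrame. I am using SMOTENC because my target vector is extremely imbalanced: 99.7% yes and 0.3% no. Please help.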


1 Answer

Stack Overflow user

Answered 2022-08-01 22:24:22

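I had the same problem. Change the way you specify categorical_features: instead of integer indices, pass a list of booleans, one per column, indicating whether that column is categorical.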
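For what it's worth, the integer list in the question seems to fail for a simple reason: after the two columns are dropped, x has 24 columns, so valid positions run from 0 to 23, and the list includes 24. The sketch below is dependency-free and only re-counts the columns from the data.dtypes output in the question. The names dtypes, asked, out_of_range, and mislabeled are introduced here for illustration and are not part of imblearn.

```python
# Column dtypes of `data`, copied from the data.dtypes output in the question.
dtypes = {
    "Organization Name": "object",
    "Money Raised Currency (in USD)": "float64",
    "Announced Date": "datetime64[ns]",
    "Total Funding Amount Currency (in USD)": "float64",
    "Organization Description": "object",
    "Organization Location": "object",
    "Raised Series A": "int64",
    "Primary Industry": "object",
}
dtypes.update({name: "object" for name in ["Sub_Ind"] + [f"Sub_Ind{i}" for i in range(2, 8)]})
dtypes.update({f"Investor{i}": "object" for i in range(1, 12)})

# Same drop as in the question: x = data.drop(columns=[...]).
dropped = {"Raised Series A", "Announced Date"}
x_dtypes = [dt for name, dt in dtypes.items() if name not in dropped]
n_features = len(x_dtypes)

# The integer list passed to categorical_features in the question.
asked = [0, 1, 3, 4, 5] + list(range(7, 25))

# Positions that do not exist in x (valid positions are 0 .. n_features - 1).
out_of_range = [i for i in asked if not 0 <= i < n_features]

# Boolean mask: True where the column of x has dtype object.
mask = [dt == "object" for dt in x_dtypes]

# Positions where the integer list disagrees with the dtype-based mask.
mislabeled = [i for i in range(n_features) if (i in asked) != mask[i]]

print(n_features, out_of_range, mislabeled)
```

By this count the list also marks position 1 (a float64 column) as categorical and skips position 6 (Sub_Ind, an object column), which is the kind of slip a dtype-based mask avoids.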

Try this:

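# Build a boolean mask with one entry per column of x:
# True for object (categorical) columns, False for numeric ones.
cat_cols = []
for col in x.columns:
    if x[col].dtype == 'object':
        cat_cols.append(True)
    else:
        cat_cols.append(False)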

Then:

smote_nc = SMOTENC(categorical_features=cat_cols, random_state=0)
Votes: 1
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62506114
