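This is my first NLP project. I am trying to use SMOTE with a classifier that has 14 classes. Before using SMOTE, I need to convert the classes to arrays. I tried using MultiLabelBinarizer, but it doesn't seem to work. Judging from the stack trace, everything appears to have been converted. Do I need to convert something else to an array? If so, how do I do that?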
```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from imblearn.over_sampling import SMOTE

# df, X_train, y_train and tv are assumed to be defined earlier (not shown here).
nb = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])
nb.fit(X_train, y_train)

# Binarize the comma-separated technique labels and show the result.
mlb = MultiLabelBinarizer()
print(mlb.fit_transform(df["Technique"].str.split(",")))
print(mlb.classes_)

smote = SMOTE('minority')
x_sm, y_sm = smote.fit_sample(X_train, y_train)  # this call raises the ValueError
#print(x_sm.shape, y_sm.shape)
pd.DataFrame(x_sm.todense(), columns=tv.get_feature_names())
```

I get the error `ValueError: could not convert string to float: 'left left center'`.
Here is the stack trace:

```
[[1 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
['Appeal_to_Authority' 'Appeal_to_fear-prejudice' 'Bandwagon'
 'Black-and-White_Fallacy' 'Causal_Oversimplification' 'Doubt'
 'Exaggeration' 'Flag-Waving' 'Labeling' 'Loaded_Language' 'Minimisation'
 'Name_Calling' 'Red_Herring' 'Reductio_ad_hitlerum' 'Repetition'
 'Slogans' 'Straw_Men' 'Thought-terminating_Cliches' 'Whataboutism']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-68681190410f> in <module>()
     10 smote = SMOTE('minority')
     11
---> 12 x_sm, y_sm = smote.fit_sample(X_train, y_train)
     13 #print(x_sm.shape, y_sm.shape)
     14 pd.DataFrame(x_sm.todense(), columns=tv.get_feature_names())

8 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86
     87

ValueError: could not convert string to float: 'left left center'
```

Posted on 2020-03-29 19:28:09
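You can't fit X_train to y_train without encoding them first. Try something like this on the features:

```python
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X)
```

and use label encoding for the labels.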
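As a minimal sketch of the suggestion above, the snippet below one-hot encodes string features and label-encodes string targets so that both become numeric. The toy `X_train` and `y_train` arrays are hypothetical stand-ins for the asker's data (which is not shown in the question), and `LabelEncoder` is my assumption for what "label encoding" refers to here.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data standing in for the asker's X_train / y_train.
X_train = np.array([
    ["left"], ["left center"], ["right"], ["left"], ["center"], ["right"],
])
y_train = np.array([
    "Doubt", "Slogans", "Doubt", "Loaded_Language", "Doubt", "Slogans",
])

# One-hot encode the string features. With handle_unknown="ignore",
# categories not seen during fit are encoded as all zeros at transform
# time instead of raising an error.
enc = OneHotEncoder(handle_unknown="ignore")
X_enc = enc.fit_transform(X_train)  # sparse matrix with a float dtype

# Map each string label to an integer class id.
le = LabelEncoder()
y_enc = le.fit_transform(y_train)

print(X_enc.shape)
print(y_enc)
print(le.classes_)
```

Both `X_enc` and `y_enc` are now numeric, which is what a resampler's `fit_resample` expects; `le.inverse_transform` maps predicted class ids back to the original label strings.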
https://datascience.stackexchange.com/questions/71385