I have imbalanced data: 80% of the rows are M and 20% are F. Here is a sample of the data:
NAME COUNTRY HEIGHT HANDPHONE TYPE GENDER
NOVI USA 160 samsung SM-G610F F
JOHN JAPAN 181 vivo 1718 M
RICHARD UK 175 samsung SM-G532G M
ANTHONY UK 179 OPPO F1fw M
SAMUEL UK 185 Iphone 8 plus M
BUNGA KOREA 170 Iphone 6s F
So I want to use SMOTENC to balance the data to a 50%:50% M:F ratio. I tried this script:
import numpy as np
import pandas as pd
import scipy.stats as stats
import sklearn
import keras
import imblearn
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
df=pd.read_excel('Data for oversampling.xlsx')
Data = df
Data.GENDER.replace({'M':0,'F':1},inplace=True)
sns.countplot(x='GENDER', data=Data)
y = Data.GENDER
x = Data.drop('GENDER', axis=1)
from imblearn.over_sampling import SMOTENC
smote_nc = SMOTENC(categorical_features=[0,3], random_state=0)
x_resampled, y_resampled = smote_nc.fit_resample(x, y)
But I get this error:
could not convert string to float
Can anyone help?
Answered 2020-03-03 18:30:06
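In your dataset, every feature is categorical except feature 2 (HEIGHT), which is the only non-categorical one. You need to update your categorical_features list so that it names all the categorical columns. With the column order shown in the question (after GENDER is dropped), that would be categorical_features=[0, 1, 3, 4]. Columns 1 and 4 hold strings but are currently treated as continuous, which is why the conversion to float fails.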
https://stackoverflow.com/questions/58214565