
How to put categorical data into bins

Stack Overflow user
Asked 2020-03-15 09:04:42
2 answers, 66 views, 0 followers, 1 vote

I have the following categorical data:

['Self employed', 'Government Dependent',
 'Formally employed Private', 'Informally employed',
 'Formally employed Government', 'Farming and Fishing',
 'Remittance Dependent', 'Other Income',
 'Dont Know/Refuse to answer', 'No Income']

How can I put them into bins like this:

 ['Government Dependent','Formally employed Government','Formally employed Private'] = 0

 ['Remittance Dependent', 'Informally employed','Self employed','Other Income'] = 1
 ['Dont Know/Refuse to answer', 'No Income','Farming and Fishing'] = 2

I already know how to put numeric data into categorical bins. Can it be done the other way around?

TRAIN = pd.read_csv("Train_v2.csv")
TRAIN['job_type'].unique()
output:
array(['Self employed', 'Government Dependent',
       'Formally employed Private', 'Informally employed',
       'Formally employed Government', 'Farming and Fishing',
       'Remittance Dependent', 'Other Income',
       'Dont Know/Refuse to answer', 'No Income'], dtype=object)

2 Answers

Stack Overflow user

Accepted answer

Posted 2020-03-15 09:11:07

First create a dictionary of bins, then swap its keys and values, and finally use Series.map:

import pandas as pd

a = ['Self employed', 'Government Dependent',
       'Formally employed Private', 'Informally employed',
       'Formally employed Government', 'Farming and Fishing',
       'Remittance Dependent', 'Other Income',
       'Dont Know/Refuse to answer', 'No Income']

TRAIN = pd.DataFrame({'job_type':a})

# add the remaining groups to the dict as needed;
# the strings must match the data exactly ('Dont', not "Don't")
d = {0: ['Government Dependent','Formally employed Government','Formally employed Private'],
     1: ['Remittance Dependent', 'Informally employed'],
     2: ['Dont Know/Refuse to answer', 'No Income']}

# swap keys and values in the dict: {category: bin}
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
TRAIN['new'] = TRAIN['job_type'].map(d1)
print (TRAIN)
                       job_type  new
0                 Self employed  NaN
1          Government Dependent  0.0
2     Formally employed Private  0.0
3           Informally employed  1.0
4  Formally employed Government  0.0
5           Farming and Fishing  NaN
6          Remittance Dependent  1.0
7                  Other Income  NaN
8    Dont Know/Refuse to answer  2.0
9                     No Income  2.0
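
The dictionary approach above leaves NaN for any category that is not listed. As a minimal sketch, assuming the three groups exactly as given in the question and the spelling 'Dont Know/Refuse to answer' from the `unique()` output, the full mapping might look like this. The names `bins`, `category_to_bin`, `job_bin` and `unmapped` are illustrative, and the small DataFrame is only a stand-in for the real `Train_v2.csv` data.

```python
import pandas as pd

# Groups as listed in the question; keys are the target bin labels.
bins = {
    0: ['Government Dependent', 'Formally employed Government',
        'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed',
        'Self employed', 'Other Income'],
    2: ['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'],
}

# Invert to {category: bin} so Series.map can look up each value directly.
category_to_bin = {cat: label for label, cats in bins.items() for cat in cats}

# Stand-in for the real data (assumption): one row per category.
TRAIN = pd.DataFrame({'job_type': list(category_to_bin)})

TRAIN['job_bin'] = TRAIN['job_type'].map(category_to_bin)

# Categories missing from the dict would show up as NaN; collect them
# before casting, because NaN cannot be stored in an integer column.
unmapped = TRAIN.loc[TRAIN['job_bin'].isna(), 'job_type'].unique()
if len(unmapped) == 0:
    TRAIN['job_bin'] = TRAIN['job_bin'].astype(int)
```

Checking `unmapped` first is a cheap way to catch spelling mismatches such as "Don't" versus "Dont" before they silently turn into NaN.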

If there are only a few output values (such as 0, 1, 2 and NaN), numpy.select also works, but with many groups it becomes complicated and slow:

import numpy as np

# one boolean mask per group; rows matching no mask get the default (NaN)
m1 = TRAIN['job_type'].isin(['Government Dependent','Formally employed Government','Formally employed Private'])
m2 = TRAIN['job_type'].isin(['Remittance Dependent', 'Informally employed'])
m3 = TRAIN['job_type'].isin(['Dont Know/Refuse to answer', 'No Income'])
TRAIN['new'] = np.select([m1, m2, m3], [0, 1, 2], np.nan)
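
When there are more than a handful of groups, the masks for numpy.select can be built from the same kind of dictionary instead of being written out one by one. The following is only a sketch under assumptions: the `groups` dict, the three-row DataFrame, the label 'Unknown label' and the fallback value -1 are all hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical bin definitions, following the groups in the question.
groups = {
    0: ['Government Dependent', 'Formally employed Government',
        'Formally employed Private'],
    1: ['Remittance Dependent', 'Informally employed',
        'Self employed', 'Other Income'],
    2: ['Dont Know/Refuse to answer', 'No Income', 'Farming and Fishing'],
}

# Small stand-in frame; 'Unknown label' is a made-up category that
# belongs to no group.
TRAIN = pd.DataFrame({'job_type': ['No Income', 'Self employed', 'Unknown label']})

# One boolean mask per group, in the same order as the bin labels.
conditions = [TRAIN['job_type'].isin(cats) for cats in groups.values()]
choices = list(groups.keys())

# Rows matching no mask receive the default; -1 keeps the column integer.
TRAIN['new'] = np.select(conditions, choices, default=-1)
```

Using an integer sentinel such as -1 rather than np.nan is a design choice: it keeps the column as integers, at the cost of having to remember that -1 means "no bin".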
2 votes

Stack Overflow user

Posted 2020-03-15 09:26:56

You can use np.where and set the value to np.nan when a row does not belong to category 0, 1 or 2.

import numpy as np

list_0 = ['Government Dependent','Formally employed Government','Formally employed Private']
list_1 = ['Remittance Dependent', 'Informally employed']
list_2 = ['Dont Know/Refuse to answer', 'No Income']
# nest the calls so that a later check does not overwrite an earlier result
TRAIN['job_type_bin'] = np.where(TRAIN['job_type'].isin(list_0), 0,
                        np.where(TRAIN['job_type'].isin(list_1), 1,
                        np.where(TRAIN['job_type'].isin(list_2), 2, np.nan)))
1 vote
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/60691261
