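I have a pandas DataFrame that I need to sort by a column of text strings. I tried three approaches. The first two are similar. The last one does sort, but it also produces a mysterious column.
Here is a small test dataset: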
raw_corpus #test data
unique_ID count trigger_channel_cat
0 11530 1 Photo and Video
1 17176 1 Environment Control and Monitoring
2 6984 1 Security and Monitoring Systems
3 15696 1 Photo and Video
4 16103 3 Finance and Payments
5 18534 5 News and Information
6 11677 331 Social Networks
7 702 1 Contacts
8 7251 1 Business Tools
9 10609 1 Photo and Video
10 1703 2 Blogging
11 20567 1 Social Networks
12 8357 1 Social Networks
13 4313 1 Fitness and Wearables
14 8552 1 Contacts
15 7634 1 News and Information
16 13698 1 Social Networks
17 13940 4 Business Tools
18 19784 3 Location
19 3561 1 Task Management and To-Dos
Using value_counts does not work:
raw_corpus_sorted=raw_corpus['trigger_channel_cat'].value_counts().index.tolist()
raw_corpus_sorted
['Social Networks',
'Photo and Video',
'Business Tools',
'Contacts',
'News and Information',
'Fitness and Wearables',
'Location',
'Security and Monitoring Systems',
'Task Management and To-Dos',
'Environment Control and Monitoring',
'Blogging',
'Finance and Payments']
Trying again with a different value_counts call gives the correct number of instances for each category, but does not sort the categories:
raw_corpus_sorted=raw_corpus['trigger_channel_cat'].value_counts(sort=True)
raw_corpus_sorted
Social Networks 4
Photo and Video 3
Business Tools 2
Contacts 2
News and Information 2
Fitness and Wearables 1
Location 1
Security and Monitoring Systems 1
Task Management and To-Dos 1
Environment Control and Monitoring 1
Blogging 1
Finance and Payments 1
Name: trigger_channel_cat, dtype: int64
Using sort_values() does sort! But what is the first column of ints?
#this one works - but what is that first column?
raw_corpus_sorted=raw_corpus['trigger_channel_cat'].sort_values()
raw_corpus_sorted
10 Blogging
17 Business Tools
8 Business Tools
14 Contacts
7 Contacts
1 Environment Control and Monitoring
4 Finance and Payments
13 Fitness and Wearables
18 Location
15 News and Information
5 News and Information
0 Photo and Video
9 Photo and Video
3 Photo and Video
2 Security and Monitoring Systems
11 Social Networks
6 Social Networks
16 Social Networks
12 Social Networks
19 Task Management and To-Dos
Name: trigger_channel_cat, dtype: object
Posted on 2018-03-26 08:24:35
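On the column of integers in the sort_values() output: that is the Series index, meaning the original row labels carried along with each value, not an extra data column. A minimal sketch, assuming a small hand-built frame (the four rows are copied from the test data; the names sorted_cat, original_labels and renumbered are illustrative):

```python
import pandas as pd

# Four rows copied from the test data; labels 0..3 are the default index.
raw_corpus = pd.DataFrame({
    "unique_ID": [11530, 17176, 6984, 15696],
    "count": [1, 1, 1, 1],
    "trigger_channel_cat": [
        "Photo and Video",
        "Environment Control and Monitoring",
        "Security and Monitoring Systems",
        "Photo and Video",
    ],
})

sorted_cat = raw_corpus["trigger_channel_cat"].sort_values()

# The left-hand integers are the original row labels, reordered by the sort.
original_labels = sorted_cat.index.tolist()

# reset_index(drop=True) discards those labels and renumbers from 0.
renumbered = sorted_cat.reset_index(drop=True)

print(sorted_cat)
print(renumbered)
```

Keeping the index is often useful, since it lets you trace each sorted value back to its row in raw_corpus.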
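When you call sort_values on the DataFrame, add the parentheses and pass the name of the column to sort by:
raw_corpus_sorted=raw_corpus.sort_values('trigger_channel_cat')
With the data you added (loaded here as df):
df.sort_values('trigger_channel_cat')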
Out[1086]:
unique_ID count trigger_channel_cat
10 1703 2 Blogging
17 13940 4 Business Tools
8 7251 1 Business Tools
14 8552 1 Contacts
1 17176 1 Environment Control and
4 16103 3 Finance and Payments
13 4313 1 Fitness and Wearables
18 19784 3 Location
15 7634 1 News and Information
5 18534 5 News and Information
0 11530 1 Photo and Video
9 10609 1 Photo and Video
3 15696 1 Photo and Video
2 6984 1 Security and Monitoring
12 8357 1 Social Networks
6 11677 331 Social Networks
16 13698 1 Social Networks
11 20567 1 Social Networks
19 3561 1 Task Management and To-
7 702 1 acts
For value_counts, you can chain sort_index to order the result by category name:
df['trigger_channel_cat'].value_counts(sort=True).sort_index()
Out[1088]:
Blogging 1
Business Tools 2
Contacts 1
Environment Control and 1
Finance and Payments 1
Fitness and Wearables 1
Location 1
News and Information 2
Photo and Video 3
Security and Monitoring 1
Social Networks 4
Task Management and To- 1
acts 1
Name: trigger_channel_cat, dtype: int64
https://stackoverflow.com/questions/49482294
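The two calls from this answer can be tried end to end with a minimal sketch, assuming the question's table is rebuilt by hand (only six of its rows are used, and the names by_category and counts_by_name are illustrative):

```python
import pandas as pd

# Six rows copied from the question's test data, with their original row labels.
df = pd.DataFrame(
    {
        "unique_ID": [11530, 17176, 11677, 702, 1703, 20567],
        "count": [1, 1, 331, 1, 2, 1],
        "trigger_channel_cat": [
            "Photo and Video",
            "Environment Control and Monitoring",
            "Social Networks",
            "Contacts",
            "Blogging",
            "Social Networks",
        ],
    },
    index=[0, 1, 6, 7, 10, 11],
)

# Sort whole rows alphabetically by the category column.
by_category = df.sort_values("trigger_channel_cat")

# Count rows per category, then order the counts by category name
# instead of by frequency.
counts_by_name = df["trigger_channel_cat"].value_counts().sort_index()

print(by_category)
print(counts_by_name)
```

Both calls return new objects and leave df unchanged, so assign the result if you want to keep it.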