Group by within a group by using a pandas DataFrame in Python

Stack Overflow user
Asked 2021-07-09 12:35:13
Answers: 1 · Views: 37 · Followers: 0 · Votes: 0

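For a sample of data from a given .csv file, I want to group by category and get the total Amount per group, using the following rules.

Note that some categories need to be combined (rolled up). "Cloud Security Overall" should be derived from the categories Cyber Security, Cloud Computing and Cloud data Service. Likewise, "Blockchain and Cyptocurrencies" is the sum of Blockchain, Virtual Currency, Bitcoin and Etherium.

My code: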

import pandas as pd

tmp = pd.DataFrame({
    'Categories': ['Blockchain,Cloud Computing,InformationTechnology,Software',
                   'Cyber Security,Fraud Detection,Information Technology,Network Security',
                   'Information Technology,Medical,Security,Virtual Currency,Telecommunications',
                   'Mobile,Mobile Devices, Security,Bitcoin',
                   'Computer,Cyber Security,Network Security',
                   'Accounting,Hardware,Security,Software,Cloud data Service',
                   'Content,Security,Software,Etherium',
                   'Cyber Security,Enterprise Software,Security'],
    'Amount': [500, 400, 700, 900, 100, 800, 1000, 600]})
print(tmp)

# Group by the full Categories string and sum Amount for each distinct string
dfc = tmp.groupby('Categories')['Amount'].sum().reset_index()
dfc.columns = ['Categories', 'Amount']

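This code only groups by the whole Categories string and sums Amount for each one; it does not produce the combined categories.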
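
A quick check of why this falls short, as a minimal sketch on two of the rows above: grouping on the raw string treats each distinct comma-separated string as one key, so the individual categories inside it are never compared.

```python
import pandas as pd

# Two rows copied from the question's data; both contain 'Cyber Security'
tmp = pd.DataFrame({'Categories': ['Computer,Cyber Security,Network Security',
                                   'Cyber Security,Enterprise Software,Security'],
                    'Amount': [100, 600]})

dfc = tmp.groupby('Categories')['Amount'].sum().reset_index()

# The shared category is not merged: each full string is its own group
print(len(dfc))
```

So the strings have to be split into individual categories before any roll-up can happen.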


1 Answer

Stack Overflow user

Accepted answer

Posted 2021-07-09 13:05:53

Try this:

import pandas as pd

# Note: 'Information Tehnology' in the second row is misspelled,
# so it ends up as its own category after the split
df = pd.DataFrame({'Categories':['Blockchain, Cloud Computing, Information Technology, Software',
                                'Cyber Security, Fraud Detection, Information Tehnology, Network Security',
                                'Information Technology, Medical, Security, Virtual Currency, Telecommunications',
                                'Mobile, Mobile Devices, Security, Bitcoin',
                                'Computer, Cyber Security, Network Security',
                                'Accounting, Hardware, Security, Software, Cloud data Service',
                                'Content, Security, Software, Etherium',
                                'Cyber Security, Enterprise Software, Security'],
                  'Amount':[500,400,700,900,100,800,1000,600]})

# Split each string on a comma plus optional whitespace,
# then give every category its own row (Amount is repeated per category)
dfe = df.assign(Cats=df['Categories'].str.split(r',\s*')).explode('Cats')

# Map individual categories to their roll-up group
dd = {'Cyber Security': 'Cloud Security Overall',
      'Cloud Computing' : 'Cloud Security Overall',
      'Cloud data Service' : 'Cloud Security Overall',
      'Blockchain' : 'Blockchain and Cyptocurrencies',
      'Virtual Currency' : 'Blockchain and Cyptocurrencies',
      'Bitcoin' : 'Blockchain and Cyptocurrencies',
      'Etherium' : 'Blockchain and Cyptocurrencies'}

# Categories missing from dd map to NaN, which groupby drops by default
print(dfe.groupby(dfe['Cats'].map(dd))['Amount'].sum())

Output:

Cats
Blockchain and Cyptocurrencies    3100
Cloud Security Overall            2400
Name: Amount, dtype: int64
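
To make the intermediate step visible, here is a minimal sketch on a made-up two-row frame (`demo` and `mapping` are illustrative names, not part of the original data). `str.split` turns each string into a list, `explode` gives each list element its own row while repeating `Amount`, and `map` assigns the roll-up label.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only
demo = pd.DataFrame({'Categories': ['Blockchain, Software', 'Bitcoin, Security'],
                     'Amount': [500, 900]})

# Split on a comma plus optional whitespace, then one row per category
exploded = demo.assign(Cats=demo['Categories'].str.split(r',\s*')).explode('Cats')

# Each original row now appears once per category, with its Amount repeated
print(exploded[['Cats', 'Amount']])

# Labels missing from the dict become NaN, and groupby drops NaN keys by default
mapping = {'Blockchain': 'Blockchain and Cyptocurrencies',
           'Bitcoin': 'Blockchain and Cyptocurrencies'}
totals = exploded.groupby(exploded['Cats'].map(mapping))['Amount'].sum()
print(totals)
```

Only the mapped group shows up in `totals`, which matches the two-row result above: everything not listed in the dict is silently left out.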

Update per the comments: is this what you want?

dfe.groupby(dfe['Cats'].replace(dd))['Amount'].sum()

Output:

Cats
Accounting                         800
Blockchain and Cyptocurrencies    3100
Cloud Security Overall            2400
Computer                           100
Content                           1000
Enterprise Software                600
Fraud Detection                    400
Hardware                           800
Information Technology            1200
Information Tehnology              400
Medical                            700
Mobile                             900
Mobile Devices                     900
Network Security                   500
Security                          4000
Software                          2300
Telecommunications                 700
Name: Amount, dtype: int64
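
The difference between the two calls comes down to how labels missing from the dict are treated: `Series.map` returns NaN for them, while `Series.replace` leaves them unchanged. A small sketch on made-up labels (the Series `cats` is illustrative only):

```python
import pandas as pd

cats = pd.Series(['Bitcoin', 'Security', 'Cyber Security'])
dd = {'Bitcoin': 'Blockchain and Cyptocurrencies',
      'Cyber Security': 'Cloud Security Overall'}

# map: 'Security' is not a key in dd, so it becomes NaN
mapped = cats.map(dd)

# replace: 'Security' is not a key in dd, so it is kept as-is
replaced = cats.replace(dd)

print(mapped.tolist())
print(replaced.tolist())
```

That is why grouping on the `replace` result keeps every original category as its own group next to the two combined ones.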

Or, to collect every unmapped category under a single 'Rest' label:
dfe.groupby(dfe['Cats'].map(dd).fillna('Rest'))['Amount'].sum()

Output:

Cats
Blockchain and Cyptocurrencies     3100
Cloud Security Overall             2400
Rest                              15300
Name: Amount, dtype: int64
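
One caveat worth checking: `Amount` is repeated for every category after `explode`, so these are sums over category memberships rather than over rows (Rest is 15300 although the eight amounts add up to 5000). A row that lists two categories from the same group would also be counted twice in that group. That does not happen in the data above, but as a sketch, assuming each original row should contribute at most once per group, duplicate (row, group) pairs can be dropped before summing. The frame `df2` below is hypothetical.

```python
import pandas as pd

# Hypothetical data: row 0 lists two categories from the same group
df2 = pd.DataFrame({'Categories': ['Blockchain, Bitcoin, Security', 'Bitcoin, Mobile'],
                    'Amount': [500, 900]})
dd = {'Blockchain': 'Blockchain and Cyptocurrencies',
      'Bitcoin': 'Blockchain and Cyptocurrencies'}

# After reset_index, the 'index' column holds the original row number
dfe2 = (df2.assign(Cats=df2['Categories'].str.split(r',\s*'))
           .explode('Cats')
           .reset_index())
dfe2['Group'] = dfe2['Cats'].map(dd).fillna('Rest')

# Naive sum: row 0 is counted twice in the blockchain group
naive = dfe2.groupby('Group')['Amount'].sum()

# Keep one (original row, group) pair before summing
deduped = (dfe2.drop_duplicates(subset=['index', 'Group'])
               .groupby('Group')['Amount'].sum())

print(naive)
print(deduped)
```

Whether the deduplicated or the per-membership total is the right one depends on what the sums are meant to represent, so treat this as an option rather than a correction.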
Votes: 1
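Original page content provided by Stack Overflow; translation support for the source page provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/68311335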
