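For a given piece of data from a .csv file, I want to group by category and sum the total amount, using the following rule.
Note that some categories need to be combined: cloud security, cloud computing, and cloud data service should be summed into "Cloud Security Overall". Similarly, "Blockchain and Cryptocurrencies" is the sum of Blockchain, Virtual Currency, Bitcoin, and Ethereum (spelled "Etherium" in the data).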


My code:
tmp = pd.DataFrame({'Categories' : ['Blockchain,Cloud Computing,InformationTechnology,Software', 'Cyber Security,Fraud Detection,Information Technology,Network Security', 'Information Technology,Medical,Security,Virtual Currency,Telecommunications', 'Mobile,Mobile Devices, Security,Bitcoin',
'Computer,Cyber Security,Network Security', 'Accounting,Hardware,Security,Software,Cloud data Service', 'Content,Security,Software,Etherium', 'Cyber Security,Enterprise Software,Security'],
'Amount' : [500, 400, 700, 900, 100, 800, 1000,600]})
print(tmp)
dfc = tmp.groupby(tmp['Categories'])['Amount'].sum().reset_index()
dfc.columns = ['Categories', 'Amount']
This code only gives the plain groupby and sum over all rows, without combining any categories.
Posted on 2021-07-09 13:05:53
Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Categories':['Blockchain, Cloud Computing, Information Technology, Software',
'Cyber Security, Fraud Detection, Information Tehnology, Network Security',
'Information Technology, Medical, Security, Virtual Currency, Telecommunications',
'Mobile, Mobile Devices, Security, Bitcoin',
'Computer, Cyber Security, Network Security',
'Accounting, Hardware, Security, Software, Cloud data Service',
'Content, Security, Software, Etherium',
'Cyber Security, Enterprise Software, Security'],
'Amount':[500,400,700,900,100,800,1000,600]})
dfe = df.assign(Cats=df['Categories'].str.split(r',\s*')).explode('Cats')
dd = {'Cyber Security': 'Cloud Security Overall',
'Cloud Computing' : 'Cloud Security Overall',
'Cloud data Service' : 'Cloud Security Overall',
'Blockchain' : 'Blockchain and Cyptocurrencies',
'Virtual Currency' : 'Blockchain and Cyptocurrencies',
'Bitcoin' : 'Blockchain and Cyptocurrencies',
'Etherium' : 'Blockchain and Cyptocurrencies'}
print(dfe.groupby(dfe['Cats'].map(dd))['Amount'].sum())
Output:
Cats
Blockchain and Cyptocurrencies 3100
Cloud Security Overall 2400
Name: Amount, dtype: int64
Update, per the comments: is this what you want?
dfe.groupby(dfe['Cats'].replace(dd))['Amount'].sum()
Output:
Cats
Accounting 800
Blockchain and Cyptocurrencies 3100
Cloud Security Overall 2400
Computer 100
Content 1000
Enterprise Software 600
Fraud Detection 400
Hardware 800
Information Technology 1200
Information Tehnology 400
Medical 700
Mobile 900
Mobile Devices 900
Network Security 500
Security 4000
Software 2300
Telecommunications 700
Name: Amount, dtype: int64
Or:
dfe.groupby(dfe['Cats'].map(dd).fillna('Rest'))['Amount'].sum()
Output:
Cats
Blockchain and Cyptocurrencies 3100
Cloud Security Overall 2400
Rest 15300
Name: Amount, dtype: int64
https://stackoverflow.com/questions/68311335
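One caveat worth sketching: the question's original data has inconsistent spacing around the commas (for example 'Mobile Devices, Security' next to 'Blockchain,Cloud Computing'), so a split that expects a space after every comma may leave whole strings unsplit or keep stray leading spaces. The variant below is only an illustration under some assumptions: it splits on a comma with optional surrounding whitespace, reuses the answer's mapping (label spelling kept as in the answer), and counts each original row at most once per group. The names `row`, `Group`, and `totals` are hypothetical and not from the original post.

```python
import pandas as pd

# The question's original data, with its inconsistent comma spacing.
tmp = pd.DataFrame({
    'Categories': [
        'Blockchain,Cloud Computing,InformationTechnology,Software',
        'Cyber Security,Fraud Detection,Information Technology,Network Security',
        'Information Technology,Medical,Security,Virtual Currency,Telecommunications',
        'Mobile,Mobile Devices, Security,Bitcoin',
        'Computer,Cyber Security,Network Security',
        'Accounting,Hardware,Security,Software,Cloud data Service',
        'Content,Security,Software,Etherium',
        'Cyber Security,Enterprise Software,Security',
    ],
    'Amount': [500, 400, 700, 900, 100, 800, 1000, 600],
})

# Same mapping as in the answer (label spelling kept as in the answer).
dd = {
    'Cyber Security': 'Cloud Security Overall',
    'Cloud Computing': 'Cloud Security Overall',
    'Cloud data Service': 'Cloud Security Overall',
    'Blockchain': 'Blockchain and Cyptocurrencies',
    'Virtual Currency': 'Blockchain and Cyptocurrencies',
    'Bitcoin': 'Blockchain and Cyptocurrencies',
    'Etherium': 'Blockchain and Cyptocurrencies',
}

# Split on a comma with optional whitespace on either side.
cats = tmp['Categories'].str.strip().str.split(r'\s*,\s*')

# One row per (original row, category); 'row' remembers the original row number.
exploded = tmp.assign(Cats=cats).explode('Cats').rename_axis('row').reset_index()

# Map to the combined label, falling back to the category itself.
exploded['Group'] = exploded['Cats'].map(lambda c: dd.get(c, c))

# Assumption: a row listing two categories of the same group should count once.
deduped = exploded.drop_duplicates(subset=['row', 'Group'])

totals = deduped.groupby('Group')['Amount'].sum()
print(totals)
```

On this sample no row contains two categories from the same combined group, so the de-duplication step changes nothing here; it only matters if, say, a single row listed both Blockchain and Bitcoin.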