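I am learning data analysis with Python.
I have a DataFrame of game sales that looks like this:
(The data is not real; it is only for the purpose of this question.)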
Name                Year  Publisher  Total Sales
GTA V               2013  Rockstar   133000
Super Mario Bros    1985  Nintendo   430500
GTA VI              2025  Rockstar   86000
RDR 3               2025  Rockstar   129030
Super Mario Sister  1985  Nintendo   308900
Super Mario End     2000  Nintendo   112100
I then dropped the Name column and grouped the data by Publisher using the following commands:
df.drop(columns='Name', inplace=True)
df.groupby(['Publisher','Year','Total Sales']).sum().reset_index()
The DataFrame now looks like this:
Publisher  Year  Total Sales
Nintendo   1985  308900
Nintendo   1985  430500
Nintendo   2000  112100
Rockstar   2013  133000
Rockstar   2025  129030
Rockstar   2025  86000
This is fine, but I want to add up the total sales for the same publisher in the same year.
I want the DataFrame to look like this:
Publisher  Year  Total Sales
Nintendo   1985  739400
Nintendo   2000  112100
Rockstar   2013  133000
Rockstar   2025  215030
Is there a way to do this?
Here is the code for my df:
import pandas as pd
data = {'Name':['GTA V','Super Mario Bros','GTA VI','RDR 3','Super Mario Sister','Super Mario End'],'Year':['2013','1985','2025','2025','1985','2000'],
'Publisher':['Rockstar','Nintendo','Rockstar','Rockstar','Nintendo','Nintendo'],'Total Sales':['133000','430500','86000','129030','308900','112100']}
df = pd.DataFrame(data)
df
Posted on 2022-01-03 15:02:29
Use pivot_table:
>>> df.pivot_table('Total Sales', ['Year', 'Publisher'], aggfunc='sum').reset_index()
   Year Publisher  Total Sales
0  1985  Nintendo       739400
1  2000  Nintendo       112100
2  2013  Rockstar       133000
3  2025  Rockstar       215030
Note: if the Total Sales column contains strings, convert it to int (or float) first:
>>> df.astype({'Total Sales': int}).pivot_table(...)
Posted on 2022-01-03 15:09:45
import pandas as pd
data = {'Name':['GTA V','Super Mario Bros','GTA VI','RDR 3','Super Mario Sister','Super Mario End'],'Year':['2013','1985','2025','2025','1985','2000'],
'Publisher':['Rockstar','Nintendo','Rockstar','Rockstar','Nintendo','Nintendo'],'Total Sales':['133000','430500','86000','129030','308900','112100']}
df = pd.DataFrame(data)
df['Total Sales'] = df['Total Sales'].astype(int)
df.groupby(['Year', 'Publisher'])['Total Sales'].agg('sum').reset_index()
Posted on 2022-01-03 15:25:25
Here is one way:
df.drop(columns='Name', inplace=True)
df['Total Sales'] = pd.to_numeric(df['Total Sales'])
df2 = df.groupby(['Publisher','Year']).sum().reset_index()
df2
https://stackoverflow.com/questions/70567588
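For reference, here is a minimal end-to-end sketch of the groupby approach, assuming the sample data from the question. The variable name result and the use of as_index=False (in place of reset_index()) are illustrative choices, not taken from the answers. It also shows the likely cause of the original problem: Total Sales was one of the grouping keys, so every row formed its own group and there was nothing left to sum.

```python
import pandas as pd

# Sample data from the question; note that Year and Total Sales are strings.
data = {
    'Name': ['GTA V', 'Super Mario Bros', 'GTA VI', 'RDR 3',
             'Super Mario Sister', 'Super Mario End'],
    'Year': ['2013', '1985', '2025', '2025', '1985', '2000'],
    'Publisher': ['Rockstar', 'Nintendo', 'Rockstar', 'Rockstar',
                  'Nintendo', 'Nintendo'],
    'Total Sales': ['133000', '430500', '86000', '129030', '308900', '112100'],
}
df = pd.DataFrame(data)

# Convert to numbers before summing; summing string values would
# concatenate them rather than add them.
df['Total Sales'] = pd.to_numeric(df['Total Sales'])

# Group only by the key columns. 'Total Sales' is the value being summed,
# so it must not appear in the list of grouping keys.
result = df.groupby(['Publisher', 'Year'], as_index=False)['Total Sales'].sum()
print(result)
```

With the sample data this should give one row per publisher and year. Year stays a string column here because only Total Sales is converted.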