我有以下两个数据。
df1:
code name region
0 AFG Afghanistan Middle East
1 NLD Netherlands Western Europe
2 AUT Austria Western Europe
3 IRQ Iraq Middle East
4 USA United States North America
5 CAD Canada North Americadf2:
code year gdp per capita
0 AFG 2010 547.35
1 NLD 2010 44851.27
2 AUT 2010 3577.10
3 IRQ 2010 4052.06
4 USA 2010 52760.00
5 CAD 2010 41155.32
6 AFG 2015 578.47
7 NLD 2015 45175.23
8 AUT 2015 3952.80
9 IRQ 2015 4688.32
10 USA 2015 56863.37
11 CAD 2015 43635.10我想返回2015年的code、year、gdp per capita和average (人均地区年国内生产总值)给那些地区国内生产总值高于平均水平的国家(应该是NLD,IRQ,美国)。结果应该是这样的:
code year gdp per capita average
3 NLD 2015 45175.23 24564.015
7 IRQ 2015 4688.32 2633.395
9 USA 2015 56863.37 50249.235我想在Python中尝试这一点,因为我最近完成了一个SQL入门课程,并对SQL解决方案的简单性感到惊讶。虽然我设法使它在Python中工作,但在我看来,它似乎过于复杂了。是否有任何方法可以用更少的代码或不需要.groupby和助手列来实现相同的结果?请看下面我的解决方案。
data = pd.merge(df1, df2, how="inner", on="code")
grouper = data.groupby(["region", "year"])["gdp per capita"].mean().reset_index()
for i in range(len(data)):
average = (grouper.loc[(grouper["year"] == data.loc[i, "year"]) & (grouper["region"] == data.loc[i, "region"]), "gdp per capita"]).to_list()[0]
data.loc[i, "average"] = average
result = data.loc[(data["year"] == 2015) & (data["gdp per capita"] > data["average"]), ["code", "year", "gdp per capita", "average"]]
print(result)发布于 2022-06-22 18:00:36
对于熊猫来说,循环根本就不是正确的答案。
# This is your join and where clause.
df = df1.merge(df2, on='code')[lambda x: x.year.eq(2015)]
# This is your aggregate function.
df['average'] = df.groupby(['region'])['gdp per capita'].transform('mean')
# This is your select and having clause.
out = df[df['gdp per capita'].gt(df['average'])][['code', 'year', 'gdp per capita', 'average']]
print(out)输出:
code year gdp per capita average
3 NLD 2015 45175.23 24564.015
7 IRQ 2015 4688.32 2633.395
9 USA 2015 56863.37 50249.235https://stackoverflow.com/questions/72719956
复制相似问题