I have a data file like this:
id name industry income
1 apple telecommunication 100
2 oil gas 100
3 samsung telecommunication 200
4 coinbase crypto 100
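5 microsoft telecommunication 30
What I want to do is find the average income for each industry. It would be: telecommunication 110, gas 100, crypto 100.
What I did was find the frequency of each industry:
df.groupby(['industry']).sum().value_counts('industry')
which results in: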
industry
telecommunication 3
gas 1
crypto 1
I also found the sum of income for each industry:
df.groupby(['industry']).sum()['income']
which results in:
industry
telecommunication 330
gas 100
crypto 100
Now I'm a bit stuck on how to proceed.
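For reference, the two intermediate results above already determine the answer, since the mean is the per-industry sum divided by the per-industry count. A minimal sketch of that step, assuming the file is whitespace-separated; the inline `raw` string is a hypothetical stand-in for the actual file, and `groupby(...).size()` is used here for the counts:

```python
import io
import pandas as pd

# Hypothetical stand-in for the data file: the question's rows, whitespace-separated.
raw = """id name industry income
1 apple telecommunication 100
2 oil gas 100
3 samsung telecommunication 200
4 coinbase crypto 100
5 microsoft telecommunication 30
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Per-industry row count and income total.
counts = df.groupby("industry").size()
totals = df.groupby("industry")["income"].sum()

# Division aligns on the industry index, so each total is divided by its own count.
averages = totals / counts
print(averages)
```

This works, but it takes two passes over the groups where a single aggregation would do.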
Posted on 2021-12-21 22:05:31
You are looking for mean:
means = df.groupby('industry')['income'].mean()
Output:
>>> means
industry
crypto 100.0
gas 100.0
telecommunication 110.0
Name: income, dtype: float64
>>> means['telecommunication']
110.0
Posted on 2021-12-21 22:25:57
If you want to keep all the other columns, use groupby with transform:
df['mean']=df.groupby('industry')['income'].transform('mean')
   id       name           industry  income   mean
0   1      apple  telecommunication     100  110.0
1   2        oil                gas     100  100.0
2   3    samsung  telecommunication     200  110.0
3   4   coinbase             crypto     100  100.0
4   5  microsoft  telecommunication      30  110.0
If you need a summarized frame instead:
df.groupby('industry')['income'].mean().to_frame('mean_income')
mean_income
industry
crypto 100.0
gas 100.0
telecommunication 110.0
Posted on 2021-12-21 23:44:09
Maybe you should use agg to avoid multiple operations:
out = df.groupby('industry', sort=False).agg(size=('income', 'size'),
                                             mean=('income', 'mean'),
                                             sum=('income', 'sum')).reset_index()
print(out)
# Output:
industry size mean sum
0 telecommunication 3 110.0 330
1 gas 1 100.0 100
2 crypto 1 100.0 100
https://stackoverflow.com/questions/70442065
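The three approaches in the answers (`mean`, `transform('mean')`, and named aggregation with `agg`) should agree with one another. A minimal sketch that checks this, assuming the question's rows are rebuilt by hand as a DataFrame; the variable names `per_row`, `summary`, and `looked_up` are illustrative only:

```python
import pandas as pd

# The question's rows, rebuilt by hand (assumed to match the data file).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["apple", "oil", "samsung", "coinbase", "microsoft"],
    "industry": ["telecommunication", "gas", "telecommunication",
                 "crypto", "telecommunication"],
    "income": [100, 100, 200, 100, 30],
})

# One value per industry.
means = df.groupby("industry")["income"].mean()

# One value per original row, aligned with df's index.
per_row = df.groupby("industry")["income"].transform("mean")

# Several statistics per industry in a single pass; sort=False keeps
# the industries in order of first appearance.
summary = df.groupby("industry", sort=False).agg(
    size=("income", "size"),
    mean=("income", "mean"),
    sum=("income", "sum"),
).reset_index()

# transform's row-wise values match the per-industry means looked up by label.
looked_up = df["industry"].map(means)
assert (per_row == looked_up).all()

# agg's mean column matches the plain groupby mean (compared by industry label).
assert summary.set_index("industry")["mean"].eq(means).all()
```

Which form to use depends on the shape you need: `mean` for a lookup by industry, `transform` to attach the value to every row, and `agg` when several statistics are wanted at once.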