I have the following data:
Date Country Sales
01/02/20 France 1000
01/02/20 Germany 1000
02/02/20 France 2000
02/02/20 Germany 3000
03/02/20 France 4500
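03/02/20 Germany 5000
I want to calculate the growth factor for each day, per country. For example, for Germany the value for 02/02/20 should be 3000/1000, and the value for 03/02/20 should be 5000/3000. I am not looking for the percentage change.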
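To make the distinction in the question precise (the symbols S and G are notation introduced here for illustration, not from the question): if S_{c,t} is the sales of country c on day t, the growth factor G_{c,t} and the percentage change differ only by 1.

```latex
G_{c,t} = \frac{S_{c,t}}{S_{c,t-1}}, \qquad
\frac{S_{c,t} - S_{c,t-1}}{S_{c,t-1}} = G_{c,t} - 1

G_{\text{Germany},\,02/02/20} = \frac{3000}{1000} = 3, \qquad
G_{\text{Germany},\,03/02/20} = \frac{5000}{3000} \approx 1.667
```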
Posted on 2020-04-28 16:14:44
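import pandas as pd

df = pd.DataFrame([
['01/02/20', 'France', 1000],
['01/02/20', 'Germany', 1000],
['02/02/20', 'France', 2000],
['02/02/20', 'Germany', 3000],
['03/02/20', 'France', 4500],
['03/02/20', 'Germany', 5000]
], columns=['Date', 'Country', 'Sales'])
# Difference from the previous day's sales within each country
df['Diff'] = df.groupby('Country')['Sales'].diff()
# Sales - Diff is the previous day's sales, so this ratio is the growth factor
df['Growth'] = df.eval('Sales/(Sales - Diff)')
print(df)
Another option is shown above.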
Date Country Sales Diff Growth
0 01/02/20 France 1000 NaN NaN
1 01/02/20 Germany 1000 NaN NaN
2 02/02/20 France 2000 1000.0 2.000000
3 02/02/20 Germany 3000 2000.0 3.000000
4 03/02/20 France 4500 2500.0 2.250000
5 03/02/20 Germany 5000 2000.0 1.666667
Posted on 2020-04-28 16:12:55
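In the first answer, Sales - Diff is simply the previous day's sales for the same country, so the intermediate Diff column can be skipped. A minimal sketch using a per-group shift(), assuming (as all the answers here do) that rows are already in date order within each country:

```python
import pandas as pd

# Sample data from the question (Date is a dd/mm/yy string).
df = pd.DataFrame(
    {
        "Date": ["01/02/20", "01/02/20", "02/02/20", "02/02/20", "03/02/20", "03/02/20"],
        "Country": ["France", "Germany"] * 3,
        "Sales": [1000, 1000, 2000, 3000, 4500, 5000],
    }
)

# Previous row's Sales within the same country (NaN for each country's first row).
prev_sales = df.groupby("Country")["Sales"].shift()

# Growth factor = today's sales / previous day's sales.
df["Growth"] = df["Sales"] / prev_sales
print(df)
```

The Growth column matches the first answer's output, without the extra Diff column.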
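Start by sorting by country, using a stable sort so that each country's rows keep their original (date) order: df = df.sort_values(['Country'], kind='mergesort')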
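Then create a filter that is True only when a row's country matches the previous row's country (i.e. both rows belong to the same country): same_country = df['Country'] == df['Country'].shift(1)
Now divide each row's sales by the previous row's sales wherever that filter is True:
df['Growth'] = df['Sales'].where(same_country)/df['Sales'].shift(1)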
Finally, restore the original row order by sorting on the index:
df = df.sort_index()
This results in:
Date Country Sales Growth
0 01/02/20 France 1000 NaN
1 01/02/20 Germany 1000 NaN
2 02/02/20 France 2000 2.000000
3 02/02/20 Germany 3000 3.000000
4 03/02/20 France 4500 2.250000
5 03/02/20 Germany 5000 1.666667
Posted on 2020-04-28 17:18:16
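The sort-based answer relies on rows staying in date order within each country. A sketch that sorts explicitly by country and parsed date instead, assuming Date is day/month/year (format "%d/%m/%y"); the helper column name _date is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["01/02/20", "01/02/20", "02/02/20", "02/02/20", "03/02/20", "03/02/20"],
        "Country": ["France", "Germany"] * 3,
        "Sales": [1000, 1000, 2000, 3000, 4500, 5000],
    }
)

# Assumption: "02/02/20" means 2 February 2020 (day first).
dates = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Order by country, then chronologically, so shift(1) sees the previous day.
ordered = df.assign(_date=dates).sort_values(["Country", "_date"])

# Only divide when the previous row belongs to the same country.
same_country = ordered["Country"] == ordered["Country"].shift(1)
ordered["Growth"] = ordered["Sales"].where(same_country) / ordered["Sales"].shift(1)

# Restore the original row order and drop the helper column.
df = ordered.drop(columns="_date").sort_index()
print(df)
```

Because the date is part of the sort key, this should also work if the input rows are not in chronological order.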
Use:
df["Growth Factor"] = df.groupby(
"Country")["Sales"].transform(lambda g: g / g.shift())
print(df)
This prints:
Date Country Sales Growth Factor
0 01/02/20 France 1000 NaN
1 01/02/20 Germany 1000 NaN
2 02/02/20 France 2000 2.000000
3 02/02/20 Germany 3000 3.000000
4 03/02/20 France 4500 2.250000
5 03/02/20 Germany 5000 1.666667
https://stackoverflow.com/questions/61484241
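Although the question rules out percentage change, the two are directly related: pct_change() computes x / x.shift() - 1, so adding 1 gives the same growth factor. A sketch on the question's data (pct_change's handling of missing values has varied across pandas versions, so if Sales has gaps, the explicit division in the answers above is more predictable):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["01/02/20", "01/02/20", "02/02/20", "02/02/20", "03/02/20", "03/02/20"],
        "Country": ["France", "Germany"] * 3,
        "Sales": [1000, 1000, 2000, 3000, 4500, 5000],
    }
)

# Per-country percentage change (as a fraction), shifted back to a ratio by adding 1.
df["Growth Factor"] = df.groupby("Country")["Sales"].pct_change() + 1
print(df)
```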