Here is my DataFrame:
import pandas as pd

df = pd.DataFrame({'Period': ['1_Baseline', '1_Baseline', '1_Baseline', '2_Acute', '2_Acute', '2_Acute', '3_Chronic', '3_Chronic', '3_Chronic', '4_Discontinuation', '4_Discontinuation', '4_Discontinuation'],
                   'Subject': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95]})
I want to create a column holding the percent change in Amount from baseline to each period, within each Subject. So for Subject 1 it would show the change from 1_Baseline to 2_Acute, from 1_Baseline to 3_Chronic, and from 1_Baseline to 4_Discontinuation. It would do the same for every Subject.
This is what I tried:
df['pct_change'] = df.groupby(['Period'])['Amount'].pct_change()
But I got:
               Period  Subject  Amount  pct_change
0          1_Baseline        1      24         NaN
1          1_Baseline        2      52    1.166667
2          1_Baseline        3      34   -0.346154
3             2_Acute        1      95    1.794118
4             2_Acute        2      98    0.031579
5             2_Acute        3      54   -0.448980
6           3_Chronic        1      32   -0.407407
7           3_Chronic        2      20   -0.375000
8           3_Chronic        3      16   -0.200000
9   4_Discontinuation        1      52    2.250000
10  4_Discontinuation        2      34   -0.346154
11  4_Discontinuation        3      95    1.794118
The result is computed neither within each period nor relative to each Subject's earlier Amount.
Expected output:
               Period  Subject  Amount    pct_change
0          1_Baseline        1      24           NaN
1          1_Baseline        2      52           NaN
2          1_Baseline        3      34           NaN
3             2_Acute        1      95   2.958333333
4             2_Acute        2      98   0.884615385
5             2_Acute        3      54   0.588235294
6           3_Chronic        1      32   0.333333333
7           3_Chronic        2      20  -0.615384615
8           3_Chronic        3      16  -0.529411765
9   4_Discontinuation        1      52   1.166666667
10  4_Discontinuation        2      34  -0.346153846
11  4_Discontinuation        3      95   1.794117647

Posted on 2020-04-03 06:23:33
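IIUC, you want to divide the Amount in each row by the Amount of the same Subject at Period == 1_Baseline. Here is my approach:
# Pivot to one row per Subject and one column per Period
s = df.set_index(['Subject', 'Period']).Amount.unstack('Period')
# Divide every period by the baseline column, subtract 1, then flatten back
df['pct_change'] = (s.div(s['1_Baseline'], axis='rows').sub(1)
                     .unstack().values
                   )
Output: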
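               Period  Subject  Amount  pct_change
0          1_Baseline        1      24    0.000000
1          1_Baseline        2      52    0.000000
2          1_Baseline        3      34    0.000000
3             2_Acute        1      95    2.958333
4             2_Acute        2      98    0.884615
5             2_Acute        3      54    0.588235
6           3_Chronic        1      32    0.333333
7           3_Chronic        2      20   -0.615385
8           3_Chronic        3      16   -0.529412
9   4_Discontinuation        1      52    1.166667
10  4_Discontinuation        2      34   -0.346154
11  4_Discontinuation        3      95    1.794118
Note that the order of the rows matters a lot here: .values drops the index, so the result is assigned back by position. In this case your rows do happen to be in the right order (sorted by Period, then by Subject) for this to work. If you are not sure about the order, a merge is safer: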
s = df.set_index(['Subject', 'Period']).Amount.unstack('Period')
s = s.div(s['1_Baseline'], axis='rows').sub(1).unstack().reset_index(name='pct_change')
df.merge(s, on=['Period','Subject'], how='left')

https://stackoverflow.com/questions/61002004
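
As a further order-independent alternative (not taken from the answer above), here is a minimal sketch that looks up each Subject's baseline with Series.map instead of pivoting. It assumes every Subject has exactly one '1_Baseline' row; the names shuffled, is_baseline, baseline and result are made up for illustration. The rows are shuffled on purpose to show that the result does not rely on row position, and the baseline rows are set to NaN to match the expected output in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Period': ['1_Baseline'] * 3 + ['2_Acute'] * 3
              + ['3_Chronic'] * 3 + ['4_Discontinuation'] * 3,
    'Subject': [1, 2, 3] * 4,
    'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16, 52, 34, 95],
})

# Shuffle the rows to show that the result does not depend on row order.
shuffled = df.sample(frac=1, random_state=0)

# Baseline Amount per Subject, as a Series indexed by Subject.
# Assumption: exactly one '1_Baseline' row per Subject.
is_baseline = shuffled['Period'] == '1_Baseline'
baseline = shuffled.loc[is_baseline].set_index('Subject')['Amount']

# Look up each row's baseline by Subject, then compute the relative change.
shuffled['pct_change'] = shuffled['Amount'] / shuffled['Subject'].map(baseline) - 1

# Optional: blank out the baseline rows so they read NaN rather than 0.
shuffled.loc[is_baseline, 'pct_change'] = np.nan

# Restore the original row order for display.
result = shuffled.sort_index()
print(result)
```

Because the lookup is keyed on Subject and not on position, this should give the same values as the merge version above, with NaN instead of 0 in the baseline rows.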