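I'm looking to calculate the change in each person's mental health score between two timepoints.
Each user has a name and a mental health score at 3 different timepoints. I want to calculate the change in mental health score between timepoint 1 and timepoint 3.
Here is an example of the df I'm starting with: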
User  Timepoint  Mental Health Score
Bill  1          5
Bill  2          10
Bill  3          15
Wiz   1          10
Wiz   2          10
Wiz   3          15
Sam   1          5
Sam   2          5
Sam   3          5
Here is the desired output:
User  Timepoint  Mental Health Score  Change in Mental Health (TP1 and 3)
Bill  1          5
Bill  2          10
Bill  3          15                   10
Wiz   1          10
Wiz   2          10
Wiz   3          15                   5
Sam   1          5
Sam   2          5
Sam   3          5                    0
Does anyone know how to do this?
Answered on 2022-06-09 13:35:29
You can do this with shift() and np.where().
import numpy as np
# Assumes each user's rows are contiguous and in timepoint order, with exactly three rows per user
df['Change in Mental Health (TP1 and 3)'] = df['Mental Health Score'] - df['Mental Health Score'].shift(2)
df['Change in Mental Health (TP1 and 3)'] = np.where(df['Timepoint'] != 3, 0, df['Change in Mental Health (TP1 and 3)']).astype(int)
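The plain shift(2) above compares each row with the row two positions earlier, whichever user that row belongs to. A minimal sketch of a variant that shifts within each user instead, so scores are never compared across users. It assumes every user has all three timepoints and that each user's rows are already in timepoint order; the sample frame is rebuilt from the question so the snippet runs on its own, and the names col, prev and diff are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "User": ["Bill"] * 3 + ["Wiz"] * 3 + ["Sam"] * 3,
    "Timepoint": [1, 2, 3] * 3,
    "Mental Health Score": [5, 10, 15, 10, 10, 15, 5, 5, 5],
})

col = "Change in Mental Health (TP1 and 3)"

# Score two rows earlier *within the same user* (NaN for a user's first two rows)
prev = df.groupby("User")["Mental Health Score"].shift(2)
diff = df["Mental Health Score"] - prev

# Keep the difference only on Timepoint 3 rows; all other rows get 0, as in the answer above
df[col] = np.where(df["Timepoint"] == 3, diff, 0).astype(int)
```

If a user were missing a timepoint, the Timepoint 3 row could end up as NaN, and the final astype(int) would no longer be safe, so this sketch is only appropriate for complete data like the example.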
df
Answered on 2022-06-09 13:38:40
Try using groupby and where:
#sort by Timepoint if needed
#df = df.sort_values("Timepoint")
changes = df.groupby("User")["Mental Health Score"].transform('last')-df.groupby("User")["Mental Health Score"].transform('first')
df["Change"] = changes.where(df["Timepoint"].eq(3))
>>> df
   User  Timepoint  Mental Health Score  Change
0  Bill          1                    5     NaN
1  Bill          2                   10     NaN
2  Bill          3                   15    10.0
3   Wiz          1                   10     NaN
4   Wiz          2                   10     NaN
5   Wiz          3                   15     5.0
6   Sam          1                    5     NaN
7   Sam          2                    5     NaN
8   Sam          3                    5     0.0
Answered on 2022-06-09 13:40:21
As already pointed out in the comments, you can group the data by User and compute the difference in Mental Health Score within each group.
Here is a snippet of code to demonstrate:
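def _overall_change(scores):
    # Last score minus first score for one user (assumes rows are in timepoint order)
    return scores.iloc[-1] - scores.iloc[0]
# The column in the question's df is named 'Mental Health Score'
person = df.groupby('User')['Mental Health Score'].agg(_overall_change)
https://stackoverflow.com/questions/72561209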
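The aggregation above returns one value per user rather than the column shown in the desired output. A minimal sketch of how that per-user result could be written back onto the Timepoint 3 rows, assuming the column names from the question. The column name "Change" and the intermediate ordered are only illustrative, and the sample frame is rebuilt here so the snippet is self-contained.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "User": ["Bill"] * 3 + ["Wiz"] * 3 + ["Sam"] * 3,
    "Timepoint": [1, 2, 3] * 3,
    "Mental Health Score": [5, 10, 15, 10, 10, 15, 5, 5, 5],
})

def _overall_change(scores):
    # Last score minus first score for one user
    return scores.iloc[-1] - scores.iloc[0]

# Sort by Timepoint so "first" and "last" mean the earliest and latest timepoint
ordered = df.sort_values("Timepoint")
person = ordered.groupby("User")["Mental Health Score"].agg(_overall_change)

# Look up each row's user in the per-user result, then blank out every row
# that is not Timepoint 3 (those rows become NaN)
df["Change"] = df["User"].map(person).where(df["Timepoint"].eq(3))
```

Because where() fills the non-matching rows with NaN, the new column is float; filling with 0 and casting to int would be one way to get whole numbers if that is preferred.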