I have a DataFrame of client agreements, as below:
import pandas as pd

rng = pd.date_range('2020-12-01', periods=5, freq='D')
df = pd.DataFrame({"ID": ["1", "2", "1", "2", "2"],
                   "value": [100, 200, 300, 400, 500],
                   "status": ["active", "finished", "active", "finished", "active"],
                   "Date": rng})

I need to create a new DataFrame with values calculated from the df above:
To be more precise, I need to create a df like the one shown below:

Posted on 2020-12-16 11:33:36
Use DataFrame.sort_values by both columns, together with DataFrame.pivot_table with the aggregate function last:
df = (df.sort_values(['ID', 'Date'])
        .pivot_table(index='ID', columns='status', values='value', aggfunc='last')
        .rename(columns={'active': 'New1', 'finished': 'New2'})[['New1', 'New2']]
        .reset_index()
        .rename_axis(None, axis=1)
      )
print(df)
  ID   New1   New2
0  1  300.0    NaN
1  2  500.0  400.0

If the datetimes are already sorted within each group, the solution is simpler:
df = (df.pivot_table(index='ID', columns='status', values='value', aggfunc='last')
        .rename(columns={'active': 'New1', 'finished': 'New2'})[['New1', 'New2']]
        .reset_index()
        .rename_axis(None, axis=1)
      )
print(df)

Posted on 2020-12-16 11:39:15
Try this longer version:
df1 = df.loc[df['status'] == "active"]
df2 = df.loc[df['status'] == "finished"]
df1 = df1.groupby("ID")['value'].last()
df2 = df2.groupby("ID")['value'].last()
IDs = df["ID"].drop_duplicates()
new_df = pd.DataFrame({"ID": IDs, "New1": df1.reindex(IDs).tolist(), "New2": df2.reindex(IDs).tolist()})
print(new_df)

Output:
  ID  New1   New2
0  1   300    NaN
1  2   500  400.0

https://stackoverflow.com/questions/65322384
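Both answers rely on "last" meaning "latest by Date", and the pivot_table answer selects [['New1','New2']] directly, which raises a KeyError when one of the statuses never occurs in the data. As a rough sketch (not taken from either answer), the sort-then-pivot approach could be wrapped in a small helper that sorts explicitly and uses reindex to keep both columns. The function name last_value_by_status is hypothetical, and the active/finished to New1/New2 mapping is simply carried over from the answers above.

```python
import pandas as pd


def last_value_by_status(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per ID with the latest 'active' and 'finished' values.

    Hypothetical helper for illustration; the name is an assumption.
    """
    return (
        df.sort_values(["ID", "Date"])  # so that "last" means "latest by Date"
          .pivot_table(index="ID", columns="status", values="value", aggfunc="last")
          # reindex (instead of [[...]]) yields an all-NaN column when a status is absent
          .reindex(columns=["active", "finished"])
          .rename(columns={"active": "New1", "finished": "New2"})
          .reset_index()
          .rename_axis(None, axis=1)
    )


rng = pd.date_range("2020-12-01", periods=5, freq="D")
df = pd.DataFrame({"ID": ["1", "2", "1", "2", "2"],
                   "value": [100, 200, 300, 400, 500],
                   "status": ["active", "finished", "active", "finished", "active"],
                   "Date": rng})

# Shuffle the rows to check that the result does not depend on input order.
shuffled = df.sample(frac=1, random_state=0)
result = last_value_by_status(shuffled)
print(result)

# A frame with no 'finished' rows still produces a New2 column (all NaN).
only_active = last_value_by_status(df[df["status"] == "active"])
print(only_active)
```

Because the helper returns a new frame instead of reassigning df, the original data stays available for further checks.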