I need to reshape a time-series table.
For example: A => B
A
no,A,B,B_sub
1,start,val_s,val_s_sub
2,study,val_st,val_st_sub
3,work,val_w,val_w_sub
4,end,val_e,val_e_sub
5,start,val_s1,val_s1_sub
6,end,val_e1,val_e1_sub
7,start,val_s2,val_s2_sub
8,work,val_w1,val_w1_sub
9,end,val_e2,val_e2_sub
B
,start,,study,,work,,end,
,B,B_sub,B,B_sub,B,B_sub,B,B_sub
4-1,val_s,val_s_sub,val_st,val_st_sub,val_w,val_w_sub,val_e,val_e_sub
6-5,val_s1,val_s1_sub,,,,,val_e1,val_e1_sub
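9-7,val_s2,val_s2_sub,,,val_w1,val_w1_sub,val_e2,val_e2_sub
I tried the pivot table function in Python's pandas library, but my table has no common string that could serve as the index.
Can you give me a hint?
I'm lost. Please help.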
Posted on 2021-01-18 22:30:26
Does this get you close enough?
# Each 'start' row opens a new group, so a running count of 'start' rows labels the groups
df_a['grp'] = (df_a['A'] == 'start').cumsum()
df_a.set_index(['grp','A']).unstack('A')
Output:
      no                           B                               B_sub
A    end  start  study  work     end   start   study    work         end       start       study        work
grp
1    4.0    1.0    2.0   3.0   val_e   val_s  val_st   val_w   val_e_sub   val_s_sub  val_st_sub   val_w_sub
2    6.0    5.0    NaN   NaN  val_e1  val_s1     NaN     NaN  val_e1_sub  val_s1_sub         NaN         NaN
3    9.0    7.0    NaN   8.0  val_e2  val_s2     NaN  val_w1  val_e2_sub  val_s2_sub         NaN  val_w1_sub
Going a step further, reshaping, renaming and tidying up:
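df_r = df_a.set_index(['grp','A']).unstack('A')
# Build row labels such as '4-1' from each group's end and start row numbers
steps = df_r[('no', 'end')].astype(int).astype(str).str.cat(df_r[('no', 'start')].astype(int).astype(str), sep='-')
# Keep only the value columns, put the step name on the top column level, and sort the columns
df_r.set_index(steps)[['B', 'B_sub']].swaplevel(0,1, axis=1).sort_index(level=0, axis=1)
Output: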
A            end               start               study                work
               B       B_sub       B       B_sub       B       B_sub       B       B_sub
(no, end)
4-1        val_e   val_e_sub   val_s   val_s_sub  val_st  val_st_sub   val_w   val_w_sub
6-5       val_e1  val_e1_sub  val_s1  val_s1_sub     NaN         NaN     NaN         NaN
9-7       val_e2  val_e2_sub  val_s2  val_s2_sub     NaN         NaN  val_w1  val_w1_sub
https://stackoverflow.com/questions/65775526
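For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds table A from the question and also puts the columns in the start, study, work, end order that table B asks for, instead of the alphabetical order that sort_index gives. The function name reshape_runs and the hard-coded steps list are my own assumptions, not part of the answer above, and the sketch assumes every run has exactly one 'start' row and one 'end' row.

```python
import pandas as pd

# Table A from the question, typed in by hand.
df_a = pd.DataFrame({
    "no": list(range(1, 10)),
    "A": ["start", "study", "work", "end", "start", "end", "start", "work", "end"],
    "B": ["val_s", "val_st", "val_w", "val_e", "val_s1", "val_e1",
          "val_s2", "val_w1", "val_e2"],
    "B_sub": ["val_s_sub", "val_st_sub", "val_w_sub", "val_e_sub", "val_s1_sub",
              "val_e1_sub", "val_s2_sub", "val_w1_sub", "val_e2_sub"],
})


def reshape_runs(df, steps=("start", "study", "work", "end")):
    """Hypothetical helper: one output row per start..end run.

    Assumes each run contains exactly one 'start' and one 'end' row.
    """
    out = df.copy()
    # A running count of 'start' rows gives every run its own group id.
    out["grp"] = (out["A"] == "start").cumsum()
    wide = out.set_index(["grp", "A"]).unstack("A")

    # Row labels in the 'end-start' form used by table B, e.g. '4-1'.
    end_no = wide[("no", "end")].astype(int).astype(str)
    start_no = wide[("no", "start")].astype(int).astype(str)
    labels = end_no + "-" + start_no

    # Step name on the top column level, then force the requested step order.
    result = wide[["B", "B_sub"]].swaplevel(0, 1, axis=1)
    wanted = pd.MultiIndex.from_product([list(steps), ["B", "B_sub"]])
    result = result.reindex(columns=wanted)
    result.index = labels.to_numpy()
    return result


result = reshape_runs(df_a)
print(result)
```

Steps that are missing from a run (for example 'study' in the 6-5 run) come out as NaN. If you need the blank cells shown in table B, calling result.fillna('') before exporting should do it.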