I need help with a pandas groupby:
I have the following df:
A B C D
04547 2022-07-04 2022-07-04 1000000
04547 2022-07-11 2022-07-11 1000000
04547 2022-08-08 2022-08-08 1000000
04547 2022-10-11 2022-10-11 0100000
04547 2022-10-18 2022-10-18 0100000
04547 2022-10-24 2022-10-24 1000000
04547 2022-11-01 2022-11-01 0100000
04547 2022-11-08 2022-11-08 0100000
04548 2022-10-11 2022-10-11 0100000
04548 2022-10-18 2022-10-18 0100000
04548 2022-10-24 2022-10-24 1000000
04548 2022-11-01 2022-11-01 0100000
04548 2022-11-08 2022-11-08 0100000
The output I need should be:
A B C D
04547 2022-07-04 2022-08-08 1000000
04547 2022-10-11 2022-10-18 0100000
04547 2022-10-24 2022-10-24 1000000
04547 2022-11-01 2022-11-08 0100000
04548 2022-10-11 2022-10-18 0100000
04548 2022-10-24 2022-10-24 1000000
04548 2022-11-01 2022-11-08 0100000
But with:
a = {'A':'first','B':'first','C':'last','D':'first'}
df = df.groupby(['A','D']).agg(a)
A B C D
4547 2022-10-11 2022-11-08 0100000
4547 2022-07-04 2022-10-24 1000000
4548 2022-10-11 2022-11-08 0100000
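4548 2022-10-24 2022-10-24 1000000
This is not what I want: the grouping has to break whenever a new run of values starts in column D, separately for each value in column A.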
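The problem can be reproduced with a minimal sketch like the one below. The frame is rebuilt from the sample rows above; storing every column as a string is an assumption (the question does not show dtypes), and `agg_spec` is a hypothetical name that leaves out the key columns A and D because they are already the group keys.

```python
import pandas as pd

# Sample rows rebuilt from the question. Keeping all columns as strings is an
# assumption; it preserves the leading zeros in A and D.
rows = [
    ("04547", "2022-07-04", "2022-07-04", "1000000"),
    ("04547", "2022-07-11", "2022-07-11", "1000000"),
    ("04547", "2022-08-08", "2022-08-08", "1000000"),
    ("04547", "2022-10-11", "2022-10-11", "0100000"),
    ("04547", "2022-10-18", "2022-10-18", "0100000"),
    ("04547", "2022-10-24", "2022-10-24", "1000000"),
    ("04547", "2022-11-01", "2022-11-01", "0100000"),
    ("04547", "2022-11-08", "2022-11-08", "0100000"),
    ("04548", "2022-10-11", "2022-10-11", "0100000"),
    ("04548", "2022-10-18", "2022-10-18", "0100000"),
    ("04548", "2022-10-24", "2022-10-24", "1000000"),
    ("04548", "2022-11-01", "2022-11-01", "0100000"),
    ("04548", "2022-11-08", "2022-11-08", "0100000"),
]
df = pd.DataFrame(rows, columns=["A", "B", "C", "D"])

# Hypothetical aggregation spec: first start date, last end date per group.
agg_spec = {"B": "first", "C": "last"}

# Grouping on the *values* of A and D pools every row that shares a value of D
# within the same A, even when those rows are not adjacent.
merged = df.groupby(["A", "D"], as_index=False).agg(agg_spec)
print(merged)
```

For A = 04547, the two separate runs of D = 1000000 (July to August, and 2022-10-24) collapse into a single row spanning 2022-07-04 to 2022-10-24, which is the unwanted behaviour described above.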
Posted on 2022-06-30 08:47:13
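You can use df['D'].ne(df['D'].shift()).cumsum() as the grouper in place of column D, so that rows are grouped by consecutive runs of equal values (shift gives you access to the previous row's value):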
a = {'A':'first','B':'first','C':'last','D':'first'}
out = (df.groupby(['A', df['D'].ne(df['D'].shift()).cumsum()],
                  as_index=False)
         .agg(a)
      )
Output:
A B C D
0 04547 2022-07-04 2022-08-08 1000000
1 04547 2022-10-11 2022-10-18 0100000
2 04547 2022-10-24 2022-10-24 1000000
3 04547 2022-11-01 2022-11-08 0100000
4 04548 2022-10-11 2022-10-18 0100000
5 04548 2022-10-24 2022-10-24 1000000
6 04548 2022-11-01 2022-11-08 0100000
Posted on 2022-06-30 08:48:38
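If you need to group by consecutive values of columns A and D together, compare the two columns with their shifted values (DataFrame.shift) for inequality using DataFrame.ne, reduce the result to a single boolean Series with DataFrame.any, and then take the cumulative sum with Series.cumsum:
g = df[['A','D']].ne(df[['A','D']].shift()).any(axis=1).cumsum()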
a = {'A':'first','B':'first','C':'last','D':'first'}
df = df.groupby(g).agg(a)
print(df)
A B C D
1 04547 2022-07-04 2022-08-08 1000000
2 04547 2022-10-11 2022-10-18 0100000
3 04547 2022-10-24 2022-10-24 1000000
4 04547 2022-11-01 2022-11-08 0100000
5 04548 2022-10-11 2022-10-18 0100000
6 04548 2022-10-24 2022-10-24 1000000
7 04548 2022-11-01 2022-11-08 0100000
https://stackoverflow.com/questions/72812800
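Both answers rely on the same shift / ne / cumsum idiom to label consecutive runs. The intermediate values can be inspected with a small sketch like the following; the toy column and the names `prev`, `starts` and `run_id` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy column with three runs: [x, x], [y], [x]
d = pd.Series(["x", "x", "y", "x"], name="D")

prev = d.shift()          # value of the previous row (NaN on the first row)
starts = d.ne(prev)       # True wherever the value differs from the previous row
run_id = starts.cumsum()  # running count of run starts, i.e. one label per run

print(pd.DataFrame({"D": d, "prev": prev, "starts": starts, "run_id": run_id}))
```

The last "x" gets a different label from the first two, so grouping on `run_id` keeps non-adjacent runs apart. One difference between the two answers is worth noting: the first groups on A plus a run label computed from D alone, while the second computes the run label from A and D jointly. The two should agree when rows with the same A are contiguous, as in the sample data; if the same A value could reappear later in the frame, only the joint label would treat it as a new group.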