I have a huge DataFrame that looks like this:
import pandas

df = pandas.DataFrame({'date': ["2020-10-1 12:00:00", "2020-10-2 12:00:00", "2020-10-3 12:00:00", "2020-10-4 12:00:00",
                                "2020-10-5 12:00:00", "2020-10-6 12:00:00", "2020-10-7 12:00:00", "2020-10-8 12:00:00",
                                "2020-10-9 12:00:00"],
                       'revenue_A': [100, 250, 300, 300, 300, 300, 200, 100, 300],
                       'revenue_B': [100, 200, 200, 200, 200, 300, 250, 100, 200]})

I want to split the DataFrame wherever revenue_A and revenue_B both stay unchanged for at least a certain continuous period (for example, 48 hours). The expected result would be:
                 date  revenue_A  revenue_B
0  2020-10-1 12:00:00        100        100
1  2020-10-2 12:00:00        250        200
2  2020-10-3 12:00:00        300        200
3  2020-10-4 12:00:00        300        200
4  2020-10-5 12:00:00        300        200

and

                 date  revenue_A  revenue_B
5  2020-10-6 12:00:00        300        300
6  2020-10-7 12:00:00        200        250
7  2020-10-8 12:00:00        100        100
8  2020-10-9 12:00:00        300        200

Any idea how this can be done efficiently? (The DataFrame has millions of rows.)
Posted on 2020-05-11 21:43:57
I don't know whether it is efficient enough, but here is one approach:
# compute row indices at which to split
# (compares three consecutive rows, so it assumes the default RangeIndex
# and one row per 24 hours: three equal rows then span 48 hours)
splits = [i for i in range(2, len(df))
          if (df.revenue_A[i-2] == df.revenue_A[i-1] == df.revenue_A[i]
              and df.revenue_B[i-2] == df.revenue_B[i-1] == df.revenue_B[i])]
# sort in descending order
splits.sort(reverse=True)
# initialize list of subframes
subframes = []
# traverse rows backwards, so that we can add the subframe after the split point
# to the subframe list and drop it from the main frame without messing up the
# remaining indices
for split in splits:
    if split < len(df) - 1:
        subframes.append(df[split+1:])
        df = df.drop(range(split+1, len(df)))
# also add the last remaining subframe to the list
# (subframes therefore ends up in reverse chronological order)
subframes.append(df)

https://stackoverflow.com/questions/61727159
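With millions of rows, the Python-level loop above may be slow, and it counts rows rather than hours. A vectorized variant might look roughly like the sketch below. It is not part of the original answer. The function name `split_on_constant_runs` and its parameters are hypothetical, and it rests on three assumptions: the rows are already sorted by time, the "unchanged" duration is measured from the first to the last timestamp of a run of identical rows, and the cut is made after the last row of each qualifying run. The sample frame builds its dates with `pd.date_range` as a stand-in for the string dates in the question.

```python
import pandas as pd


def split_on_constant_runs(df, cols, min_duration="48h", date_col="date"):
    """Hypothetical helper: cut df after every run of identical `cols`
    values that spans at least `min_duration` (rows assumed time-sorted)."""
    ts = pd.to_datetime(df[date_col])

    # True where any watched column differs from the previous row,
    # i.e. where a new run of identical values starts.
    changed = df[cols].ne(df[cols].shift()).any(axis=1)
    run_id = changed.cumsum()

    # Time covered by each run, broadcast back to every row of the run.
    run_start = ts.groupby(run_id).transform("first")
    run_stop = ts.groupby(run_id).transform("last")
    long_run = (run_stop - run_start) >= pd.Timedelta(min_duration)

    # Cut after the last row of every sufficiently long run.
    is_run_end = run_id.ne(run_id.shift(-1))
    cut_after = long_run & is_run_end

    # Rows following a cut belong to the next piece.
    piece_id = cut_after.shift(fill_value=False).cumsum()
    return [piece for _, piece in df.groupby(piece_id)]


# Sample data shaped like the frame in the question (one row per day).
sample = pd.DataFrame({
    "date": pd.date_range("2020-10-01 12:00:00", periods=9, freq="D"),
    "revenue_A": [100, 250, 300, 300, 300, 300, 200, 100, 300],
    "revenue_B": [100, 200, 200, 200, 200, 300, 250, 100, 200],
})

pieces = split_on_constant_runs(sample, ["revenue_A", "revenue_B"])
for piece in pieces:
    print(piece, end="\n\n")
```

On the sample data this yields two pieces, rows 0 to 4 and rows 5 to 8, in chronological order. Unlike the loop, a run longer than 48 hours is kept whole in the earlier piece rather than cut at every qualifying row. Whether that matches the intended behaviour depends on how the 48-hour rule is meant to apply.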