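I have a large df of purchases, sorted by person and datetime. I want to reduce the df to the rows where a fruit was purchased, plus at most 5 rows before each fruit purchase, per person. If one of those 5 rows is itself a fruit purchase, I only need the rows above back to that previous fruit purchase.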
fruits = ["apple", "banana", "pear"]
date pesron item
10:00 amy apple
10:01 amy pear
10:02 amy ipad (6 rows above 10:08 banana purchase, so delete)
10:03 amy headphones
10:04 amy missing
10:05 amy laptop
10:06 amy unknown
10:07 amy table
10:08 amy banana (one of the 5 rows above 10:10 banana purchase is also a fruit so stop here)
10:09 amy unknown
10:10 amy banana
... ben
Expected output:
10:00 amy apple
10:01 amy pear
10:03 amy headphones
10:04 amy missing
10:05 amy laptop
10:06 amy unknown
10:07 amy table
10:08 amy banana
10:09 amy unknown
10:10 amy banana
I tried:
m1= df.item.isin(fruits)
df.loc[m1, df.iloc[idx-5,:]]
Answered 2022-08-08 08:22:46
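You can create helper groups by reversing the row order and taking the cumulative sum of the fruit mask, then use a per-group row counter (GroupBy.cumcount) to keep each fruit row and the 5 rows above it in the original order. GroupBy.cummax removes the rows that come after a person's last fruit purchase.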
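As a rough illustration of the helper groups described above, here is a minimal sketch on a toy Series. The values are made up for this example and are not the asker's data; the names `is_fruit_rev`, `group_id` and `rows_above` are likewise only illustrative.

```python
import pandas as pd

# Toy data, made up for illustration (not the asker's DataFrame).
items = pd.Series(["ipad", "apple", "desk", "lamp", "pear", "sofa"])
fruits = ["apple", "banana", "pear"]

# Reverse the rows so that each fruit comes before the rows that sit above it.
is_fruit_rev = items.iloc[::-1].isin(fruits)

# The running sum goes up by 1 at every fruit, so a fruit and the
# non-fruit rows above it (in the original order) share one group id.
group_id = is_fruit_rev.cumsum()

# Position inside each group: 0 is the first row of the group,
# 1 is the row directly above it in the original order, and so on.
rows_above = is_fruit_rev.groupby(group_id).cumcount()

summary = pd.DataFrame(
    {"item": items, "group_id": group_id, "rows_above": rows_above}
).sort_index()
print(summary)
```

Group 0 holds the rows below the last fruit ("sofa" here). Its counter also starts at 0 even though it contains no fruit, which is why a second mask is needed to drop such rows.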
print (df)
date pesron item
0 10:00 amy apple
1 10:01 amy pear
2 10:02 amy ipad
3 10:03 amy headphones
4 10:04 amy missing
5 10:05 amy laptop
6 10:06 amy unknown
7 10:07 amy table
8 10:08 amy banana
9 10:09 amy unknown
10 10:10 amy ban <- changed row
11 10:09 ben unknown <- added row
fruits = ["apple", "banana", "pear"]
#reverse the row order and test for membership in the list
m = df.iloc[::-1].item.isin(fruits)
#group by column and helper Series by cumulative sum of mask
g = m.groupby([df['pesron'], m.cumsum()])
#remove all rows after last match
mask1 = g.cummax()
#counter of rows per group: 0 is the fruit row, 1-5 are the 5 rows above it
mask2 = g.cumcount().lt(6).iloc[::-1]
#chain masks
df = df[mask1 & mask2]
print (df)
date pesron item
0 10:00 amy apple
1 10:01 amy pear
3 10:03 amy headphones
4 10:04 amy missing
5 10:05 amy laptop
6 10:06 amy unknown
7 10:07 amy table
8 10:08 amy banana
Source: https://stackoverflow.com/questions/73274780
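For reference, the same idea can be wrapped in a small function and checked against the expected output from the question. This is only a sketch under stated assumptions: `rows_before_fruit`, its parameters and the helper column names are hypothetical names introduced here, not part of the answer. It swaps the cummax mask for a per-person cumulative sum (`block > 0`), and it assumes the frame is already sorted by person and time. The column name `pesron` is spelled as in the post, and the ben row is the one added in the answer.

```python
import pandas as pd

def rows_before_fruit(df, fruits, n=5, person_col="pesron", item_col="item"):
    """Keep each fruit row plus at most n rows above it, per person.

    Hypothetical helper for illustration; assumes df is already sorted
    by person and time.
    """
    # Helper frame in reversed row order, so each fruit precedes the
    # rows that sit above it in the original frame.
    h = pd.DataFrame({
        "person": df[person_col].to_numpy()[::-1],
        "fruit": df[item_col].isin(fruits).to_numpy()[::-1].astype(int),
    })
    # Per person, count the fruits seen so far from the bottom up.
    # block == 0 means there is no fruit at or below this row.
    h["block"] = h.groupby("person")["fruit"].cumsum()
    # Distance to the fruit that opens the block: 0 is the fruit itself.
    h["dist"] = h.groupby(["person", "block"]).cumcount()
    keep_rev = (h["block"] > 0) & (h["dist"] <= n)
    # Flip the mask back to the original row order.
    return df[keep_rev.to_numpy()[::-1]]

fruits = ["apple", "banana", "pear"]
df = pd.DataFrame({
    "date": ["10:00", "10:01", "10:02", "10:03", "10:04", "10:05",
             "10:06", "10:07", "10:08", "10:09", "10:10", "10:09"],
    "pesron": ["amy"] * 11 + ["ben"],
    "item": ["apple", "pear", "ipad", "headphones", "missing", "laptop",
             "unknown", "table", "banana", "unknown", "banana", "unknown"],
})

result = rows_before_fruit(df, fruits, n=5)
print(result)
```

On this data the 10:02 ipad row is dropped because it is the sixth row above the 10:08 banana, and the ben row is dropped because ben has no fruit purchase.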