I have the following pandas DataFrame:
import pandas as pd
df = pd.DataFrame(
[
("bird", '2022-01',"Falconiformes"),
("bird", '2022-02',"Falconiformes"),
("bird", '2022-03',"Falconiformes"),
("bird", '2022-04',"Falconiformes"),
("bird", '2022-05',"Falconiformes"),
("bird", '2022-06',"Falconiformes"),
("bird", '2022-07',"Falconiformes"),
("bird", '2022-08',"Falconiformes"),
("bird", '2022-09',"Psittaciformes"),
("bird", '2022-10',"Psittaciformes"),
("bird", '2022-11',"Psittaciformes"),
("bird", '2022-12',"Psittaciformes"),
("mammal", '2022-01',"Falconiformes"),
("mammal", '2022-02',"Falconiformes"),
("mammal",'2022-03',"Falconiformes"),
("mammal", '2022-04',"Falconiformes"),
("mammal",'2022-05',"Falconiformes"),
("mammal", '2022-06',"Psittaciformes"),
("mammal", '2022-07',"Falconiformes"),
("mammal", '2022-08',"Falconiformes"),
("mammal", '2022-09',"Falconiformes"),
("mammal", '2022-10',"Falconiformes"),
("mammal", '2022-11',"Falconiformes"),
("mammal", '2022-12',"Falconiformes"),
],
columns=("animal", "date", "attribute"),
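)
Now things get more complicated. For each animal, I want the count of the most recent run of consecutive identical attribute values within that group.
The result should be: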
result = pd.DataFrame(
[ ("bird", 'Psittaciformes' ,4),
("mammal", 'Falconiformes' ,6),
],
columns=("animal", "attribute", "count"),
)
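result
I think this could be programmed by iterating over the groups or something similar. What I'm looking for is a bit of a unicorn. This should be possible, shouldn't it?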
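The iteration the question alludes to could look roughly like the sketch below. It is only a baseline for checking other solutions, not the poster's code: the helper name `latest_run_counts` and the small `sample` frame are made up for illustration, and it assumes the rows are already in date order within each animal.

```python
import pandas as pd


def latest_run_counts(df: pd.DataFrame) -> pd.DataFrame:
    """For each animal, count the trailing run of identical attribute values."""
    rows = []
    # groupby keeps the original row order inside each group
    for animal, group in df.groupby("animal"):
        values = group["attribute"].tolist()
        last = values[-1]
        count = 0
        # Walk backwards from the newest row until the value changes
        for value in reversed(values):
            if value != last:
                break
            count += 1
        rows.append((animal, last, count))
    return pd.DataFrame(rows, columns=["animal", "attribute", "count"])


# Hypothetical toy data, not the frame from the question
sample = pd.DataFrame(
    [
        ("bird", "2022-01", "A"),
        ("bird", "2022-02", "B"),
        ("bird", "2022-03", "B"),
        ("mammal", "2022-01", "A"),
        ("mammal", "2022-02", "B"),
        ("mammal", "2022-03", "A"),
    ],
    columns=["animal", "date", "attribute"],
)
baseline = latest_run_counts(sample)
print(baseline)
```

If the dates might be out of order, sorting with `df.sort_values(["animal", "date"])` before calling the helper would be a reasonable precaution.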
Posted on 2022-09-10 18:16:23
You can use groupby.agg with a custom function to compute count:
(df.groupby('animal', as_index=False)
   .agg(attribute=('attribute', 'last'),
        count=('attribute', lambda s: s.eq(s.iloc[-1])[::-1].cummin().sum())
       )
)
Output:
   animal       attribute  count
0    bird  Psittaciformes      4
1  mammal   Falconiformes      6
How it works:
s.eq(s.iloc[-1])  # flag values equal to the last one
[::-1]            # reverse the Series (newest first)
.cummin()         # set every value after the first False to False
.sum()            # count the remaining True values
Posted on 2022-09-10 19:32:17
Another solution:
df_out = df.groupby("animal", as_index=False).apply(
    # label each run of identical attribute values, aggregate per run, keep the last run
    lambda x: x.groupby((x.attribute != x.attribute.shift()).cumsum())
    .agg(
        animal=("animal", "first"),
        attribute=("attribute", "first"),
        count=("animal", "count"),
    )
    .iloc[-1]
)
print(df_out)
Prints:
   animal       attribute  count
0    bird  Psittaciformes      4
1  mammal   Falconiformes      6
https://stackoverflow.com/questions/73674096
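To make the first answer's `s.eq(s.iloc[-1])[::-1].cummin().sum()` chain easier to follow, here is a small sketch that runs it one step at a time on the mammal group from the question. The intermediate variable names (`is_last`, `newest_first`, `in_run`) are mine, not the answerer's, and the Series is rebuilt by hand rather than taken from `df`.

```python
import pandas as pd

# Attribute values of the mammal group from the question, oldest to newest
s = pd.Series(["Falconiformes"] * 5 + ["Psittaciformes"] + ["Falconiformes"] * 6)

is_last = s.eq(s.iloc[-1])      # True where the value equals the newest value
newest_first = is_last[::-1]    # reverse so the newest row comes first
in_run = newest_first.cummin()  # stays True until the first False, False afterwards
count = int(in_run.sum())       # number of rows in the trailing run

print(in_run.tolist())
print(count)  # 6
```

The single Psittaciformes row in 2022-06 is what stops the run: once `cummin` meets that False, the five earlier Falconiformes rows no longer count, which is why the result is 6 rather than 11.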