
Comparing row values with the previous row's values within a DataFrame grouped by an index level

Stack Overflow user
Asked 2020-11-11 20:09:52

I am wondering whether there is a simple solution to the problem below. Take the following setup:

import datetime
import pandas

data = [
    {"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 8), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},

    {"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 9), "ticker": "ticker-2", "internal_id": "T2", "score_1": 30.0, "score_2": 20.0},
]

df = pandas.DataFrame(data)
df = df.set_index(["date", "ticker"])
df['product'] = df.index.get_level_values('ticker')
df['date'] = df.index.get_level_values('date')

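I need to compare the value of certain columns (internal_id, score_1, score_2) in each row with the value in the previous row for the same ticker. If the value differs from the previous row, it should be output; otherwise None/NaN should be shown.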

For example, following the setup above, this is the output I want:

output = [
    {"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": None, "score_1": 20.0, "score_2": None},
    {"date": datetime.date(2020, 1, 8), "ticker": "ticker-1", "internal_id": None, "score_1": None, "score_2": None},
    {"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},

    {"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": None, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": None, "score_1": None, "score_2": None},
    {"date": datetime.date(2020, 1, 9), "ticker": "ticker-2", "internal_id": None, "score_1": 30.0, "score_2": None},
]

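As you can see, I need to group by ticker and then compare each column's value with the value from the previous date. This needs to work for strings as well as ints/floats.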


1 Answer

Stack Overflow user

Answered 2020-11-11 20:17:02

Use DataFrame.mask together with DataFrameGroupBy.shift, comparing the values with DataFrame.eq:

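# Build the frame without set_index, so that 'ticker' is an ordinary column
df = pandas.DataFrame(data)

# shift() moves each ticker's rows down by one; eq() flags values equal to the
# previous row; mask() replaces the flagged values with NaN
df = df.mask(df.groupby('ticker').shift().eq(df))
print(df)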
         date    ticker internal_id  score_1  score_2
0  2020-01-01  ticker-1          T1     10.0     20.0
1  2020-01-05  ticker-1         NaN     20.0      NaN
2  2020-01-08  ticker-1         NaN      NaN      NaN
3  2020-01-10  ticker-1        T1-A     10.0     30.0
4  2020-01-02  ticker-2          T2     10.0     20.0
5  2020-01-04  ticker-2         NaN      NaN      NaN
6  2020-01-09  ticker-2         NaN     30.0      NaN
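
The answer above starts from a plain frame, while the question's setup moves date and ticker into a MultiIndex. A minimal sketch of the same shift/eq/mask idea applied to the indexed frame might look like the following. The shortened sample data, the `cols` list, and the intermediate names `previous`, `unchanged`, and `result` are illustrative assumptions, not part of the original answer, and the explicit sort is added only so that "previous row" reliably means "previous date within the ticker".

```python
import datetime
import pandas as pd

# Shortened sample in the same shape as the question's data (illustrative only)
data = [
    {"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},
    {"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
]
df = pd.DataFrame(data).set_index(["date", "ticker"])

# Columns to compare against the previous row (assumed from the question)
cols = ["internal_id", "score_1", "score_2"]

# Order rows by ticker, then by date, so shift() looks at the previous date
df = df.sort_index(level=["ticker", "date"])

# Step 1: previous row's values within each ticker (first row of a group is NaN)
previous = df.groupby(level="ticker")[cols].shift()

# Step 2: True where the current value equals the previous one
unchanged = previous.eq(df[cols])

# Step 3: blank out unchanged values, keep changed ones
result = df[cols].mask(unchanged)
print(result)
```

Because the first row of every ticker has no predecessor, its comparison is always False and its values are kept, which matches the first row of each group in the output above.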
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/64786172
