Just wondering whether there is a simple solution to the problem below. Take the following setup:
import datetime
import pandas
data = [
{"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 8), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},
{"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 9), "ticker": "ticker-2", "internal_id": "T2", "score_1": 30.0, "score_2": 20.0},
]
df = pandas.DataFrame(data)
df = df.set_index(["date", "ticker"])
df['product'] = df.index.get_level_values('ticker')
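df['date'] = df.index.get_level_values('date')
I need to compare the value of certain columns (internal_id, score_1, score_2) in each row with the value in the previous row for the same ticker. If the value differs from the previous one, output it; otherwise show None/NaN.
For example, given the setup above, this is the output I want: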
output = [
{"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": None, "score_1": 20.0, "score_2": None},
{"date": datetime.date(2020, 1, 8), "ticker": "ticker-1", "internal_id": None, "score_1": None, "score_2": None},
{"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},
{"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": None, "score_1": None, "score_2": None},
{"date": datetime.date(2020, 1, 9), "ticker": "ticker-2", "internal_id": None, "score_1": 30.0, "score_2": None},
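]
As you can see, I need to group by ticker and then compare each row against the column values from the previous date. This needs to work for strings as well as int/float values.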
Posted on 2020-11-11 20:17:02
Use DataFrame.mask, comparing the values returned by DataFrameGroupBy.shift against the original frame with DataFrame.eq:
df = pandas.DataFrame(data)
# Blank out every cell that equals the previous row's value within the same ticker
df = df.mask(df.groupby('ticker').shift().eq(df))
print(df)
         date    ticker  internal_id  score_1  score_2
0  2020-01-01  ticker-1           T1     10.0     20.0
1  2020-01-05  ticker-1          NaN     20.0      NaN
2  2020-01-08  ticker-1          NaN      NaN      NaN
3  2020-01-10  ticker-1         T1-A     10.0     30.0
4  2020-01-02  ticker-2           T2     10.0     20.0
5  2020-01-04  ticker-2          NaN      NaN      NaN
6  2020-01-09  ticker-2          NaN     30.0      NaN
https://stackoverflow.com/questions/64786172
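The same mask/shift/eq idea can probably be applied to the question's MultiIndex setup as well, restricted to the three columns of interest. The following is only a sketch under some assumptions: the names `compare_cols`, `previous`, `unchanged`, and `result` are illustrative and not from the original post, the data is the question's rows rewritten as tuples, and the frame is sorted by ticker and then date so that "previous" means the previous date for that ticker.

```python
import datetime

import pandas as pd

# Same rows as in the question, written as tuples for brevity.
rows = [
    (datetime.date(2020, 1, 1), "ticker-1", "T1", 10.0, 20.0),
    (datetime.date(2020, 1, 5), "ticker-1", "T1", 20.0, 20.0),
    (datetime.date(2020, 1, 8), "ticker-1", "T1", 20.0, 20.0),
    (datetime.date(2020, 1, 10), "ticker-1", "T1-A", 10.0, 30.0),
    (datetime.date(2020, 1, 2), "ticker-2", "T2", 10.0, 20.0),
    (datetime.date(2020, 1, 4), "ticker-2", "T2", 10.0, 20.0),
    (datetime.date(2020, 1, 9), "ticker-2", "T2", 30.0, 20.0),
]
columns = ["date", "ticker", "internal_id", "score_1", "score_2"]
df = pd.DataFrame(rows, columns=columns).set_index(["date", "ticker"])

# Assumption: only these columns should be compared (hypothetical name).
compare_cols = ["internal_id", "score_1", "score_2"]

# Assumption: order rows by ticker, then date, so shift() looks at the prior date.
df = df.sort_index(level=["ticker", "date"])

# Previous row's values within each ticker (NaN for the first row of a ticker).
previous = df.groupby(level="ticker")[compare_cols].shift()

# True where the current value equals the previous one; NaN never compares equal,
# so the first row of every ticker is always kept.
unchanged = df[compare_cols].eq(previous)

# Replace unchanged cells with NaN and leave all other columns untouched.
result = df.copy()
result[compare_cols] = df[compare_cols].mask(unchanged)

print(result)
```

Selecting `compare_cols` explicitly keeps any helper columns (such as the `product` and `date` columns added in the question) out of the comparison, and works for string and float columns alike because `eq` compares element-wise regardless of dtype.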