Given a pandas DataFrame df:
date numeric_field
0 2019-06-06 n
1 2019-06-07 n
2 2019-06-08 n
I want to perform a self-join on date and apply a function to numeric_field.
In SQL, I would do:
SELECT a.date,
FUN(b.numeric_field) AS new_field
FROM df a
INNER JOIN df b
ON b.date < a.date
WHERE a.date > '2019-06-06'
AND a.date <= '2019-06-08'
AND b.date >= '2019-06-06'
GROUP BY a.date
The result should be:
date new_field
0 2019-06-07 fun(n)
1 2019-06-08 fun(n)
Posted on 2020-05-05 23:37:18
The fastest way I can think of is a cross merge followed by a query:
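df = df.assign(dummy=1)  # constant key, so the merge pairs every row with every row
(df.merge(df, on='dummy', suffixes=['','_r'])
   .query('"2019-06-06"<=date_r<date<="2019-06-08"')  # filter: same as the SQL ON/WHERE clauses
   .assign(new_field=lambda x: x['numeric_field_r'].sum())  # replace the function in `lambda`; it runs over the whole filtered frame, not per date
   [['date','new_field']]
)
Output: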
date new_field
3 2019-06-07 nnn
6 2019-06-08 nnn
7 2019-06-08 nnn
https://stackoverflow.com/questions/61616617
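The cross merge above keeps one row per (a, b) pair and the lambda aggregates over the whole filtered frame, which is why 2019-06-08 appears twice and every row gets the same value. A minimal sketch of a grouped variant that mirrors the SQL GROUP BY a.date, assuming pandas 1.2 or later (for how="cross"), hypothetical integer values in place of n, and "sum" standing in for FUN. The helper name self_join_agg is hypothetical, and a boolean mask is used instead of query() for the filter:

```python
import pandas as pd

# Hypothetical sample data: the question only shows "n", so integers are used here.
df = pd.DataFrame({
    "date": ["2019-06-06", "2019-06-07", "2019-06-08"],
    "numeric_field": [1, 2, 3],
})


def self_join_agg(df, fun="sum", lo="2019-06-06", hi="2019-06-08"):
    """Hypothetical helper mirroring the SQL self-join with GROUP BY a.date."""
    # Cartesian product of df with itself; right-hand (b) columns get the "_r" suffix.
    pairs = df.merge(df, how="cross", suffixes=("", "_r"))
    # ON b.date < a.date plus the WHERE bounds. ISO date strings compare
    # correctly as plain text; lo <= date_r < date also implies date > lo.
    mask = (
        (pairs["date_r"] >= lo)
        & (pairs["date_r"] < pairs["date"])
        & (pairs["date"] <= hi)
    )
    # GROUP BY a.date and apply FUN to b.numeric_field.
    return (
        pairs[mask]
        .groupby("date")
        .agg(new_field=("numeric_field_r", fun))
        .reset_index()
    )


print(self_join_agg(df))
```

Any aggregation accepted by GroupBy.agg (for example "max", or a callable) can be passed as fun. With the sample values above, the result has one row per a.date: 1 for 2019-06-07 and 3 for 2019-06-08.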