I have this weekly DataFrame, df1:
Startdate Enddate
0 2012-06-11 2012-06-17 23:59:59
1 2012-06-18 2012-06-24 23:59:59
2 2012-06-25 2012-07-01 23:59:59
3 2012-07-02 2012-07-08 23:59:59
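4 2012-07-09 2012-07-15 23:59:59
I want to find out which dates in my other DataFrame, df2, fall between the start and end dates in df1. After that, I want to add a new column to df2 that holds the index of the matching row in df1.
My df2 looks like this: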
FAID VALID_TO VALID_FROM
0 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45
1 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45
2 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45
3 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45
4 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45
To sum up my problem briefly: if df2['VALID_FROM'] is later than df1['Startdate'] but earlier than df1['Enddate'], write the df1 index (index = week number) into a new column.
df2.loc[:,'NewColumn'] = df2.loc[:,'NewColumn'].mask(((df1.loc[:,'Startdate'] < df2.loc[:,'VALID_FROM']) & (df2.loc[:,'VALID_FROM'] < df1.loc[:,'Enddate'])),df1.loc[:,'index'])
But then I get this error:
ValueError: Can only compare identically-labeled Series objects
Is there a way to vectorize this operation?
My expected output is:
FAID VALID_TO VALID_FROM Ind
0 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45 2
1 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45 7
2 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45 3
3 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45 14
4 41279041 2100-06-10 02:00:00 2010-03-05 10:02:45 25
Solved:
# Shift each VALID_FROM date back by 7 days
df2['Date'] = pd.to_datetime(df2['VALID_FROM']) - pd.to_timedelta(7, unit='d')
# Count rows per weekly bin; the column subset must be a list (tuple-style indexing was removed in pandas 2.0)
df3 = df2.groupby([pd.Grouper(key="Date", freq="W-MON")])[["Date","FAID"]].count()
Answered on 2020-08-27 14:04:32
The code above solved my problem.
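For the goal stated in the question (tagging each df2 row with the index of the df1 week that contains VALID_FROM), one possible vectorized approach is an interval lookup with pd.IntervalIndex. This is only a sketch, not the asker's method: the VALID_FROM values below are made up for illustration (the sample rows in the question all fall outside the listed weeks), and it assumes the weeks in df1 do not overlap.

```python
import pandas as pd

# Weekly ranges as in the question (first three rows only).
df1 = pd.DataFrame({
    "Startdate": pd.to_datetime(["2012-06-11", "2012-06-18", "2012-06-25"]),
    "Enddate": pd.to_datetime(
        ["2012-06-17 23:59:59", "2012-06-24 23:59:59", "2012-07-01 23:59:59"]
    ),
})

# Hypothetical VALID_FROM values, chosen so that some rows match a week.
df2 = pd.DataFrame({
    "FAID": [41279041, 41279041, 41279041, 41279041],
    "VALID_FROM": pd.to_datetime(
        ["2012-06-12 10:02:45", "2012-06-30 08:00:00",
         "2010-03-05 10:02:45", "2012-06-18 00:00:00"]
    ),
})

# One closed interval [Startdate, Enddate] per week.
weeks = pd.IntervalIndex.from_arrays(df1["Startdate"], df1["Enddate"], closed="both")

# Position of the containing week for every row of df2; -1 means "no week matches".
pos = weeks.get_indexer(df2["VALID_FROM"])

# Translate positions into df1 index labels and blank out the non-matches.
labels = pd.Series(df1.index.to_numpy()[pos], index=df2.index).astype("Int64")
df2["Ind"] = labels.mask(pos == -1)

print(df2)
```

Rows whose VALID_FROM lies outside every week end up with a missing value in Ind. Because each Enddate stops at 23:59:59, a timestamp in the final fraction of a second before midnight would also stay unmatched; building the intervals from consecutive start dates with closed="left" would be one way around that.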
https://stackoverflow.com/questions/63616804