首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >熊猫:使用groupby w.r.t日期添加/追加行

熊猫:使用groupby w.r.t日期添加/追加行
EN

Stack Overflow用户
提问于 2019-11-25 07:35:29
回答 1查看 61关注 0票数 2

我有一个下面的DataFrame,它看起来像这样

代码语言:javascript
复制
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
|  Id  | B_Id |             Start             |              End              | Day | Week_day |        f        |           LoginTime           |          LogoutTime           |
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| 4960 | 2213 | 2019-11-06 16:21:25.722382200 | 2019-11-06 16:50:32.219562100 |   6 |        2 | 00:44:07.348063 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4967 | NaN  | 2019-11-06 17:05:45.411147300 | 2019-11-06 18:09:32.823322100 |   6 |        2 | 00:15:13.191585 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4973 | 2234 | 2019-11-06 18:58:11.333021900 | 2019-11-06 18:58:44.839433600 |   6 |        2 | 00:48:38.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 5131 | NaN  | 2019-11-11 17:50:06.068008900 | 2019-11-11 18:22:28.992619600 |  11 |        0 | 00:03:45.366140 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5135 | NaN  | 2019-11-11 18:55:06.819281700 | 2019-11-11 19:19:47.855423700 |  11 |        0 | 00:32:37.826662 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5138 | NaN  | 2019-11-11 19:45:12.931469700 | 2019-11-11 20:00:22.358470600 |  11 |        0 | 00:25:25.076046 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5139 | NaN  | 2019-11-11 20:13:29.918047600 | 2019-11-11 20:53:23.214166300 |  11 |        0 | 00:13:07.559577 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5142 | NaN  | 2019-11-11 21:41:48.524924200 | 2019-11-11 21:49:10.308323100 |  11 |        0 | 00:48:25.310757 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5144 | NaN  | 2019-11-11 22:00:51.106021500 | 2019-11-11 22:19:48.857907400 |  11 |        0 | 00:11:40.797698 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5145 | NaN  | 2019-11-11 22:30:14.077789100 | 2019-11-11 22:56:42.090168300 |  11 |        0 | 00:10:25.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5225 | NaN  | 2019-11-14 14:14:08.879589200 | 2019-11-14 14:52:40.396641500 |  14 |        3 | 01:06:08.250563 | 2019-11-14 13:08:00.629026000 | 2019-11-14 23:43:31.860982200 |
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+

我希望将包含logOut时间和最后结束日期(例如:2019-11-06 19:11:52.483390400 - 2019-11-06 18:58:44.839433600 )的差异的行附加到每天的末尾

因此,最终输出如下所示:

代码语言:javascript
复制
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
|  Id  | B_Id      |             Start             |              End              | Day | Week_day |        f        |           LoginTime           |          LogoutTime           |
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| 4960 | 2213      | 2019-11-06 16:21:25.722382200 | 2019-11-06 16:50:32.219562100 |   6 |        2 | 00:44:07.348063 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4967 | NaN       | 2019-11-06 17:05:45.411147300 | 2019-11-06 18:09:32.823322100 |   6 |        2 | 00:15:13.191585 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4973 | 2234      | 2019-11-06 18:58:11.333021900 | 2019-11-06 18:58:44.839433600 |   6 |        2 | 00:48:38.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| nan  | NaN       | nat                           | nat                           |   6 |        2 | 00:13:08.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 5131 | NaN       | 2019-11-11 17:50:06.068008900 | 2019-11-11 18:22:28.992619600 |  11 |        0 | 00:03:45.366140 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5135 | NaN       | 2019-11-11 18:55:06.819281700 | 2019-11-11 19:19:47.855423700 |  11 |        0 | 00:32:37.826662 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5138 | NaN       | 2019-11-11 19:45:12.931469700 | 2019-11-11 20:00:22.358470600 |  11 |        0 | 00:25:25.076046 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5139 | NaN       | 2019-11-11 20:13:29.918047600 | 2019-11-11 20:53:23.214166300 |  11 |        0 | 00:13:07.559577 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5142 | NaN       | 2019-11-11 21:41:48.524924200 | 2019-11-11 21:49:10.308323100 |  11 |        0 | 00:48:25.310757 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5144 | NaN       | 2019-11-11 22:00:51.106021500 | 2019-11-11 22:19:48.857907400 |  11 |        0 | 00:11:40.797698 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5145 | NaN       | 2019-11-11 22:30:14.077789100 | 2019-11-11 22:56:42.090168300 |  11 |        0 | 00:10:25.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| nan  | NaN       | nat                           | nat                           |  11 |        0 | 00:13:13.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+

我只对f列rest感兴趣,可以有NaN和nulls等,也可以重复使用。如有任何指导,将不胜感激。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2019-11-25 07:53:23

使用GroupBy.last按某些列的日期,例如LoginTime,然后将列f按减法设置,将另一列设置为缺失值,将DataFrame.append设置为原始列,并按一些列(如LoginTimeStart )进行排序。

代码语言:javascript
复制
#if necessary convert to timedeltas
df['f'] = pd.to_timedelta(df['f'])
print (df.dtypes)
Id                      int64
B_Id                  float64
Start          datetime64[ns]
End            datetime64[ns]
Day                     int64
Week_day                int64
f             timedelta64[ns]
LoginTime      datetime64[ns]
LogoutTime     datetime64[ns]
dtype: object


df1 = df.groupby(df['LoginTime'].dt.date).last()
df1 = (df1.assign(f = df1['LogoutTime'].sub(df1['End']),
                  Id = np.nan,
                  B_Id = np.nan, 
                  Start = np.nan,
                  End = np.nan)
          .append(df)
          .sort_values(['LoginTime', 'Start'])
          .reset_index(drop=True))

代码语言:javascript
复制
print (df1)
        Id    B_Id                         Start  \
0   4960.0  2213.0 2019-11-06 16:21:25.722382200   
1   4967.0     NaN 2019-11-06 17:05:45.411147300   
2   4973.0  2234.0 2019-11-06 18:58:11.333021900   
3      NaN     NaN                           NaT   
4   5131.0     NaN 2019-11-11 17:50:06.068008900   
5   5135.0     NaN 2019-11-11 18:55:06.819281700   
6   5138.0     NaN 2019-11-11 19:45:12.931469700   
7   5139.0     NaN 2019-11-11 20:13:29.918047600   
8   5142.0     NaN 2019-11-11 21:41:48.524924200   
9   5144.0     NaN 2019-11-11 22:00:51.106021500   
10  5145.0     NaN 2019-11-11 22:30:14.077789100   
11     NaN     NaN                           NaT   
12  5225.0     NaN 2019-11-14 14:14:08.879589200   
13     NaN     NaN                           NaT   

                             End  Day  Week_day               f  \
0  2019-11-06 16:50:32.219562100    6         2 00:44:07.348063   
1  2019-11-06 18:09:32.823322100    6         2 00:15:13.191585   
2  2019-11-06 18:58:44.839433600    6         2 00:48:38.509699   
3                            NaT    6         2 00:13:07.643956   
4  2019-11-11 18:22:28.992619600   11         0 00:03:45.366140   
5  2019-11-11 19:19:47.855423700   11         0 00:32:37.826662   
6  2019-11-11 20:00:22.358470600   11         0 00:25:25.076046   
7  2019-11-11 20:53:23.214166300   11         0 00:13:07.559577   
8  2019-11-11 21:49:10.308323100   11         0 00:48:25.310757   
9  2019-11-11 22:19:48.857907400   11         0 00:11:40.797698   
10 2019-11-11 22:56:42.090168300   11         0 00:10:25.219881   
11                           NaT   11         0 00:13:13.742467   
12 2019-11-14 14:52:40.396641500   14         3 01:06:08.250563   
13                           NaT   14         3 08:50:51.464340   

                       LoginTime                    LogoutTime  
0  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
1  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
2  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
3  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
4  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
5  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
6  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
7  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
8  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
9  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
10 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
11 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
12 2019-11-14 13:08:00.629026000 2019-11-14 23:43:31.860982200  
13 2019-11-14 13:08:00.629026000 2019-11-14 23:43:31.860982200  
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/59027097

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档