Using the Citi Bike data: https://s3.amazonaws.com/tripdata/index.html
tripduration starttime stoptime start_station_id start_station_name start_station_latitude start_station_longitude end_station_id end_station_name end_station_latitude end_station_longitude bikeid usertype birth_year gender
461 2016-02-01 00:00:08 2016-02-01 00:07:49 480 W 53 St & 10 Ave 40.766697 -73.990617 524 W 43 St & 6 Ave 40.755273 -73.983169 23292 Subscriber 1966.0 1
297 2016-02-01 00:00:56 2016-02-01 00:05:53 463 9 Ave & W 16 St 40.742065 -74.004432 380 W 4 St & 7 Ave S 40.734011 -74.002939 15329 Subscriber 1977.0 1
280 2016-02-01 00:01:00 2016-02-01 00:05:40 3134 3 Ave & E 62 St 40.763126 -73.965269 3141 1 Ave & E 68 St 40.765005 -73.958185 22927 Subscriber 1987.0 1
Using groupby to group by hour, I would like hours with no data to be included as zeros.
I used the following code:
bikes_parked = df.groupby(['end_station_name',pd.Grouper(key='stoptime',freq='H')]).size().reset_index()
bikes_parked.rename(columns={0: 'bikes_parked'}, inplace=True)
It returns the number of bikes parked per hour, but hours with no data are skipped.
Output:
  end_station_name             stoptime  bikes_parked
0  1 Ave & E 15 St  2016-02-01 00:00:00             1
1  1 Ave & E 15 St  2016-02-01 05:00:00             1
2  1 Ave & E 15 St  2016-02-01 06:00:00             3
I want to include the stop times 01, 02, 03 and 04 as well, with bikes_parked equal to 0.
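Posted on 2016-05-10 15:26:49
As mentioned in the comments, the solution is as follows:
1) Create a DataFrame covering the full range of hours, with bikes_parked set to 0 everywhere.
2) Update this DataFrame with the relevant data from the grouped table, using: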
df.loc[bikes_parked.index, 'bikes_parked'] = bikes_parked.bikes_parked
https://stackoverflow.com/questions/36987317
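A minimal sketch of the two steps above, assuming the update is meant to run against a zero-filled frame that shares a (station, hour) index with the grouped counts. The toy `trips` frame, the names `counts`, `hours` and `full_index`, and the `'60min'` frequency spelling are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Toy stand-in for the trip data: only the two columns the grouping needs.
trips = pd.DataFrame({
    "end_station_name": ["1 Ave & E 15 St"] * 5,
    "stoptime": pd.to_datetime([
        "2016-02-01 00:07:49",
        "2016-02-01 05:10:00",
        "2016-02-01 06:01:00",
        "2016-02-01 06:20:00",
        "2016-02-01 06:59:59",
    ]),
})

# Hourly frequency, written as '60min' to sidestep the 'H' alias that
# newer pandas releases deprecate.
HOUR = "60min"

# Same grouping as in the question, but the (station, hour) MultiIndex is
# kept instead of calling reset_index(), so it can be aligned later.
counts = trips.groupby(
    ["end_station_name", pd.Grouper(key="stoptime", freq=HOUR)]
).size()

# Step 1: a frame covering every station and every hour, all set to 0.
hours = pd.date_range(
    trips["stoptime"].min().floor(HOUR),
    trips["stoptime"].max().floor(HOUR),
    freq=HOUR,
)
full_index = pd.MultiIndex.from_product(
    [trips["end_station_name"].unique(), hours],
    names=["end_station_name", "stoptime"],
)
bikes_parked = pd.DataFrame({"bikes_parked": 0}, index=full_index)

# Step 2: overwrite the zeros wherever the grouped table has a count.
bikes_parked.loc[counts.index, "bikes_parked"] = counts

# Back to flat columns, matching the layout of the question's output.
result = bikes_parked.reset_index()
print(result)
```

With this toy input, the hours 01:00 through 04:00 appear with bikes_parked equal to 0. If the intermediate zero-filled frame is not needed, `counts.reindex(full_index, fill_value=0)` should give the same counts in a single step.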