我有下面提到的数据在R.
DF
ID Datetime Value
T-1 2020-01-01 15:12:14 10
T-2 2020-01-01 00:12:10 20
T-3 2020-01-01 03:11:11 25
T-4 2020-01-01 14:01:01 20
T-5 2020-01-01 18:07:11 10
T-6 2020-01-01 20:10:09 15
T-7 2020-01-01 15:45:23 15利用上述数据,在考虑Datetime的情况下,对计数基、月份和时间桶进行分叉处理.
所需产出:
Month Count Sum
Jan-20 7 115
12:00 AM to 05:00 AM 2 45
06:00 AM to 12:00 PM 0 0
12:00 PM to 03:00 PM 1 20
03:00 PM to 08:00 PM 3 35
08:00 PM to 12:00 AM 1 15发布于 2020-02-04 13:48:47
在使用dplyr进行总结之前,您可以使用来自hour包的lubridate和基R中的cut来存储一天中的时间。
在这里,我假设您的Datetime列实际上是一种日期时间格式,而不仅仅是一个字符串或因素。如果是的话,请确保您已经完成了DF$Datetime <- as.POSIXct(as.character(DF$Datetime))来转换它。
library(tidyverse)
DF$bins <- cut(lubridate::hour(DF$Datetime), c(-1, 5.99, 11.99, 14.99, 19.99, 24))
levels(DF$bins) <- c("00:00 to 05:59", "06:00 to 11:59", "12:00 to 14:59",
"15:00 to 19:59", "20:00 to 23:59")
newDF <- DF %>%
group_by(bins, .drop = FALSE) %>%
summarise(Count = length(Value), Total = sum(Value))这将产生以下结果:
newDF
#> # A tibble: 5 x 3
#> bins Count Total
#> <fct> <int> <dbl>
#> 1 00:00 to 05:59 2 45
#> 2 06:00 to 11:59 0 0
#> 3 12:00 to 14:59 1 20
#> 4 15:00 to 19:59 3 35
#> 5 20:00 to 23:59 1 15如果您想将一月作为第一行添加(但我不确定这在这个上下文中有多大意义),您可以这样做:
newDF %>%
summarise(bins = "January", Count = sum(Count), Total = sum(Total)) %>% bind_rows(newDF)
#> # A tibble: 6 x 3
#> bins Count Total
#> <chr> <int> <dbl>
#> 1 January 7 115
#> 2 00:00 to 05:59 2 45
#> 3 06:00 to 11:59 0 0
#> 4 12:00 to 14:59 1 20
#> 5 15:00 to 19:59 3 35
#> 6 20:00 to 23:59 1 15顺便说一句,我所使用的数据的可复制版本是:
structure(list(ID = structure(1:7, .Label = c("T-1", "T-2", "T-3",
"T-4", "T-5", "T-6", "T-7"), class = "factor"), Datetime = structure(c(1577891534,
1577837530, 1577848271, 1577887261, 1577902031, 1577909409, 1577893523
), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10,
20, 25, 20, 10, 15, 15)), class = "data.frame", row.names = c(NA,
-7L))https://stackoverflow.com/questions/60056273
复制相似问题