I have a dataset containing a customer number and the dates on which each customer visited.

soTable <- data.frame(customer = c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5),
                      visit_date = c("12/4/2016","12/5/2016","12/6/2016","12/8/2016","12/22/2016",
                                     "12/6/2016","12/9/2016","12/15/2016",
                                     "12/4/2016","12/12/2016",
                                     "12/4/2016","12/22/2016","12/23/2016","12/28/2016","12/5/2016"))

First, I need to number each customer's visits. I can do this with a loop, but I would like to know whether there is a faster dplyr/data.table approach. The result should look like this:

   customer visit_date visitNumber
1         1  12/4/2016           1
2         1  12/5/2016           2
3         1  12/6/2016           3
4         1  12/8/2016           4
5         1 12/22/2016           5
6         2  12/6/2016           1
7         2  12/9/2016           2
8         2 12/15/2016           3
9         3  12/4/2016           1
10        3 12/12/2016           2
11        4  12/4/2016           1
12        4 12/22/2016           2
13        4 12/23/2016           3
14        4 12/28/2016           4
15        5  12/5/2016           1

Then I need to find the average time between visits for each visit number, in a form like this:

  visitNumber averageTimeBetween
1           1                  2
2           2                  4
3           3                  5
4           4                  7
5           5                  8

Answered on 2017-03-20 21:32:03

Here is how to calculate the time between visits. First, make sure the visit dates are stored as proper Date values.

soTable <- transform(soTable, visit_date = as.Date(visit_date, format = "%m/%d/%Y"))

Then you can use dplyr:

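library(dplyr)

# Number each customer's visits in date order, take the gap since the
# previous visit, then average those gaps by visit number.
soTable %>%
  group_by(customer) %>%
  arrange(customer, visit_date) %>%
  mutate(visit_number = seq_along(visit_date),
         time_since = visit_date - lag(visit_date)) %>%
  group_by(visit_number) %>%
  summarize(mean = mean(time_since))

For the sample data, this returns the following (visit 1 is NA because a first visit has no earlier visit to compare against):
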
# A tibble: 5 × 2
  visit_number           mean
         <int>         <time>
1            1        NA days
2            2  7.500000 days
3            3  2.666667 days
4            4  3.500000 days
5            5 14.000000 days

https://stackoverflow.com/questions/42914135
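
The question also asks about data.table, which the answer above does not cover. The following is a minimal sketch of the same two steps in data.table, not part of the original answer. It assumes the sample data from the question; the names `dt`, `time_since` and `avg` are illustrative choices, and first visits are dropped before averaging instead of being reported as NA.

```r
library(data.table)

# Sample data from the question
soTable <- data.frame(
  customer = c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5),
  visit_date = c("12/4/2016","12/5/2016","12/6/2016","12/8/2016","12/22/2016",
                 "12/6/2016","12/9/2016","12/15/2016",
                 "12/4/2016","12/12/2016",
                 "12/4/2016","12/22/2016","12/23/2016","12/28/2016","12/5/2016")
)

dt <- as.data.table(soTable)

# Parse the dates and sort so visits are in order within each customer
dt[, visit_date := as.Date(visit_date, format = "%m/%d/%Y")]
setorder(dt, customer, visit_date)

# Step 1: number the visits within each customer
dt[, visitNumber := seq_len(.N), by = customer]

# Step 2: days since the previous visit (NA for each customer's first visit)
dt[, time_since := as.numeric(difftime(visit_date, shift(visit_date), units = "days")),
   by = customer]

# Average gap per visit number, skipping first visits (assumption: NA rows are dropped)
avg <- dt[!is.na(time_since),
          .(averageTimeBetween = mean(time_since)),
          keyby = visitNumber]
print(avg)
```

For visit numbers 2 to 5 this should give the same averages as the dplyr pipeline (7.5, about 2.67, 3.5 and 14 days); visit 1 simply does not appear in `avg`.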