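I am trying to arrange my data set and create a new column that gives the sequential time between events, based on two separate columns.
I have the following code, which should help me do this, but I am having difficulty troubleshooting it. Has anyone run into this before, or can anyone identify the issue with my code?
The sample data set and the code I am trying to use can be found below: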
UNITNUMBER <- c(1,1,1,1,2,2,3,3,3,4,4,4,4,4)
ORDERID <- c(5555,5558,5565,5278,5283,3287,3004,4678,2345,2189,1784,5743,4623,4541)
BREAKDOWN <- c(0,1,0,1,1,1,1,0,0,0,0,1,1,0)
RO_OPENED <- as.Date(c('2016-11-18','2016-11-28','2016-9-15','2017-4-2','2016-12-22','2017-3-8','2016-4-25','2016-2-3','2017-6-7','2016-7-5','2016-4-9','2017-10-27','2017-4-20','2017-5-10'))
test = data.frame(UNITNUMBER,ORDERID,BREAKDOWN,RO_OPENED)
library(dplyr)       # provides %>%
library(data.table)
test <- test %>% data.table(key = c("UNITNUMBER", "RO_OPENED"))
test <- test[, c("UNITNUMBER", "RO_OPENED",
                 "TDIFF", "UNIQUEGROUP") :=
               list(UNITNUMBER, RO_OPENED,
                    seq(.N), .GRP),
             by = list(ORDERID)][, numSeq := seq(min(RO_OPENED), max(RO_OPENED)),
                                 by = list(UNIQUEGROUP)][, runningTotal := ifelse(RO_OPENED == numSeq,
                                                                                  seq(.N), 1L),
                                                         by = list(UNITNUMBER, UNIQUEGROUP)]
The error I receive is:
Error in seq.Date(min(RO_OPENED), max(RO_OPENED)) :
  exactly two of 'to', 'by' and 'length.out' / 'along.with' must be specified
I would like the result to be two new columns: a UNIQUEGROUP identifier, and the time difference between breakdowns for each UNITNUMBER and ORDERID, as shown below:
UNIT  OrderID  BD  Date        TDIFF
1     5565     0   9/15/2016   NA
1     5555     0   11/18/2016  NA
1     5558     1   11/28/2016  0
1     5278     1   4/2/2017    125
2     5283     1   12/22/2016  0
2     3287     1   3/8/2017    76
3     4678     0   2/3/2016    NA
3     3004     1   4/25/2016   0
3     2345     0   6/7/2017    NA
4     1784     0   4/9/2016    NA
4     2189     0   7/5/2016    NA
4     4623     1   4/20/2017   0
4     4541     0   5/10/2017   NA
4     5743     1   10/27/2017  190
Answered 2019-10-28 05:08:03
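On the error itself: in the R version that produced this message, seq() on Date values has no default step, so a call with only a start and an end date stops with that error. A minimal sketch of the difference, using two dates from the sample data (the names from, to, days and gap are only for illustration):

```r
# Two RO_OPENED dates taken from unit 1 of the sample data.
from <- as.Date("2016-11-18")
to   <- as.Date("2016-11-28")

# Supplying `by` makes the step explicit, so seq() can build a daily sequence.
days <- seq(from, to, by = "day")
length(days)  # 11 dates, both endpoints included

# If only the gap between two events is needed, subtracting the dates is enough.
gap <- as.numeric(to - from)  # 10 days
```

A full date sequence is usually longer than the group it would be assigned to, so the row-to-row difference is probably the better fit for the TDIFF column described in the question.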
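This should do the job:
library(dplyr)
test %>%
  arrange(UNITNUMBER, RO_OPENED) %>%
  group_by(UNITNUMBER, BREAKDOWN) %>%
  # Days since the previous row in the same unit/breakdown group.
  # as.numeric() keeps the types consistent for coalesce(); the first row of each group becomes 0.
  mutate(TDIFF = coalesce(as.numeric(RO_OPENED - lag(RO_OPENED)), 0),
         # Only breakdown rows keep a value.
         TDIFF = ifelse(BREAKDOWN == 0, NA, TDIFF))
Answered 2019-10-28 09:48:29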
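Here is a data.table approach:
library(data.table)
setDT(test)                              # convert to a data.table by reference
setorder(test, UNITNUMBER, RO_OPENED)    # sort by unit, then by date
# For breakdown rows only: days since the previous breakdown in the same unit (0 for the first one).
test[BREAKDOWN == 1,
     TDIFF := c(0, diff(RO_OPENED)),
     by = UNITNUMBER]
test
https://stackoverflow.com/questions/58582784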
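As a quick check, the data.table approach can be run against the sample data and compared with the TDIFF column from the expected output in the question. This is only a sketch: it rebuilds the data from scratch, and the `expected` vector is transcribed by hand from the question's table in sorted row order.

```r
library(data.table)

# Sample data from the question.
test <- data.table(
  UNITNUMBER = c(1,1,1,1,2,2,3,3,3,4,4,4,4,4),
  ORDERID    = c(5555,5558,5565,5278,5283,3287,3004,4678,2345,2189,1784,5743,4623,4541),
  BREAKDOWN  = c(0,1,0,1,1,1,1,0,0,0,0,1,1,0),
  RO_OPENED  = as.Date(c('2016-11-18','2016-11-28','2016-9-15','2017-4-2',
                         '2016-12-22','2017-3-8','2016-4-25','2016-2-3',
                         '2017-6-7','2016-7-5','2016-4-9','2017-10-27',
                         '2017-4-20','2017-5-10'))
)

# Sort by unit and date, then take the difference between consecutive breakdown rows.
setorder(test, UNITNUMBER, RO_OPENED)
test[BREAKDOWN == 1, TDIFF := c(0, diff(RO_OPENED)), by = UNITNUMBER]

# TDIFF values copied from the expected output in the question (assumed row order: sorted).
expected <- c(NA, NA, 0, 125, 0, 76, NA, 0, NA, NA, NA, 0, NA, 190)
stopifnot(isTRUE(all.equal(as.numeric(test$TDIFF), expected)))
```

If stopifnot() returns silently, the computed column matches the expected one row for row. The UNIQUEGROUP identifier from the question is not covered by this check.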