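Q: Identifying subsequent events for duplicate IDs based on date and initial event

Stack Overflow user

Asked 2019-04-02 19:57:08

Answers: 1 · Views: 80 · Followers: 0 · Votes: 0

I am trying to identify duplicate IDs based on the date and the initial event. Here is a sample dataset: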

+----+------------+-------------------------+
| ID |    Date    | Investigation or Intake |
+----+------------+-------------------------+
|  1 | 1/1/2019   | Investigation           |
|  2 | 1/2/2019   | Investigation           |
|  3 | 1/3/2019   | Investigation           |
|  4 | 1/4/2019   | Investigation           |
|  1 | 1/2/2019   | Intake                  |
|  2 | 12/31/2018 | Intake                  |
|  3 | 1/5/2019   | Intake                  |
+----+------------+-------------------------+

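I want to write R code that checks IDs 1 through 4 (the IDs that have an investigation) and determines whether each has a subsequent intake, that is, an intake dated later than the investigation. The expected output is: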

+----+------------+-------------------------+------------+
| ID |    Date    | Investigation or Intake | New Column |
+----+------------+-------------------------+------------+
|  1 | 1/1/2019   | Investigation           | Sub Intake |
|  2 | 1/2/2019   | Investigation           | None       |
|  3 | 1/3/2019   | Investigation           | Sub Intake |
|  4 | 1/4/2019   | Investigation           | None       |
|  1 | 1/2/2019   | Intake                  |            |
|  2 | 12/31/2018 | Intake                  |            |
|  3 | 1/5/2019   | Intake                  |            |
+----+------------+-------------------------+------------+

What would code to solve this look like? I'm guessing it would be some kind of loop?

Thanks!

duplicates

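1 Answer

Stack Overflow user

Accepted answer

Posted 2019-04-03 02:48:21

You can use the dplyr package with a couple of ifelse statements to create the new column. Rather than writing a loop, use the lead function to check the next entry within each group. This solution assumes that, within each group, there is one "Investigation" entry followed by zero or more "Intake" entries listed after it.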

library(dplyr)

df <- data.frame(ID = c(1, 2, 3, 4, 1, 2, 3),
                 Date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-02", "2018-12-31", "2019-01-05")),
                 Investigation_or_Intake = c("Investigation", "Investigation", "Investigation", "Investigation", "Intake", "Intake", "Intake"),
                 stringsAsFactors = FALSE)

df %>%
  group_by(ID) %>% # Make groups according to ID column
  mutate(newcol = ifelse(lead(Date) > Date, "Sub Intake", "None"), # Compare with the next entry in the group; NA when there is no next entry
         newcol = ifelse(Investigation_or_Intake == "Investigation" & is.na(newcol), "None", newcol)) # Change "Investigation" entries with no Intake to "None"

This gives us

     ID Date       Investigation_or_Intake newcol
  <dbl> <date>     <chr>                   <chr>     
1     1 2019-01-01 Investigation           Sub Intake
2     2 2019-01-02 Investigation           None      
3     3 2019-01-03 Investigation           Sub Intake
4     4 2019-01-04 Investigation           None      
5     1 2019-01-02 Intake                  NA        
6     2 2018-12-31 Intake                  NA        
7     3 2019-01-05 Intake                  NA
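
The ordering assumption above matters because lead only looks at the row immediately after the current one within the group. As a hedged alternative, here is a minimal base R sketch (not the answer's method) that compares each investigation against every intake for the same ID, so it does not depend on row order or on there being a single intake per ID. The names `intakes`, `inv_rows`, and `has_later` are illustrative and not from the original post.

```r
# Same sample data as in the answer
df <- data.frame(
  ID = c(1, 2, 3, 4, 1, 2, 3),
  Date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                   "2019-01-02", "2018-12-31", "2019-01-05")),
  Investigation_or_Intake = c(rep("Investigation", 4), rep("Intake", 3)),
  stringsAsFactors = FALSE
)

# All intake rows, used as a lookup table (illustrative name)
intakes <- df[df$Investigation_or_Intake == "Intake", ]

# Row positions of the investigations
inv_rows <- which(df$Investigation_or_Intake == "Investigation")

# For each investigation: is there any intake with the same ID and a later date?
has_later <- vapply(inv_rows, function(i) {
  any(intakes$ID == df$ID[i] & intakes$Date > df$Date[i])
}, logical(1))

# Intake rows stay NA; investigations get "Sub Intake" or "None"
df$newcol <- NA_character_
df$newcol[inv_rows] <- ifelse(has_later, "Sub Intake", "None")

print(df)
```

On the sample data this should label the investigations the same way as the dplyr version. The trade-off is that it scans the intake table once per investigation, which is fine for small data but may be slow on large tables, where a join-based approach would likely scale better.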

Votes: 0

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/55482609
