
Question: How do I keep only one row per ID, based on the smallest date difference, in R?

Stack Overflow user

Asked 2020-03-16 19:11:30

Answers: 3, Views: 88, Followers: 0, Votes: 0

I have a data frame with some IDs and a variable number of rows per ID, like this:

ID    Date_start    Date
1     2016-11-02    2020-2-22
1     2016-11-02    2015-1-18
2     2019-12-22    2017-3-2
2     2019-12-22    2019-2-9
2     2019-12-22    2017-12-1

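For each ID, I would like to keep only one row: the one whose Date is closest to Date_start. Then I would like to set to NA every row where abs(Date - Date_start) is under 100 days.

Is there a simple way to do this?

Many thanks in advance.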

formatting

selection

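3 Answers

Stack Overflow user

Posted 2020-03-16 19:16:27

One way using dplyr is to group_by ID and keep, in each group, the row with the smallest absolute difference between Date_start and Date.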

library(dplyr)

df %>%
  mutate_at(-1, lubridate::ymd) %>%
  group_by(ID) %>%
  slice(which.min(abs(Date_start - Date)))

#    ID Date_start Date      
#  <int> <date>     <date>    
#1     1 2016-11-02 2015-01-18
#2     2 2019-12-22 2019-02-09
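
With dplyr 1.0.0 or later, the same selection can probably be written with slice_min() instead of slice(which.min(...)). This is only a sketch: it assumes the two date columns are already of class Date, and the object name closest is made up for illustration.

```r
library(dplyr)

# Same rows as in the question, with the dates already parsed (assumption)
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 2L),
  Date_start = as.Date(c("2016-11-02", "2016-11-02",
                         "2019-12-22", "2019-12-22", "2019-12-22")),
  Date = as.Date(c("2020-02-22", "2015-01-18",
                   "2017-03-02", "2019-02-09", "2017-12-01"))
)

closest <- df %>%
  group_by(ID) %>%
  # order by the absolute gap in days; with_ties = FALSE keeps exactly one row per ID
  slice_min(abs(as.numeric(Date - Date_start)), n = 1, with_ties = FALSE) %>%
  ungroup()

closest
```

with_ties = FALSE matters if two dates are equally close to Date_start; without it, slice_min() would return both rows for that ID.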

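If you also want NA wherever the difference is under 100 days, store the difference in a diff column and replace the small values: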

df %>%
  mutate_at(-1, lubridate::ymd) %>%
  group_by(ID) %>%
  mutate(diff = as.numeric(abs(Date_start - Date))) %>%
  slice(which.min(abs(Date_start - Date))) %>%
  mutate(diff = replace(diff, diff < 100, NA))
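
The pipeline above blanks the helper column diff. If the goal is to blank Date itself, as the question seems to ask, something like the following might work. It is a sketch under assumptions: the dates are already of class Date, and ID 3 is a hypothetical extra row (48 days apart) added only so that the 100-day rule has something to act on.

```r
library(dplyr)

df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 2L, 3L),
  Date_start = as.Date(c("2016-11-02", "2016-11-02", "2019-12-22",
                         "2019-12-22", "2019-12-22", "2019-12-20")),
  Date = as.Date(c("2020-02-22", "2015-01-18", "2017-03-02",
                   "2019-02-09", "2017-12-01", "2019-11-02"))
)

result <- df %>%
  mutate(diff = abs(as.numeric(Date - Date_start))) %>%  # gap in days
  group_by(ID) %>%
  slice(which.min(diff)) %>%                             # closest row per ID
  ungroup() %>%
  mutate(Date = replace(Date, diff < 100, NA))           # blank Date when the gap is under 100 days

result
```

Ungrouping before the final mutate() is a design choice: the replacement is a plain row-wise rule and does not need the groups.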

data

df <- structure(list(ID = c(1L, 1L, 2L, 2L, 2L), Date_start = structure(c(1L, 
1L, 2L, 2L, 2L), .Label = c("2016-11-02", "2019-12-22"), class = "factor"), 
Date = structure(c(5L, 1L, 3L, 4L, 2L), .Label = c("2015-1-18", 
"2017-12-1", "2017-3-2", "2019-2-9", "2020-2-22"), class = "factor")),
class = "data.frame", row.names = c(NA, -5L))

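Votes: 0

Stack Overflow user

Posted 2020-03-16 19:19:02

Using dplyr, we can get the following. I added a new ID to show how all values get set to NA. Following Ronak's better idea of using replace instead of ifelse, the column classes are preserved.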

library(dplyr)

df %>%
  mutate(Date_diff = abs(difftime(Date, Date_start))) %>%
  group_by(ID) %>%
  filter(Date_diff == min(Date_diff)) %>%
  mutate_all(~replace(., Date_diff < 100, NA))
#> # A tibble: 3 x 4
#> # Groups:   ID [3]
#>      ID Date_start Date       Date_diff
#>   <dbl> <date>     <date>     <drtn>   
#> 1     1 2016-11-02 2015-01-18 654 days 
#> 2     2 2019-12-22 2019-02-09 316 days 
#> 3     3 NA         NA          NA days
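
The point about classes can be checked in isolation. The sketch below uses made-up values (d and small are illustrative names) to show that ifelse() drops the Date class while replace() keeps it.

```r
# Two illustrative dates and a flag marking which one to blank
d <- as.Date(c("2020-01-01", "2020-06-01"))
small <- c(TRUE, FALSE)

with_ifelse  <- ifelse(small, NA, d)    # attributes of `d` are lost
with_replace <- replace(d, small, NA)   # assignment into `d`, so the class survives

class(with_ifelse)   # "numeric"
class(with_replace)  # "Date"
```

ifelse() builds its result from the test vector, so the Date attributes of the no-branch are discarded and only the underlying day counts remain.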

Data

df <- structure(list(ID = c(1, 1, 2, 2, 2, 3), Date_start = structure(c(17107, 
17107, 18252, 18252, 18252, 18250), class = "Date"), Date = structure(c(18314, 
16453, 17227, 17936, 17501, 18202), class = "Date")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

Votes: 0

Stack Overflow user

Posted 2020-03-16 19:31:54

Base R solution:

# Convert the factor columns to Date (via character):

cleaned_df <- within(df, {
                    Date_start <- as.Date(as.character(Date_start), "%Y-%m-%d")
                    Date <- as.Date(as.character(Date), "%Y-%m-%d")
                    }
                  )
# For each ID, keep the Date closest to Date_start.
# abs() is needed: without it, which.min() picks the most negative difference.

data.frame(do.call("rbind", lapply(split(cleaned_df, cleaned_df$ID), 
       function(x){
         data.frame(ID = unique(x$ID),
                    Date = x$Date[which.min(abs(as.numeric(x$Date_start - x$Date)))])
        }
       )
      ),
 row.names = NULL
)
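
Another base R route, offered only as a sketch, is to sort by ID and absolute difference and then keep the first row of each ID with duplicated(). It assumes the date columns are already of class Date; diff, ordered and closest are illustrative names.

```r
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 2L),
  Date_start = as.Date(c("2016-11-02", "2016-11-02",
                         "2019-12-22", "2019-12-22", "2019-12-22")),
  Date = as.Date(c("2020-02-22", "2015-01-18",
                   "2017-03-02", "2019-02-09", "2017-12-01"))
)

# Absolute gap in days between Date and Date_start
df$diff <- abs(as.numeric(df$Date - df$Date_start))

# Sort so the closest date comes first within each ID
ordered <- df[order(df$ID, df$diff), ]

# duplicated() flags every row after the first per ID; keep the rest
closest <- ordered[!duplicated(ordered$ID), ]
rownames(closest) <- NULL

closest
```

Unlike the split/lapply version, this keeps every column of the selected rows, not just ID and Date.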

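Tidyverse solution:

library(tidyverse)
df %>% 
  # funs() is deprecated in current dplyr; a formula lambda does the same conversion
  mutate_if(str_detect(tolower(names(.)), "date"), ~ as.Date(as.character(.), "%Y-%m-%d")) %>%
  group_by(ID) %>% 
  # abs() so the closest date wins whether it falls before or after Date_start
  summarise(Date = Date[which.min(abs(as.numeric(Date - Date_start)))]) %>% 
  ungroup()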
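
The sign of the difference matters when picking the closest date: which.min() on a signed difference returns the most negative value, which is not necessarily the nearest date. A small sketch using the three dates of ID 2 (start, dates and signed are illustrative names):

```r
start <- as.Date("2019-12-22")
dates <- as.Date(c("2017-03-02", "2019-02-09", "2017-12-01"))

# Signed gaps in days: all negative because every date precedes `start`
signed <- as.numeric(dates - start)

which.min(signed)       # 1: the date furthest in the past
which.min(abs(signed))  # 2: the date actually closest to `start`
```

Here the signed version would select 2017-03-02, while the absolute version selects 2019-02-09.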

Data, thanks to @Ronak Shah:

df <-
  structure(
    list(
      ID = c(1L, 1L, 2L, 2L, 2L),
      Date_start = structure(
        c(1L,
          1L, 2L, 2L, 2L),
        .Label = c("2016-11-02", "2019-12-22"),
        class = "factor"
      ),
      Date = structure(
        c(5L, 1L, 3L, 4L, 2L),
        .Label = c("2015-1-18",
                   "2017-12-1", "2017-3-2", "2019-2-9", "2020-2-22"),
        class = "factor"
      )
    ),
    class = "data.frame",
    row.names = c(NA,-5L)
  )

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/60704882
