例如,假设有一个如下所示的数据集
编辑:为额外上下文添加日期和Num列
ID|Date |Col1|Col2|Col3|Num
1 10-10 Y 5
1 10-10 Y Y 5
1 10-10 Y 5
2 09-17 Y 6
2 09-17 Y 6
3 12-14 Y 7
3 12-14 Y 7
4 06-06 Y 8
4 06-06有什么方法可以得到这样的输出吗?
ID|Date |Col1|Col2|Col3|Num
1 10-10 Y Y Y 5
2 09-17 Y Y 6
3 12-14 Y Y 7
4 06-06 Y 8structure(list(ID = c("000001", "000001", "000001", "000001",
"000001", "000001", "000001", "000001", "000001", "000001", "000002",
"000002", "000002", "000003", "000003", "000003", "000003", "000003",
"000003", "000003"), Date = structure(c(1570492800, 1570492800,
1570492800, 1570492800, 1570492800, 1570492800, 1570492800, 1570492800,
1570492800, 1570492800, 1570665600, 1570665600, 1570665600, 1570838400,
1570838400, 1570838400, 1570838400, 1570838400, 1570838400, 1570838400
), tzone = "UTC", class = c("POSIXct", "POSIXt")), Col1 = c(NA,
NA, NA, NA, "Y", NA, NA, "Y", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Col2 = c("Y", "Y", "Y", "Y", "Y", "Y", "Y",
"Y", "Y", "Y", "Y", "Y", "Y", NA, NA, NA, NA, "Y", "Y", "Y"),
Col3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, "Y", NA, NA, "Y", "Y", "Y"), Num1 = c(1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 861, 861, 861, 497, 497, 497, 497, 497,
497, 497)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))发布于 2022-09-02 16:00:35
我们可以按'ID‘、'Date’和summarise (其余的列(everything()) )分组,方法是循环使用across并提取第一个非NA元素。
library(dplyr)
df2 %>%
group_by(ID, Date) %>%
summarise(across(everything(), ~.x[!is.na(.x)][1]),
.groups = 'drop')-output
# A tibble: 3 × 6
ID Date Col1 Col2 Col3 Num1
<chr> <dttm> <chr> <chr> <chr> <dbl>
1 000001 2019-10-08 00:00:00 Y Y <NA> 1
2 000002 2019-10-10 00:00:00 <NA> Y <NA> 861
3 000003 2019-10-12 00:00:00 <NA> Y Y 497发布于 2022-09-02 18:37:48
带aggregate的R基选项
ID Date Col1 Col2 Col3 Num1
1 000001 2019-10-08 Y Y 1
2 000002 2019-10-10 Y 861
3 000003 2019-10-12 Y Y 497https://stackoverflow.com/questions/73585244
复制相似问题