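The actual data file my code processes is much larger than this, and the code needs to handle different data formats. The example below illustrates the problem of groups with identical contents and how I want to keep only one of those groups.
Suppose I have groups with different contents: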
Group Contents
GroupA Marble
GroupB Marble
GroupB Granite
GroupC Marble
GroupD Granite
GroupD Glass
GroupD Marble

In the example above, GroupA and GroupC both contain only Marble, so I want to remove one of those two groups. My desired output:
Group Contents
GroupA Marble
GroupB Marble
GroupB Granite
GroupD Granite
GroupD Glass
GroupD Marble

Reproducible data:
df <- structure(list(Group = c("GroupA", "GroupB", "GroupB", "GroupC",
"GroupD", "GroupD", "GroupD"), Contents = c("Marble", "Marble",
"Granite", "Marble", "Granite", "Glass", "Marble")), class = "data.frame", row.names = c(NA,
-7L), spec = structure(list(cols = list(Group = structure(list(), class = c("collector_character",
"collector")), Contents = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))

Posted 2020-10-16 12:44:49
You can try:
idx <- !duplicated(with(df, cbind(Contents, ave(Contents, Group, FUN = function(x) toString(sort(x))))))
df[idx, ]
Group Contents
1 GroupA Marble
2 GroupB Marble
3 GroupB Granite
5 GroupD Granite
6 GroupD Glass
7 GroupD Marble

Posted 2020-10-16 12:27:50
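A small sketch of what the ave()/duplicated() one-liner computes. For brevity it rebuilds df with data.frame() instead of the structure() call from the question, and the name key is hypothetical. It prints the per-row group signature that ave() produces, then applies the same duplicated() filter:

```r
df <- data.frame(
  Group    = c("GroupA", "GroupB", "GroupB", "GroupC", "GroupD", "GroupD", "GroupD"),
  Contents = c("Marble", "Marble", "Granite", "Marble", "Granite", "Glass", "Marble"),
  stringsAsFactors = FALSE
)

# For every row, ave() returns the sorted, comma-separated contents of that row's group
key <- with(df, ave(Contents, Group, FUN = function(x) toString(sort(x))))
print(cbind(df, key))

# A row is dropped when the same (Contents, key) pair has already been seen,
# which removes every row of a later group whose contents match an earlier group
idx <- !duplicated(cbind(df$Contents, key))
print(df[idx, ])
```

Because the key pairs each row's Contents with its group's full sorted contents, the only row of GroupC repeats a row already seen in GroupA and is dropped, while GroupB's Marble row survives because its key ("Granite, Marble") differs.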
Here is an option using nested aggregate() calls:
df[df$Group %in% aggregate(Group ~ ., aggregate(. ~ Group, df, toString), head, 1)$Group, ]

Group Contents
1 GroupA Marble
2 GroupB Marble
3 GroupB Granite
5 GroupD Granite
6 GroupD Glass
7 GroupD Marble

Posted 2020-10-16 19:06:25
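One thing worth checking in the nested aggregate() approach: the inner aggregate() collapses each group's contents in row order, so two groups holding the same materials in a different order get different strings and are both kept. The sketch below adds a hypothetical GroupE (Granite, Marble, the same set as GroupB in reverse order) to show this, and sorts the contents before collapsing, mirroring the sort() used in the ave() answer:

```r
df <- data.frame(
  Group    = c("GroupA", "GroupB", "GroupB", "GroupC", "GroupD",
               "GroupD", "GroupD", "GroupE", "GroupE"),
  Contents = c("Marble", "Marble", "Granite", "Marble", "Granite",
               "Glass", "Marble", "Granite", "Marble"),
  stringsAsFactors = FALSE
)

# Original approach: contents collapsed in row order, so GroupB ("Marble, Granite")
# and GroupE ("Granite, Marble") look different and both survive
keep_unsorted <- aggregate(Group ~ ., aggregate(. ~ Group, df, toString), head, 1)$Group

# Order-insensitive variant: sort each group's contents before collapsing
sig <- aggregate(Contents ~ Group, df, function(x) toString(sort(x)))
# For each distinct signature, keep the first group name; aggregate() orders
# groups by name, so "first" here means alphabetically first
keep <- aggregate(Group ~ Contents, sig, head, 1)$Group

res <- df[df$Group %in% keep, ]
print(res)
```

With the sort in place, GroupE is dropped as a duplicate of GroupB, and the original six rows are returned, as in the answer's output.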
An option using distinct() from dplyr:
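library(dplyr)
df %>%
  # sort so that each group's contents are collapsed in a consistent order
  arrange(across(everything())) %>%
  group_by(Group) %>%
  # signature: all contents of the group as one string
  mutate(new = toString(Contents)) %>%
  ungroup %>%
  # keep one row per (Contents, signature) pair, dropping later identical groups
  distinct(Contents, new, .keep_all = TRUE) %>%
  select(-new)
-output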
# A tibble: 6 x 2
# Group Contents
# <chr> <chr>
#1 GroupA Marble
#2 GroupB Granite
#3 GroupB Marble
#4 GroupD Glass
#5 GroupD Granite
#6 GroupD Marble

https://stackoverflow.com/questions/64389048
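The question also mentions that the real data comes in different formats. Assuming that mostly means different column names (the question does not say), the signature idea from the answers can be wrapped in a base-R function that takes the column names as strings. The function keep_first_identical_group and the example data frame stones with columns Box and Item are hypothetical. Unlike the answers, this sketch treats each group as a set (repeated items within a group are ignored) and joins contents with an unusual separator so values containing ", " cannot collide:

```r
# Hypothetical helper: keep only the first group (in order of appearance) for each
# distinct set of contents. Column names are passed as strings, so the same function
# works on data frames whose columns are named differently.
keep_first_identical_group <- function(data, group_col, content_col) {
  groups   <- data[[group_col]]
  contents <- as.character(data[[content_col]])

  # Per-row signature: the sorted, de-duplicated contents of that row's group.
  # "\u001f" (unit separator) is used so values containing ", " cannot collide.
  sig <- ave(contents, groups,
             FUN = function(x) paste(sort(unique(x)), collapse = "\u001f"))

  # One entry per group, in order of first appearance
  first_rows <- !duplicated(groups)
  # Groups whose signature has not been seen in an earlier group
  keep <- groups[first_rows][!duplicated(sig[first_rows])]

  data[groups %in% keep, , drop = FALSE]
}

df <- data.frame(
  Group    = c("GroupA", "GroupB", "GroupB", "GroupC", "GroupD", "GroupD", "GroupD"),
  Contents = c("Marble", "Marble", "Granite", "Marble", "Granite", "Glass", "Marble"),
  stringsAsFactors = FALSE
)
res <- keep_first_identical_group(df, "Group", "Contents")
print(res)

# Same function on a hypothetical data frame with other column names
stones <- data.frame(
  Box  = c("b1", "b1", "b2", "b2", "b3"),
  Item = c("Glass", "Marble", "Marble", "Glass", "Granite"),
  stringsAsFactors = FALSE
)
res2 <- keep_first_identical_group(stones, "Box", "Item")
print(res2)
```

Here b2 is dropped because it holds the same set as b1 in a different order, and the first group in row order (not alphabetical order) is the one kept.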