Hi everyone, I need help removing duplicated rows from a data frame, but only when a column is above a threshold.
Here is the data frame:
Group Species Values
1 G1 Cattus_cattus 10
2 G1 Cattus_cattus 10
3 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
I want to remove rows that are duplicated on c(Group, Species), but only when Values > 5. Here I should get:
Group Species Values
1 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
Data:
structure(list(Group = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L
), .Label = c("G1", "G2", "G3", "G4"), class = "factor"), Species = structure(c(2L,
2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Canis_lupus", "Cattus_cattus",
"Griseus_lupa"), class = "factor"), Values = c(10L, 10L, 10L,
2L, 2L, 90L, 89L)), class = "data.frame", row.names = c(NA, -7L
))
Posted on 2021-03-16 18:01:13
Using dplyr:
library(dplyr)
x %>%
filter(!duplicated(x) | Values <= 5)
Posted on 2021-03-16 15:32:43
You can use duplicated and combine it with the test x$Values <= 5 using the or operator |.
x[!duplicated(x) | x$Values <= 5,]
#x[!(duplicated(x) & x$Values > 5),] #Alternative
# Group Species Values
#1 G1 Cattus_cattus 10
#4 G2 Canis_lupus 2
#5 G2 Canis_lupus 2
#6 G3 Griseus_lupa 90
#7 G4 Griseus_lupa 89
Or, checking for duplicates on Group and Species only:
x[!(duplicated(x[c("Group","Species")]) & x$Values > 5),]
Posted on 2021-03-16 18:44:26
library(dplyr)
df %>%
group_by(Group, Species) %>%
slice(if(any(Values > 5)) 1 else 1:n())
# output:
# Groups: Group, Species [4]
#   Group Species       Values
#   <fct> <fct>          <int>
# 1 G1    Cattus_cattus     10
# 2 G2    Canis_lupus        2
# 3 G2    Canis_lupus        2
# 4 G3    Griseus_lupa      90
# 5 G4    Griseus_lupa      89
https://stackoverflow.com/questions/66658319
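The answers above refer to the data as x (or df) without showing the assignment, so here is a minimal, self-contained sketch that rebuilds the example data and applies the Group/Species variant of the base R approach. The helper name drop_dups_above and its threshold argument are hypothetical, introduced only for illustration; they do not come from any of the answers.

```r
# Rebuild the example data; the name `x` follows the answers above.
x <- data.frame(
  Group   = factor(c("G1", "G1", "G1", "G2", "G2", "G3", "G4")),
  Species = factor(c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
                     "Canis_lupus", "Canis_lupus",
                     "Griseus_lupa", "Griseus_lupa")),
  Values  = c(10L, 10L, 10L, 2L, 2L, 90L, 89L)
)

# Hypothetical helper (not from the answers): drop a row only if it repeats
# an earlier (Group, Species) pair AND its own Values exceeds the threshold.
drop_dups_above <- function(d, threshold = 5) {
  is_dup <- duplicated(d[c("Group", "Species")])
  d[!(is_dup & d$Values > threshold), ]
}

res <- drop_dups_above(x)
print(res)
```

Note that this check is row-wise: each duplicated row is judged by its own Values. The slice() answer instead keeps only the first row of a group whenever any value in that group exceeds 5. The two agree on this data but can differ when a single group mixes values above and below the threshold.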