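I have the following data, and I only want to keep the cases that have exactly 6 instances in the dataset (same last and first name). For example, Quincy appears 6 times in df, and I want to keep every one of those rows, but drop Abrines, because that individual has only 3 instances (< 6).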
last first start_year end_year Team GP MIN PTS W L
<chr> <chr> <int> <int> <chr> <int> <dbl> <dbl> <int> <int>
1 Abri… Alex 2016 2017 OKC 68 15.5 6 37 31
2 Abri… Alex 2017 2018 OKC 75 15.1 4.8 42 33
3 Abri… Alex 2018 2019 OKC 31 19 5.3 21 10
4 Acy Quin… 2013 2014 SAC 63 13.5 2.7 22 41
5 Acy Quin… 2014 2015 NYK 68 18.9 5.9 12 56
6 Acy Quin… 2015 2016 SAC 59 14.8 5.3 21 38
7 Acy Quin… 2016 2017 BKN 38 14.7 5.8 11 27
8 Acy Quin… 2017 2018 BKN 70 19.4 5.9 26 44
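9 Acy Quin… 2018 2019 PHX 10 12.3 1.7 2 8
I tried x <- df %>% count(last, first) %>% filter(n == 6), followed by df %>% filter(last %in% x$last & first %in% x$first), but this matches any last name and any first name separately, rather than matching the last/first name pair. I am sure there is also a simpler solution that does not require using group_by first.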
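To illustrate why the attempt over-matches, here is a minimal sketch on made-up data (the `toy` data frame, its names, and the threshold of 2 are hypothetical, chosen only to keep the example small). Testing each column with `%in%` separately keeps any row whose last name and first name each occur somewhere in `x`, even if they never occur together there. One possible fix is dplyr's `semi_join`, which matches on the pair.

```r
library(dplyr)

# Hypothetical toy data: (Smith, Bob) occurs once, the other two pairs twice.
toy <- data.frame(
  last  = c("Smith", "Smith", "Jones", "Jones", "Smith"),
  first = c("Ann",   "Ann",   "Bob",   "Bob",   "Bob"),
  year  = c(2016L, 2017L, 2016L, 2017L, 2016L),
  stringsAsFactors = FALSE
)

# Pairs that occur exactly twice: (Smith, Ann) and (Jones, Bob).
x <- toy %>% count(last, first) %>% filter(n == 2)

# Column-wise matching also keeps (Smith, Bob), because "Smith" is in
# x$last and "Bob" is in x$first, although that pair is not in x.
loose <- toy %>% filter(last %in% x$last & first %in% x$first)

# semi_join keeps only rows whose (last, first) pair appears in x.
strict <- toy %>% semi_join(x, by = c("last", "first"))

nrow(loose)
nrow(strict)
```

In this sketch `loose` retains all five rows, while `strict` drops the single (Smith, Bob) row.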
I would like the result to look like this:
<chr> <chr> <int> <int> <chr> <int> <dbl> <dbl> <int> <int>
1 Acy Quin… 2013 2014 SAC 63 13.5 2.7 22 41
2 Acy Quin… 2014 2015 NYK 68 18.9 5.9 12 56
3 Acy Quin… 2015 2016 SAC 59 14.8 5.3 21 38
4 Acy Quin… 2016 2017 BKN 38 14.7 5.8 11 27
5 Acy Quin… 2017 2018 BKN 70 19.4 5.9 26 44
6 Acy Quin… 2018 2019 PHX 10 12.3 1.7 2 8
7 Adams Stev… 2013 2014 OKC 81 14.8 3.3 59 22
8 Adams Stev… 2014 2015 OKC 70 25.3 7.7 37 33
9 Adams Stev… 2015 2016 OKC 80 25.2 8 54 26
10 Adams Stev… 2016 2017 OKC 80 29.9 11.3 47 33
11 Adams Stev… 2017 2018 OKC 76 32.7 13.9 43 33
12 Adams Stev… 2018 2019 OKC 80 33.4 13.8 47 33
Posted 2020-02-08 00:25:41
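Instead of using count to summarise the data, creating a new object, and then doing the filter, we can group_by 'last' and 'first' and filter the groups directly on the condition.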
library(dplyr)
df1 <- df %>%
group_by(last, first) %>%
filter(n() == 6)
If the requirement is at least 6 rows, change == to >=.
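For readers who would rather avoid group_by altogether, as the question asks, one possible variant is dplyr's `add_count`, which appends the per-pair row count as a column `n` that can then be filtered on. The sketch below runs on made-up data (the `toy` data frame and its values are hypothetical) and also shows the grouped form with a trailing `ungroup()`, so that the result does not stay grouped.

```r
library(dplyr)

# Hypothetical toy data: one player with 3 seasons, one with 6.
toy <- data.frame(
  last  = rep(c("Abrines", "Acy"), times = c(3, 6)),
  first = rep(c("Alex", "Quincy"), times = c(3, 6)),
  start_year = c(2016:2018, 2013:2018),
  stringsAsFactors = FALSE
)

kept <- toy %>%
  add_count(last, first) %>%  # adds column n = rows per (last, first) pair
  filter(n == 6) %>%
  select(-n)                  # drop the helper column again

# Grouped form of the same filter; ungroup() removes the grouping afterwards.
kept_grouped <- toy %>%
  group_by(last, first) %>%
  filter(n() == 6) %>%
  ungroup()
```

Both `kept` and `kept_grouped` should contain only the six rows of the player that appears six times.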
Another option is to use table:
subset(df, paste(last, first) %in% names(which(table(paste(last, first)) == 6)))
Posted 2020-02-08 05:23:41
In base R, we can use ave to count the number of rows in each group of first and last values, and select the groups that have 6 rows.
subset(df, ave(start_year, first, last, FUN = length) == 6)
# last first start_year end_year Team GP MIN PTS W L
#4 Acy Quin… 2013 2014 SAC 63 13.5 2.7 22 41
#5 Acy Quin… 2014 2015 NYK 68 18.9 5.9 12 56
#6 Acy Quin… 2015 2016 SAC 59 14.8 5.3 21 38
#7 Acy Quin… 2016 2017 BKN 38 14.7 5.8 11 27
#8 Acy Quin… 2017 2018 BKN 70 19.4 5.9 26 44
#9 Acy Quin… 2018 2019 PHX 10 12.3 1.7 2 8
We can also do the same thing with data.table:
library(data.table)
setDT(df)[, .SD[.N == 6], .(first, last)]
Data
df <- structure(list(last = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Abri…", "Acy"), class = "factor"), first = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Alex", "Quin…"
), class = "factor"), start_year = c(2016L, 2017L, 2018L, 2013L,
2014L, 2015L, 2016L, 2017L, 2018L), end_year = c(2017L, 2018L,
2019L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L), Team = structure(c(3L,
3L, 3L, 5L, 2L, 5L, 1L, 1L, 4L), .Label = c("BKN", "NYK", "OKC",
"PHX", "SAC"), class = "factor"), GP = c(68L, 75L, 31L, 63L,
68L, 59L, 38L, 70L, 10L), MIN = c(15.5, 15.1, 19, 13.5, 18.9,
14.8, 14.7, 19.4, 12.3), PTS = c(6, 4.8, 5.3, 2.7, 5.9, 5.3,
5.8, 5.9, 1.7), W = c(37L, 42L, 21L, 22L, 12L, 21L, 11L, 26L,
2L), L = c(31L, 33L, 10L, 41L, 56L, 38L, 27L, 44L, 8L)), class = "data.frame",
row.names = c(NA, -9L))
https://stackoverflow.com/questions/60122818