
How do I remove duplicate values based on another column?

Stack Overflow user
Asked 2022-06-30 14:28:28

I have a dataset that looks like this:

   Study_ID       Stage
1       100 Early Stage
2       100      Stable
3       200      Stable
4       300 Early Stage
5       400 Early Stage
6       400      Stable
7       500 Early Stage
8       500      Stable
9       600      Stable
10      700 Early Stage

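I want to remove any duplicated Study_IDs, but keep the entry where the patient is "Stable". In other words, for every duplicated Study_ID, I want to drop the row where the patient is "Early Stage".

My desired output should look like this:
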
  Study_ID       Stage
1      100      Stable
2      200      Stable
3      300 Early Stage
4      400      Stable
5      500      Stable
6      600      Stable
7      700 Early Stage

How can I do this?

Reproducible data:

data<-data.frame(Study_ID=c("100","100","200","300","400","400","500","500","600","700"),Stage=c("Early Stage","Stable","Stable","Early Stage","Early Stage","Stable","Early Stage","Stable","Stable","Early Stage"))

2 Answers

Stack Overflow user

Answered 2022-06-30 14:36:42

library(dplyr)

data %>% 
  group_by(Study_ID) %>% 
  filter(!(n() > 1 & Stage != "Stable"))
#> # A tibble: 7 × 2
#> # Groups:   Study_ID [7]
#>   Study_ID Stage      
#>   <chr>    <chr>      
#> 1 100      Stable     
#> 2 200      Stable     
#> 3 300      Early Stage
#> 4 400      Stable     
#> 5 500      Stable     
#> 6 600      Stable     
#> 7 700      Early Stage
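
The condition inside filter() can be read as: drop a row only when its Study_ID occurs more than once and the row is not "Stable". As a rough sketch of that same rule without dplyr, using the data frame from the question (the names group_size and drop are illustrative and not part of the answer):

```r
# Data frame from the question.
data <- data.frame(
  Study_ID = c("100", "100", "200", "300", "400", "400", "500", "500", "600", "700"),
  Stage = c("Early Stage", "Stable", "Stable", "Early Stage", "Early Stage",
            "Stable", "Early Stage", "Stable", "Stable", "Early Stage")
)

# Number of rows sharing each row's Study_ID (the base R counterpart of n()).
group_size <- ave(seq_along(data$Study_ID), data$Study_ID, FUN = length)

# A row is dropped only if its ID is repeated AND it is not "Stable".
drop <- group_size > 1 & data$Stage != "Stable"

res <- data[!drop, ]
print(res)
```

With this rule, an ID that has two identical "Stable" rows would keep both of them, because neither row satisfies the drop condition.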

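Edit 1

To make sure you don't end up with duplicate rows (as @jay.sf pointed out), you can do the following (a bit messy). Here dat appears to be the 11-row data set from the other answer, which contains a duplicated "Stable" row for Study_ID 600:
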
library(dplyr)

dat %>% 
  group_by(Study_ID) %>% 
  filter(!(n() > 1 & Stage != "Stable")) %>% 
  summarise(Stage = first(Stage))
#> # A tibble: 7 × 2
#>   Study_ID Stage      
#>      <int> <chr>      
#> 1      100 Stable     
#> 2      200 Stable     
#> 3      300 Early Stage
#> 4      400 Stable     
#> 5      500 Stable     
#> 6      600 Stable     
#> 7      700 Early Stage
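
A possible alternative to the filter-then-summarise chain, not taken from the answer, is to sort each Study_ID so that "Stable" rows come first and then keep the first row per ID with distinct(). This is a sketch under the assumption that exactly one row per Study_ID is wanted; it uses the 11-row dat with the duplicated "Stable" entry for 600:

```r
library(dplyr)

# 11-row data set with a duplicated "Stable" row for Study_ID 600.
dat <- data.frame(
  Study_ID = c(100L, 100L, 200L, 300L, 400L, 400L, 500L, 500L, 600L, 600L, 700L),
  Stage = c("Early Stage", "Stable", "Stable", "Early Stage", "Early Stage",
            "Stable", "Early Stage", "Stable", "Stable", "Stable", "Early Stage")
)

res <- dat %>%
  # Within each ID, FALSE sorts before TRUE, so "Stable" rows come first.
  arrange(Study_ID, Stage != "Stable") %>%
  # Keep only the first row for each Study_ID, retaining all columns.
  distinct(Study_ID, .keep_all = TRUE)

print(res)
```

Unlike the filter-based version, this always returns a single row per ID, even when an ID has several non-"Stable" rows, so it is only appropriate if that behaviour is acceptable.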

Stack Overflow user

Answered 2022-06-30 14:38:51

Using by. I added a case with two "Stable" rows to the data as a possible special case.

by(dat, dat$Study_ID, \(x) {
  if (any(grepl('Stable', x$Stage))) {
    unique(x[x$Stage == 'Stable', ])
  } else {
    unique(x)
  }
}) |> do.call(what=rbind)
#     Study_ID       Stage
# 100      100      Stable
# 200      200      Stable
# 300      300 Early Stage
# 400      400      Stable
# 500      500      Stable
# 600      600      Stable
# 700      700 Early Stage

Or use ave on Stage converted with as.factor, keeping for each Study_ID the first (!duplicated) row at the max level:

transform(dat, x=as.numeric(as.factor(Stage))) |> 
  subset(as.logical(ave(x, Study_ID, FUN=\(x) x == max(x) & !duplicated(x))) , -x)
#    Study_ID       Stage
# 2       100      Stable
# 3       200      Stable
# 4       300 Early Stage
# 6       400      Stable
# 8       500      Stable
# 9       600      Stable
# 11      700 Early Stage

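Note that this works only because "Early Stage" sorts alphabetically before "Stable", so "Stable" gets the higher factor code. Otherwise, use factor and define the order explicitly in its levels= argument.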
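
A minimal sketch of the explicit-levels variant, assuming the priority order is "Early Stage" lowest and "Stable" highest (the names stage_levels, stage_rank and keep are illustrative):

```r
# 11-row data set with a duplicated "Stable" row for Study_ID 600.
dat <- data.frame(
  Study_ID = c(100L, 100L, 200L, 300L, 400L, 400L, 500L, 500L, 600L, 600L, 700L),
  Stage = c("Early Stage", "Stable", "Stable", "Early Stage", "Early Stage",
            "Stable", "Early Stage", "Stable", "Stable", "Stable", "Early Stage")
)

# Explicit priority: later levels rank higher, independent of alphabetical order.
stage_levels <- c("Early Stage", "Stable")
stage_rank <- as.integer(factor(dat$Stage, levels = stage_levels))

# Per Study_ID, flag the first row that carries the highest rank.
keep <- as.logical(ave(stage_rank, dat$Study_ID,
                       FUN = function(r) r == max(r) & !duplicated(r)))

res <- dat[keep, ]
print(res)
```

If further stage labels exist, they can be added to stage_levels in the desired order of preference; a label missing from stage_levels would become NA and would need separate handling.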

Data:

dat <- structure(list(Study_ID = c(100L, 100L, 200L, 300L, 400L, 400L, 
500L, 500L, 600L, 600L, 700L), Stage = c("Early Stage", "Stable", 
"Stable", "Early Stage", "Early Stage", "Stable", "Early Stage", 
"Stable", "Stable", "Stable", "Early Stage")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/72817572
