
Remove text inside parentheses from specific rows in a data frame

Stack Overflow user
Asked 2021-09-24 11:16:45
Answers: 3, Views: 39, Followers: 0, Votes: 1

Here is my sample data:

dput(aa)
structure(list(V4 = structure(1:22, .Label = c("Peak228404", 
"Peak228411", "Peak228413", "Peak228423", "Peak228424", "Peak228439", 
"Peak228461", "Peak228476", "Peak228479", "Peak228495", "Peak228528", 
"Peak228553", "Peak228603", "Peak228612", "Peak228629", "Peak228630", 
"Peak228642", "Peak228651", "Peak228691", "Peak228740", "Peak4983", 
"Peak5261"), class = "factor"), annotation = structure(c(1L, 
4L, 5L, 1L, 1L, 1L, 6L, 8L, 1L, 1L, 1L, 1L, 1L, 1L, 8L, 8L, 8L, 
8L, 7L, 8L, 2L, 3L), .Label = c("Distal Intergenic", "Downstream (1-2kb)", 
"Downstream (2-3kb)", "Exon (ENST00000370460.6/2334, exon 16 of 21)", 
"Exon (ENST00000370460.6/2334, exon 21 of 21)", "Exon (ENST00000616857.4/84548, exon 3 of 3)", 
"Exon (ENST00000620118.4/ENST00000620118.4, exon 3 of 4)", "Promoter"
), class = "factor"), Output_required = structure(c(1L, 5L, 5L, 
1L, 1L, 1L, 5L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L, 6L, 4L, 
6L, 2L, 3L), .Label = c("Distal Intergenic", "Downstream (1-2kb)", 
"Downstream (2-3kb)", "Exon", "Exon ", "Promoter"), class = "factor")), class = "data.frame", row.names = c(NA, 
-22L))

 V4                                              annotation    Output_required
1  Peak228404                                       Distal Intergenic  Distal Intergenic
2  Peak228411            Exon (ENST00000370460.6/2334, exon 16 of 21)              Exon 
3  Peak228413            Exon (ENST00000370460.6/2334, exon 21 of 21)              Exon 
4  Peak228423                                       Distal Intergenic  Distal Intergenic
5  Peak228424                                       Distal Intergenic  Distal Intergenic
6  Peak228439                                       Distal Intergenic  Distal Intergenic
7  Peak228461             Exon (ENST00000616857.4/84548, exon 3 of 3)              Exon 
8  Peak228476                                                Promoter           Promoter
9  Peak228479                                       Distal Intergenic  Distal Intergenic
10 Peak228495                                       Distal Intergenic  Distal Intergenic
11 Peak228528                                       Distal Intergenic  Distal Intergenic
12 Peak228553                                       Distal Intergenic  Distal Intergenic
13 Peak228603                                       Distal Intergenic  Distal Intergenic
14 Peak228612                                       Distal Intergenic  Distal Intergenic
15 Peak228629                                                Promoter           Promoter
16 Peak228630                                                Promoter           Promoter
17 Peak228642                                                Promoter           Promoter
18 Peak228651                                                Promoter           Promoter
19 Peak228691 Exon (ENST00000620118.4/ENST00000620118.4, exon 3 of 4)               Exon
20 Peak228740                                                Promoter           Promoter
21   Peak4983                                      Downstream (1-2kb) Downstream (1-2kb)
22   Peak5261                                      Downstream (2-3kb) Downstream (2-3kb)

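In this data frame, some rows of the column called annotation contain the string Exon, and each of those rows also carries text inside parentheses that I don't want. I'd like those values to be consistent, that is, just Exon. I added another column, Output_required, which shows the final output I want.

Any suggestion or help would be much appreciated.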

3 Answers

Stack Overflow user

Accepted answer

Answered 2021-09-24 11:19:51

You can remove everything after 'Exon' with the help of a lookbehind.

sub('(?<=Exon).*', '', aa$annotation, perl = TRUE)

# [1] "Distal Intergenic"  "Exon"               "Exon"               "Distal Intergenic" 
# [5] "Distal Intergenic"  "Distal Intergenic"  "Exon"               "Promoter"          
# [9] "Distal Intergenic"  "Distal Intergenic"  "Distal Intergenic"  "Distal Intergenic" 
#[13] "Distal Intergenic"  "Distal Intergenic"  "Promoter"           "Promoter"          
#[17] "Promoter"           "Promoter"           "Exon"               "Promoter"          
#[21] "Downstream (1-2kb)" "Downstream (2-3kb)"

Similarly, you can use stringr::str_remove:

stringr::str_remove(aa$annotation, '(?<=Exon).*')
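
To make the role of the lookbehind concrete, here is a minimal sketch on a four-row stand-in for `aa` (the names `df` and `generic` are made up for illustration; the values are copied from the sample data). It writes the cleaned values back into the column, and contrasts this with a generic "strip any parenthesised text" pattern, which would also alter the Downstream rows that the question wants left alone.

```r
# Small stand-in for aa$annotation (values copied from the sample data)
annotation <- factor(c(
  "Distal Intergenic",
  "Exon (ENST00000370460.6/2334, exon 16 of 21)",
  "Promoter",
  "Downstream (1-2kb)"
))
df <- data.frame(annotation = annotation)

# sub() coerces the factor to character. The lookbehind only lets the match
# start right after "Exon", so rows without "Exon" are left untouched.
df$annotation <- sub("(?<=Exon).*", "", df$annotation, perl = TRUE)

# For contrast: a pattern that strips any trailing "(...)" also changes
# the Downstream row, which is not what the question asks for.
generic <- sub(" *\\(.*\\)$", "", as.character(annotation))

print(df$annotation)
print(generic)
```

Note that the column is a character vector after the assignment; wrap the result in factor() if a factor is needed again.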
Votes: 1

Stack Overflow user

Answered 2021-09-24 12:34:57

Another way to achieve this is with a backreference:

sub("(Exon)(.*)", "\\1", aa$annotation)

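Here we split the string into two capture groups:

  • (Exon): this group captures Exon literally
  • (.*): this group captures everything else
  • \\1: the backreference, used in the replacement argument of sub, which "recalls" the first capture group but not the second, thereby effectively removing it!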
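
The groups described above can be inspected directly. The following is a small sketch, not part of the original answer: it uses base R's regexec() and regmatches() on a two-element vector (the names `x`, `groups`, `kept` and `kept2` are made up for illustration) to show what each group captures, and notes that the second group does not actually need to be captured because it is never referenced.

```r
x <- c("Exon (ENST00000616857.4/84548, exon 3 of 3)", "Promoter")

# regexec() + regmatches() return, per element: full match, group 1, group 2
groups <- regmatches(x, regexec("(Exon)(.*)", x))
print(groups[[1]])

# Keeping only group 1 drops everything that group 2 captured;
# "Promoter" does not match at all and is returned unchanged
kept <- sub("(Exon)(.*)", "\\1", x)

# The second group is never referenced, so plain .* works just as well
kept2 <- sub("(Exon).*", "\\1", x)

print(kept)
```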

Votes: 1

Stack Overflow user

Answered 2021-09-24 18:14:48

We can use trimws from base R:

trimws(aa$annotation, whitespace = "(?<=Exon).*")

# [1] "Distal Intergenic"  "Exon"               "Exon"               "Distal Intergenic"  "Distal Intergenic"  "Distal Intergenic"  "Exon"              
# [8] "Promoter"           "Distal Intergenic"  "Distal Intergenic"  "Distal Intergenic"  "Distal Intergenic"  "Distal Intergenic"  "Distal Intergenic" 
#[15] "Promoter"           "Promoter"           "Promoter"           "Promoter"           "Exon"               "Promoter"           "Downstream (1-2kb)"
#[22] "Downstream (2-3kb)"
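
Why this works is not obvious, so here is a hedged sketch. As far as I can tell, trimws() (with the `whitespace` argument, available from R 3.6.0) appends `+$` to the pattern for right-trimming and calls sub() with perl = TRUE, so the pattern is effectively treated as "the stuff to strip at the end of the string". The explicit sub() call below is my reading of that behaviour, not something stated in the answer; the names `x`, `trimmed` and `by_hand` are made up for illustration.

```r
x <- c("Exon (ENST00000620118.4/ENST00000620118.4, exon 3 of 4)",
       "Downstream (2-3kb)")

# Trimming on the right only is enough, since the unwanted text is a suffix
trimmed <- trimws(x, which = "right", whitespace = "(?<=Exon).*")

# Assumed equivalent of what trimws() does internally for which = "right":
# the pattern with "+$" appended, matched with PCRE
by_hand <- sub("(?<=Exon).*+$", "", x, perl = TRUE)

print(trimmed)
```

Since this relies on how trimws() builds its regular expression, the plain sub() call from the accepted answer is arguably the more transparent choice.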
Votes: 1

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/69314123
