
How to arrange data by category and then mutate a new covariate that lists all names belonging to a specific category

Stack Overflow user
Asked 2021-07-24 19:10:44

I am working with Gene Ontology data that looks like this:

> head(BT_Ctrl_go_term, 13)
# A tibble: 13 x 4
   go_term        n gene     go_name                 
   <chr>      <int> <chr>    <chr>                   
 1 GO:0001525    15 NRP1     angiogenesis            
 2 GO:0001525    15 ANG      angiogenesis            
 3 GO:0001525    15 THY1     angiogenesis            
 4 GO:0001525    15 ATP5F1B  angiogenesis            
 5 GO:0001525    15 ECM1     angiogenesis            
 6 GO:0001666     6 ANG      response to hypoxia     
 7 GO:0001666     6 CAT      response to hypoxia     
 8 GO:0001666     6 HSP90B1  response to hypoxia     
 9 GO:0002250     8 IGKV1-27 adaptive immune response
10 GO:0002250     8 IGHV3-21 adaptive immune response
11 GO:0002250     8 TNFRSF21 adaptive immune response
12 GO:0002250     8 IGLV2-11 adaptive immune response
13 GO:0002250     8 IGHV4-34 adaptive immune response

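I need to arrange the data so that each go_name appears exactly once, on a single row. I then need a new covariate, genes, that lists every BT_Ctrl_go_term$gene belonging to the corresponding BT_Ctrl_go_term$go_name. The gene names must be separated by commas.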

Expected output

     go_term  n                  go_name                                            genes
1 GO:0001525 15             angiogenesis                   NRP1, ANG, THY1, ATP5F1B, ECM1
2 GO:0001666  6      response to hypoxia                                ANG, CAT, HSP90B1
3 GO:0002250  8 adaptive immune response IGKV1-27, IGHV3-21, TNFRSF21, IGLV2-11, IGHV4-34

A dplyr solution is preferred.

Data

BT_Ctrl_go_term <- structure(list(go_term = c("GO:0001525", "GO:0001525", "GO:0001525", 
"GO:0001525", "GO:0001525", "GO:0001666", "GO:0001666", "GO:0001666", 
"GO:0002250", "GO:0002250", "GO:0002250", "GO:0002250", "GO:0002250"
), n = c(15L, 15L, 15L, 15L, 15L, 6L, 6L, 6L, 8L, 8L, 8L, 8L, 
8L), gene = c("NRP1", "ANG", "THY1", "ATP5F1B", "ECM1", "ANG", 
"CAT", "HSP90B1", "IGKV1-27", "IGHV3-21", "TNFRSF21", "IGLV2-11", 
"IGHV4-34"), go_name = c("angiogenesis", "angiogenesis", "angiogenesis", 
"angiogenesis", "angiogenesis", "response to hypoxia", "response to hypoxia", 
"response to hypoxia", "adaptive immune response", "adaptive immune response", 
"adaptive immune response", "adaptive immune response", "adaptive immune response"
)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"
))

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-07-24 19:14:52

We can paste the gene names together by group:

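library(dplyr)
# Collapse to one row per GO term. In the data shown, n and go_name are
# constant within each go_term, so grouping by all three keeps them as columns.
BT_Ctrl_go_term %>% 
    group_by(go_term, n, go_name) %>% 
    # toString() joins the unique gene names with ", "
    summarise(gene = toString(unique(gene)), .groups = 'drop')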

Output:

# A tibble: 3 x 4
  go_term        n go_name                  gene                                            
  <chr>      <int> <chr>                    <chr>                                           
1 GO:0001525    15 angiogenesis             NRP1, ANG, THY1, ATP5F1B, ECM1                  
2 GO:0001666     6 response to hypoxia      ANG, CAT, HSP90B1                               
3 GO:0002250     8 adaptive immune response IGKV1-27, IGHV3-21, TNFRSF21, IGLV2-11, IGHV4-34
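
The question asked for the new column to be called genes, while the answer keeps the name gene. A minimal sketch of that variant follows. It assumes the same 13 rows as the question's data, rebuilt compactly here so the block runs on its own. It also spells out the join with paste(..., collapse = ", "), which is what toString() does internally. The name `result` is only illustrative.

```r
library(dplyr)

# Same 13 rows as the question's data, rebuilt compactly (assumption: the
# full data has no other rows than the ones shown in the question)
BT_Ctrl_go_term <- tibble(
  go_term = rep(c("GO:0001525", "GO:0001666", "GO:0002250"), times = c(5, 3, 5)),
  n       = rep(c(15L, 6L, 8L), times = c(5, 3, 5)),
  gene    = c("NRP1", "ANG", "THY1", "ATP5F1B", "ECM1",
              "ANG", "CAT", "HSP90B1",
              "IGKV1-27", "IGHV3-21", "TNFRSF21", "IGLV2-11", "IGHV4-34"),
  go_name = rep(c("angiogenesis", "response to hypoxia", "adaptive immune response"),
                times = c(5, 3, 5))
)

# One row per GO term; the collapsed column is named `genes` as requested
result <- BT_Ctrl_go_term %>%
  group_by(go_term, n, go_name) %>%
  summarise(genes = paste(unique(gene), collapse = ", "), .groups = "drop")

print(result)
```

Dropping unique() would keep repeated gene names within a group. In the sample data no gene repeats inside a single GO term, so the result is the same either way.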
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68513068
