
Counting conditional occurrences of specific strings and generating a new data frame accordingly in R

Stack Overflow user
Asked 2020-01-28 20:32:36
Answers: 1 · Views: 99 · Followers: 0 · Votes: 1

I have a large data frame with 4 columns and many rows (an example is given below).

#what I have
Arm <- c("5prime","3prime","5prime","CoMature","3prime","5prime","3prime","3prime")
Family <- c("LET-7","LET-7","LET-7","MIR-10","MIR-103","MIR-124","MIR-124","MIR-124")
Sequence <- c("ATCGGCA","ATGCTAC","ATCGGCA","ATCGTTT","TGAGGAG","TGATCAG","AATTCAG","AATTCAG")
Star_seq <- c("TTCAGGT","TATACTG","TTCAGGT","GAGATCA","CAAAAGC","CACATGC","AATATGC","AATATGC")
my_data_frame <- data.frame(Arm,Family,Sequence,Star_seq)

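What I basically want to do is, for each value i in the Family column, count how many times '5prime', '3prime' or 'CoMature' occurs in the Arm column. Then, for the most frequent one ('5prime', '3prime' or 'CoMature'), take the third and fourth columns (Sequence and Star_seq). In summary, I need a final table that shows, for each i in the Family column, the most frequent arm (in the first row) and its corresponding sequences in the third and fourth columns.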

#what I want as output
five_prime_counts <- c("2","0","0","1")
three_prime_counts <- c("1","0","1","2")
CoMature_counts <- c("0","1","0","0")
Arm_new <- c("5prime","CoMature","3prime","3prime")
Family_new <- c("LET-7","MIR-10","MIR-103","MIR-124")
Sequence_new <- c("ATCGGCA","ATCGTTT","TGAGGAG","AATTCAG")
Star_seq_new <- c("TTCAGGT","GAGATCA","CAAAAGC","AATATGC")
my_data_frame_new <- data.frame(five_prime_counts,three_prime_counts,CoMature_counts,Arm_new,Family_new,Sequence_new,Star_seq_new)

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-01-28 20:45:40

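We can add a count variable for each Family and Arm combination, get the Sequence, Star_seq and Arm values that correspond to the maximum count within each Family, and then reshape the data to wide format.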

library(dplyr)

my_data_frame %>%
  add_count(Family, Arm) %>%
  group_by(Family) %>%
  mutate(Sequence = Sequence[which.max(n)], 
         Star_seq =  Star_seq[which.max(n)], 
         Arm_new = Arm[which.max(n)]) %>%
  distinct() %>%
  tidyr::pivot_wider(names_from = Arm, values_from = n, values_fill = list(n = 0))

#  Family  Sequence Star_seq Arm_new  `5prime` `3prime` CoMature
#  <fct>   <fct>    <fct>    <fct>       <int>    <int>    <int>
#1 LET-7   ATCGGCA  TTCAGGT  5prime          2        1        0
#2 MIR-10  ATCGTTT  GAGATCA  CoMature        0        0        1
#3 MIR-103 TGAGGAG  CAAAAGC  3prime          0        1        0
#4 MIR-124 AATTCAG  AATATGC  3prime          1        2        0
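
For comparison, the same idea can be sketched in base R without dplyr or tidyr. This is only a minimal sketch and not the method used in the answer above: it relies on `table()` to count, and the names `counts`, `top_arm`, `is_top` and `result` are chosen here for illustration. One assumption to be aware of: when two arms are tied within a family, this version picks the first column of the count table (alphabetical order), whereas the pipeline above picks the first row in data order.

```r
# Example data copied from the question (kept as character, not factor)
my_data_frame <- data.frame(
  Arm      = c("5prime","3prime","5prime","CoMature","3prime","5prime","3prime","3prime"),
  Family   = c("LET-7","LET-7","LET-7","MIR-10","MIR-103","MIR-124","MIR-124","MIR-124"),
  Sequence = c("ATCGGCA","ATGCTAC","ATCGGCA","ATCGTTT","TGAGGAG","TGATCAG","AATTCAG","AATTCAG"),
  Star_seq = c("TTCAGGT","TATACTG","TTCAGGT","GAGATCA","CAAAAGC","CACATGC","AATATGC","AATATGC"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per Family, one column per Arm
counts <- table(my_data_frame$Family, my_data_frame$Arm)

# Most frequent Arm per Family (ties go to the first column of `counts`)
top_arm <- colnames(counts)[apply(counts, 1, which.max)]
names(top_arm) <- rownames(counts)

# Keep the first row of each Family whose Arm is that most frequent Arm
is_top <- my_data_frame$Arm == top_arm[my_data_frame$Family]
result <- my_data_frame[is_top, ]
result <- result[!duplicated(result$Family), ]
rownames(result) <- NULL

# Attach the per-Arm counts, using the column names from the question
result$five_prime_counts  <- unname(counts[result$Family, "5prime"])
result$three_prime_counts <- unname(counts[result$Family, "3prime"])
result$CoMature_counts    <- unname(counts[result$Family, "CoMature"])

print(result)
```

Unlike the desired output in the question, the count columns here are integers rather than character strings, which is usually more convenient for later filtering or arithmetic.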
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/59948871
