How to loop over a data frame conditionally in R

Stack Overflow user
Asked 2020-02-25 16:37:04
Answers: 2 · Views: 58 · Followers: 0 · Votes: 0

I am new to loops in R and would appreciate help with my problem. Suppose I have a data frame like this:

Family <- c('mir-1','mir-1','mir-3','mir-4','mir-4','LET-7', 'LET-7','mir-1','mir-4','LET-7')
Species <- c('hsa','chicken','hsa','hsa','chicken','hsa','hsa','chicken','chicken','hsa')
Tissue <- c('blood','liver','blood','blood','liver','skin','skin','skin','liver','nail')
star <- c('1','4','3','4','12','3','7','4','1','5') #numeric
mature <- c('9','6','8','1','7','3','4','2','8','9')  #numeric
df <- data.frame(Family,Species,Tissue,star,mature)

My output should look like this:

Family_ <- c('mir-1','mir-1','mir-3','mir-4','mir-4','LET-7', 'LET-7','mir-1','mir-4','LET-7')
Species_ <- c('hsa','chicken','hsa','hsa','chicken','hsa','hsa','chicken','chicken','hsa')
Tissue_ <- c('blood','liver','blood','blood','liver','skin','skin','skin','liver','nail')
star <- c('1','4','3','4','12','3','7','4','1','5') #numeric
mature <- c('9','6','8','1','7','3','4','2','8','9')  #numeric
total_count <- c('10','10','11','5','28','17','17','6','28','14')  #numeric
star_total <- c('1','4','3','4','13','10','10','4','13','5')  #numeric
mature_total <- c('9','6','8','1','15','7','7','2','15','9')  #numeric
df_new <- data.frame(Family_,Species_,Tissue_,star,mature,star_total,mature_total,total_count)

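I want to loop over each family in each tissue in each species. That is, for each family in the first column that belongs to a specific tissue and a specific species (without removing duplicate rows), I want to compute total_count <- sum(mature) + sum(star), star_total <- sum(star), and mature_total <- sum(mature), and also add an extra column named rpm_mature, calculated as rpm_mature <- mature_total/total_count*10^6 (this column is not included in the output shown here). Rows that share the same family in the same tissue of the same species should therefore all carry the same computed values. Maybe I have not described this very well, but it should make sense if you look at the output. Thanks.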

2 Answers

Stack Overflow user

Accepted answer

Answered 2020-02-26 02:12:45

Here is a tidyverse approach, in case it helps:

library(tidyverse)

df %>%
  # Go through as.character() so factor columns yield their values, not level codes
  mutate_at(c("star", "mature"), ~ as.numeric(as.character(.))) %>%
  group_by(Family, Species, Tissue) %>%
  mutate(total_count = sum(mature) + sum(star),
         star_total = sum(star),
         mature_total = sum(mature),
         rpm_mature = mature_total/total_count*10^6)

Output

# A tibble: 10 x 9
# Groups:   Family, Species, Tissue [8]
   Family Species Tissue  star mature total_count star_total mature_total rpm_mature
   <fct>  <fct>   <fct>  <dbl>  <dbl>       <dbl>      <dbl>        <dbl>      <dbl>
 1 mir-1  hsa     blood      1      9          10          1            9    900000 
 2 mir-1  chicken liver      4      6          10          4            6    600000 
 3 mir-3  hsa     blood      3      8          11          3            8    727273.
 4 mir-4  hsa     blood      4      1           5          4            1    200000 
 5 mir-4  chicken liver     12      7          28         13           15    535714.
 6 LET-7  hsa     skin       3      3          17         10            7    411765.
 7 LET-7  hsa     skin       7      4          17         10            7    411765.
 8 mir-1  chicken skin       4      2           6          4            2    333333.
 9 mir-4  chicken liver      1      8          28         13           15    535714.
10 LET-7  hsa     nail       5      9          14          5            9    642857.
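
One detail worth checking in the conversion step: before R 4.0, data.frame() stored character columns as factors by default, and calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A minimal sketch of the difference, using the star values from the question (the variable names codes and values are illustrative, not part of the original answer):

```r
# The `star` values from the question, stored as a factor
# (the default for character columns in data.frame() before R 4.0).
star <- factor(c('1','4','3','4','12','3','7','4','1','5'))

# Direct conversion returns the position of each value among the sorted
# levels ("1", "12", "3", "4", "5", "7"), not the number itself.
codes <- as.numeric(star)

# Converting to character first recovers the actual numbers.
values <- as.numeric(as.character(star))

print(codes)   # 1 4 3 4 2 3 6 4 1 5
print(values)  # 1 4 3 4 12 3 7 4 1 5
```

Going through as.character() first gives the intended numbers whether the columns are factors or plain character vectors.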

Edit

If you are interested in a loop-based approach, you can do the following to get the same result:

# Go through as.character() so factor columns yield their values, not level codes
df$star <- as.numeric(as.character(df$star))
df$mature <- as.numeric(as.character(df$mature))

df <- cbind(df, total_count = NA, star_total = NA, mature_total = NA)

# Visit each distinct Family/Species/Tissue combination once
for (Fam in unique(df$Family)) {
  for (Spec in unique(df$Species)) {
    for (Tiss in unique(df$Tissue)) {
      rows <- df$Family == Fam & df$Species == Spec & df$Tissue == Tiss
      # Skip combinations that do not occur in the data
      if (any(rows)) {
        df$star_total[rows] <- sum(df$star[rows])
        df$mature_total[rows] <- sum(df$mature[rows])
        df$total_count[rows] <- df$star_total[rows] + df$mature_total[rows]
      }
    }
  }
}

df$rpm_mature <- df$mature_total / df$total_count * 10^6
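
For comparison, the same per-group totals can be obtained in base R without writing the loops by hand. This is only a sketch under stated assumptions, not part of the original answer: it rebuilds the example data with star and mature stored directly as numbers (the question stores them as strings) and uses ave(), which applies a function within each group and returns a vector aligned with the original rows.

```r
# Example data from the question; star and mature are entered as numbers
# here (an assumption made to keep the sketch short).
df <- data.frame(
  Family  = c('mir-1','mir-1','mir-3','mir-4','mir-4','LET-7','LET-7','mir-1','mir-4','LET-7'),
  Species = c('hsa','chicken','hsa','hsa','chicken','hsa','hsa','chicken','chicken','hsa'),
  Tissue  = c('blood','liver','blood','blood','liver','skin','skin','skin','liver','nail'),
  star    = c(1, 4, 3, 4, 12, 3, 7, 4, 1, 5),
  mature  = c(9, 6, 8, 1, 7, 3, 4, 2, 8, 9),
  stringsAsFactors = FALSE
)

# ave() sums within each Family/Species/Tissue combination and repeats
# the group sum on every row of that group.
df$star_total   <- ave(df$star,   df$Family, df$Species, df$Tissue, FUN = sum)
df$mature_total <- ave(df$mature, df$Family, df$Species, df$Tissue, FUN = sum)

# The remaining columns follow from the two group sums.
df$total_count <- df$star_total + df$mature_total
df$rpm_mature  <- df$mature_total / df$total_count * 10^6

print(df)
```

The star_total, mature_total and total_count columns produced this way can be compared directly with the expected output given in the question.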
Votes: 1

Stack Overflow user

Answered 2020-02-25 16:47:13

Here is an approach where the calculation is done by Family, Species, Tissue:

library(data.table)
setDT(df)

# Go through as.character() so factor columns yield their values, not level codes
cols <- c("star", "mature")
df[, (cols) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = cols]

# Group totals, added by reference
df[, ":="(total_count = sum(mature) + sum(star),
          star_total = sum(star),
          mature_total = sum(mature)),
   by = .(Family, Species, Tissue)]

# Columns created by := are not visible inside the same call,
# so rpm_mature is derived in a separate step
df[, rpm_mature := mature_total/total_count*10^6]

print(df)

    Family Species Tissue star mature total_count star_total mature_total rpm_mature
 1:  mir-1     hsa  blood    1      9          10          1            9   900000.0
 2:  mir-1 chicken  liver    4      6          10          4            6   600000.0
 3:  mir-3     hsa  blood    3      8          11          3            8   727272.7
 4:  mir-4     hsa  blood    4      1           5          4            1   200000.0
 5:  mir-4 chicken  liver   12      7          28         13           15   535714.3
 6:  LET-7     hsa   skin    3      3          17         10            7   411764.7
 7:  LET-7     hsa   skin    7      4          17         10            7   411764.7
 8:  mir-1 chicken   skin    4      2           6          4            2   333333.3
 9:  mir-4 chicken  liver    1      8          28         13           15   535714.3
10:  LET-7     hsa   nail    5      9          14          5            9   642857.1
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/60399322
