I am new to loops in R, so I would appreciate your help with my question. Suppose I have a data frame like this:
Family <- c('mir-1','mir-1','mir-3','mir-4','mir-4','LET-7', 'LET-7','mir-1','mir-4','LET-7')
Species <- c('hsa','chicken','hsa','hsa','chicken','hsa','hsa','chicken','chicken','hsa')
Tissue <- c('blood','liver','blood','blood','liver','skin','skin','skin','liver','nail')
star <- c('1','4','3','4','12','3','7','4','1','5') #numeric
mature <- c('9','6','8','1','7','3','4','2','8','9') #numeric
df <- data.frame(Family,Species,Tissue,star,mature)

My output should look like this:
Family_ <- c('mir-1','mir-1','mir-3','mir-4','mir-4','LET-7', 'LET-7','mir-1','mir-4','LET-7')
Species_ <- c('hsa','chicken','hsa','hsa','chicken','hsa','hsa','chicken','chicken','hsa')
Tissue_ <- c('blood','liver','blood','blood','liver','skin','skin','skin','liver','nail')
star <- c('1','4','3','4','12','3','7','4','1','5') #numeric
mature <- c('9','6','8','1','7','3','4','2','8','9') #numeric
total_count <- c('10','10','11','5','28','17','17','6','28','14') #numeric
star_total <- c('1','4','3','4','13','10','10','4','13','5') #numeric
mature_total <- c('9','6','8','1','15','7','7','2','15','9') #numeric
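df_new <- data.frame(Family_,Species_,Tissue_,star,mature,star_total,mature_total,total_count)

I want to loop over each family in each tissue in each species. That is, for each family in the first column, within a specific tissue and a specific species (without removing duplicate rows), I want to compute total_count <- sum(mature) + sum(star), star_total <- sum(star), and mature_total <- sum(mature). I also want to add an extra column named rpm_mature, computed as rpm_mature <- mature_total/total_count*10^6 (this column is not included in the output shown here). Rows that share the same family, tissue, and species should therefore all get the same computed values. I may not have described this very well, but it should make sense if you look at the output. Thanks.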
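As a worked illustration of the calculation described above, the sketch below takes the two rows of the example data that share family mir-4, species chicken, and tissue liver (rows 5 and 9) and computes the group values by hand. The vectors are typed in as numbers purely for illustration; they are not read from df.

```r
# Rows 5 and 9 of the example data: mir-4, chicken, liver
# (values entered as numbers here; an assumption for this sketch)
star   <- c(12, 1)
mature <- c(7, 8)

star_total   <- sum(star)                    # 12 + 1 = 13
mature_total <- sum(mature)                  # 7 + 8 = 15
total_count  <- star_total + mature_total    # 13 + 15 = 28
rpm_mature   <- mature_total / total_count * 10^6

print(c(star_total = star_total,
        mature_total = mature_total,
        total_count = total_count))
print(rpm_mature)
```

Both rows of this group receive the same star_total, mature_total, and total_count, which matches rows 5 and 9 of the expected output.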
Posted on 2020-02-26 02:12:45
Here is a tidyverse approach, in case it helps:
library(tidyverse)

df %>%
  # Convert via character first: if star and mature are factors,
  # as.numeric() alone returns the level codes, not the values
  mutate_at(c("star", "mature"), ~ as.numeric(as.character(.))) %>%
  group_by(Family, Species, Tissue) %>%
  mutate(total_count = sum(mature) + sum(star),
         star_total = sum(star),
         mature_total = sum(mature),
         rpm_mature = mature_total / total_count * 10^6)

Output

# A tibble: 10 x 9
# Groups:   Family, Species, Tissue [8]
   Family Species Tissue  star mature total_count star_total mature_total rpm_mature
   <fct>  <fct>   <fct>  <dbl>  <dbl>       <dbl>      <dbl>        <dbl>      <dbl>
 1 mir-1  hsa     blood      1      9          10          1            9    900000
 2 mir-1  chicken liver      4      6          10          4            6    600000
 3 mir-3  hsa     blood      3      8          11          3            8    727273.
 4 mir-4  hsa     blood      4      1           5          4            1    200000
 5 mir-4  chicken liver     12      7          28         13           15    535714.
 6 LET-7  hsa     skin       3      3          17         10            7    411765.
 7 LET-7  hsa     skin       7      4          17         10            7    411765.
 8 mir-1  chicken skin       4      2           6          4            2    333333.
 9 mir-4  chicken liver      1      8          28         13           15    535714.
10 LET-7  hsa     nail       5      9          14          5            9    642857.

Edit
If you are interested in a loop-based approach, you can do the following to get the same result:
# Convert via character first so that factor columns yield their values, not level codes
df$star <- as.numeric(as.character(df$star))
df$mature <- as.numeric(as.character(df$mature))
df <- cbind(df, total_count = NA, star_total = NA, mature_total = NA)

# Visit each Family/Species/Tissue combination once
for (Fam in unique(as.character(df$Family))) {
  for (Spec in unique(as.character(df$Species))) {
    for (Tiss in unique(as.character(df$Tissue))) {
      # Logical index of the rows that belong to this combination
      rows <- df$Family == Fam & df$Species == Spec & df$Tissue == Tiss
      if (any(rows)) {
        df$total_count[rows] <- sum(df$mature[rows]) + sum(df$star[rows])
        df$star_total[rows] <- sum(df$star[rows])
        df$mature_total[rows] <- sum(df$mature[rows])
      }
    }
  }
}
df$rpm_mature <- df$mature_total / df$total_count * 10^6

Posted on 2020-02-25 16:47:13
Here is an approach where the calculation is grouped by Family, Species, and Tissue:
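library(data.table)
setDT(df)

# Convert via character first so that factor columns yield their values, not level codes
df[, c("star", "mature") := lapply(.SD, function(x) as.numeric(as.character(x))),
   .SDcols = c("star", "mature")]

# Group totals; columns created by := are not visible to other expressions in the same call
df[, ":="(total_count = sum(mature) + sum(star),
          star_total = sum(star),
          mature_total = sum(mature)),
   by = .(Family, Species, Tissue)]

# rpm_mature is computed in a second step, once the totals exist
df[, rpm_mature := mature_total / total_count * 10^6]
print(df)

    Family Species Tissue star mature total_count star_total mature_total rpm_mature
 1:  mir-1     hsa  blood    1      9          10          1            9   900000.0
 2:  mir-1 chicken  liver    4      6          10          4            6   600000.0
 3:  mir-3     hsa  blood    3      8          11          3            8   727272.7
 4:  mir-4     hsa  blood    4      1           5          4            1   200000.0
 5:  mir-4 chicken  liver   12      7          28         13           15   535714.3
 6:  LET-7     hsa   skin    3      3          17         10            7   411764.7
 7:  LET-7     hsa   skin    7      4          17         10            7   411764.7
 8:  mir-1 chicken   skin    4      2           6          4            2   333333.3
 9:  mir-4 chicken  liver    1      8          28         13           15   535714.3
10:  LET-7     hsa   nail    5      9          14          5            9   642857.1

https://stackoverflow.com/questions/60399322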
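For comparison, the same per-group sums can also be sketched in base R with ave(), without loading extra packages. This is not taken from either answer above; it is a minimal sketch that assumes star and mature are already numeric, so the example data is re-entered here with numeric values.

```r
# Example data from the question, with star and mature entered as numbers
# (an assumption for this sketch; the question stores them as strings)
df <- data.frame(
  Family  = c("mir-1", "mir-1", "mir-3", "mir-4", "mir-4",
              "LET-7", "LET-7", "mir-1", "mir-4", "LET-7"),
  Species = c("hsa", "chicken", "hsa", "hsa", "chicken",
              "hsa", "hsa", "chicken", "chicken", "hsa"),
  Tissue  = c("blood", "liver", "blood", "blood", "liver",
              "skin", "skin", "skin", "liver", "nail"),
  star    = c(1, 4, 3, 4, 12, 3, 7, 4, 1, 5),
  mature  = c(9, 6, 8, 1, 7, 3, 4, 2, 8, 9)
)

# ave() applies FUN within each Family/Species/Tissue group and
# returns a vector of the same length as its first argument
df$star_total   <- ave(df$star,   df$Family, df$Species, df$Tissue, FUN = sum)
df$mature_total <- ave(df$mature, df$Family, df$Species, df$Tissue, FUN = sum)
df$total_count  <- df$star_total + df$mature_total
df$rpm_mature   <- df$mature_total / df$total_count * 10^6

print(df)
```

Because ave() keeps one value per input row, duplicate rows are preserved and every row in a group carries the same totals, which is the behaviour the question asks for.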