I have a large data frame like this (only the first three columns are shown).
The data frame is called chr22_hap12:
2 1 3
2 1 3
2 1 3
2 1 2
2 2 1
2 2 1
I want to get the proportion of each value (1, 2, and 3) in every column and store the result in a data frame.
This is what I have so far:
for (i in 1:3 ) {
length(chr22_hap12[,i]) -> total_snps
sum(chr22_hap12[,i]==1,na.rm=FALSE) -> counts_ancestry_1
sum(chr22_hap12[,i]==2,na.rm=FALSE) -> counts_ancestry_2
sum(chr22_hap12[,i]==3,na.rm=FALSE) -> counts_ancestry_3
(counts_ancestry_1*100)/total_snps -> ancestry_1_perc
(counts_ancestry_2*100)/total_snps -> ancestry_2_perc
(counts_ancestry_3*100)/total_snps -> ancestry_3_perc
haplo_df[i] = NULL
haplo_df[i] = c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc)
as.data.frame(haplo_df[i])
}
I get the following errors. After trying to set haplo_df[i] = NULL:
Error in haplo_df[i] = NULL : object 'haplo_df' not found
After this:
haplo_df[i] = c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc)
Error in haplo_df[i] = c(ancestry_1_perc, ancestry_2_perc, ancestry_3_perc) : object 'haplo_df' not found
And again with as.data.frame(haplo_df[i]):
object 'haplo_df' not found
My expected output should look like this:
0.00 66.66 50.0
100.00 33.33 33.33
0.00 0.00 16.66
Posted on 2014-10-07 15:51:09
You need to define the result matrix before the loop, and then cbind each new result onto that matrix.
# initialise the result before the loop (NULL here; cbind() turns it into a matrix)
haplo_df <- NULL
for (i in 1:3 ) {
length(chr22_hap12[,i]) -> total_snps
sum(chr22_hap12[,i]==1,na.rm=FALSE) -> counts_ancestry_1
sum(chr22_hap12[,i]==2,na.rm=FALSE) -> counts_ancestry_2
sum(chr22_hap12[,i]==3,na.rm=FALSE) -> counts_ancestry_3
(counts_ancestry_1*100)/total_snps -> ancestry_1_perc
(counts_ancestry_2*100)/total_snps -> ancestry_2_perc
(counts_ancestry_3*100)/total_snps -> ancestry_3_perc
# bind the new result to the existing data
haplo_df <- cbind(haplo_df , c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc))
}
# return the result
haplo_df
## [,1] [,2] [,3]
## [1,] 0 66.66667 33.33333
## [2,] 100 33.33333 16.66667
## [3,] 0 0.00000 50.00000
Alternatively, you can use apply and table.
apply(chr22_hap12, 2, function(x) 100*table(factor(x, levels=1:3))/length(x))
## V1 V2 V3
## 1 0 66.66667 33.33333
## 2 100 33.33333 16.66667
## 3 0 0.00000 50.00000
Posted on 2014-10-07 18:47:30
My one-liner:
sapply(df, function(x){prop.table(table(x))*100})
Posted on 2014-10-07 16:04:01
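A caveat on the sapply one-liner in the previous answer: when the columns do not all contain the same set of values (as in the question's data, where the first column holds only 2), each table has a different length and sapply returns a list rather than a matrix. The sketch below shows one way around this by fixing the factor levels, as the apply answer does. The column names V1 to V3 and the variable names res_list and res_mat are assumptions made for illustration.

```r
# Sample data from the question (first three columns only).
# The column names V1..V3 are assumed; the question does not show them.
chr22_hap12 <- data.frame(
  V1 = c(2, 2, 2, 2, 2, 2),
  V2 = c(1, 1, 1, 1, 2, 2),
  V3 = c(3, 3, 3, 2, 1, 1)
)

# Without fixed levels, each column yields a table of a different length,
# so sapply() cannot simplify the result and returns a list.
res_list <- sapply(chr22_hap12, function(x) prop.table(table(x)) * 100)
is.list(res_list)

# Fixing the levels to 1:3 gives every column the same three entries
# (zero counts included), so sapply() simplifies to a 3 x 3 matrix.
res_mat <- sapply(chr22_hap12, function(x) {
  prop.table(table(factor(x, levels = 1:3))) * 100
})

# Convert to a data frame, as the question asks.
haplo_df <- as.data.frame(res_mat)
haplo_df
```

With the levels fixed, the result should agree with the matrix printed by the loop and apply versions in the first answer.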
Here is another approach.
Sample data:
set.seed(23)
y <- 1:3
df <- data.frame(a = sample(y, 10, replace = TRUE),
b = sample(y, 10, replace = TRUE),
c = sample(y, 10, replace = TRUE))
#df
# a b c
#1 2 3 2
#2 1 3 1
#3 1 2 1
#4 3 1 3
#5 3 3 2
#6 2 1 3
#7 3 2 3
#8 3 2 3
#9 3 3 1
#10 3 2 3
Compute the percentages:
newdf <- as.data.frame(t(do.call(rbind, lapply(df, function(z){
sapply(y, function(x) (sum(z == x) / length(z))*100)
}))))
#newdf
# a b c
#1 20 20 30
#2 20 40 20
#3 60 40 50
https://stackoverflow.com/questions/26239409