I have a file containing thousands of lower triangular matrices, stacked one below another:
1|Gene1_PRT1
2|Gene2_PRT1 0
2|Gene3_PRT1 0 0
1|Gene7_PRT1 1.4287 1.4287 1.5293
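2|Gene9_PRT1 1.4428 1.4428 1.5293 0

2|Gene90_PRT1
1|Gene60_PRT1 1.6242
2|Gene26454_PRT1 -1 -1
I need a list/table with the paired (gene) names on the left and the value next to them, with the diagonal (the 0 comparison of each gene with itself) removed. Like this: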
2|Gene68760_PRT1 1|Gene32540_PRT1 0
2|Gene99122_PRT1 1|Gene32540_PRT1 0
1|Gene2362_PRT1 1|Gene32540_PRT1 1.4287
2|Gene63993_PRT1 1|Gene32540_PRT1 1.4428
2|Gene99122_PRT1 2|Gene68760_PRT1 0
1|Gene2362_PRT1 2|Gene68760_PRT1 1.4287
2|Gene63993_PRT1 2|Gene68760_PRT1 1.4428
1|Gene2362_PRT1 2|Gene99122_PRT1 1.5293
2|Gene63993_PRT1 2|Gene99122_PRT1 1.5293
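2|Gene63993_PRT1 1|Gene2362_PRT1 0
I have tried a little with simple tools such as grep, and I get a list of values, but without the paired names on the left. I am new to (bio)informatics and trying hard to learn...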
Posted on 2013-07-10 16:58:19
I don't know whether this is fast enough:
#read the data
dat <- readLines(textConnection("1|Gene1_PRT1
2|Gene2_PRT1 0
2|Gene3_PRT1 0 0
1|Gene7_PRT1 1.4287 1.4287 1.5293
2|Gene9_PRT1 1.4428 1.4428 1.5293 0

2|Gene90_PRT1
1|Gene60_PRT1 1.6242
2|Gene26454_PRT1 -1 -1"))
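#split the data into one group per matrix, using the fact that empty rows separate them:
#cumsum(dat=="") increases by one at every empty row, so it labels each block (0, 1, ...)
dat <- split(dat[dat!=""],cumsum(dat=="")[dat!=""])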
#split the rows
dat <- lapply(dat,strsplit,split=" +")
#create matrices with lower triangles and melt them
library(reshape2)
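dat <- lapply(dat,function(x) {
  #empty (NA) square matrix with one row/column per gene in this block
  mat <- matrix(ncol=length(x),nrow=length(x))
  #the first field of each row is the gene name
  nam <- do.call(c,lapply(x,function(y) y[1]))
  rownames(mat) <- nam
  colnames(mat) <- nam
  #R fills matrices column by column, so the values (read row by row) go into the
  #upper triangle and the transpose below moves them to the correct lower-triangle cells
  mat[upper.tri(mat)] <- do.call(c,lapply(x,function(y) as.numeric(y[-1])))
  #melt to long format; na.omit drops the diagonal and the unfilled triangle
  na.omit(melt(t(mat)))
})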
#rbind everything together
do.call(rbind,dat)
#      Var1             Var2            value
# 0.2  2|Gene2_PRT1     1|Gene1_PRT1   0.0000
# 0.3  2|Gene3_PRT1     1|Gene1_PRT1   0.0000
# 0.4  1|Gene7_PRT1     1|Gene1_PRT1   1.4287
# 0.5  2|Gene9_PRT1     1|Gene1_PRT1   1.4428
# 0.8  2|Gene3_PRT1     2|Gene2_PRT1   0.0000
# 0.9  1|Gene7_PRT1     2|Gene2_PRT1   1.4287
# 0.10 2|Gene9_PRT1     2|Gene2_PRT1   1.4428
# 0.14 1|Gene7_PRT1     2|Gene3_PRT1   1.5293
# 0.15 2|Gene9_PRT1     2|Gene3_PRT1   1.5293
# 0.20 2|Gene9_PRT1     1|Gene7_PRT1   0.0000
# 1.2  1|Gene60_PRT1    2|Gene90_PRT1  1.6242
# 1.3  2|Gene26454_PRT1 2|Gene90_PRT1 -1.0000
# 1.6  2|Gene26454_PRT1 1|Gene60_PRT1 -1.0000
https://stackoverflow.com/questions/17565703
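Since the answer itself wonders about speed, here is a rough base R sketch of the same idea that skips the intermediate square matrix and the reshape2 dependency. It is not part of the original answer and has not been benchmarked. The function name `lower_tri_to_pairs` is made up for illustration, and it assumes, like the answer, that matrices are separated by empty lines and that row i of each block lists its values against the first i - 1 names of that block.

```r
# Hypothetical helper (illustration only): convert the lines of a file of
# stacked lower triangular matrices into a long table of name pairs.
lower_tri_to_pairs <- function(lines) {
  lines <- trimws(lines)
  keep <- lines != ""
  # Empty lines separate matrices; cumsum() gives every block its own id.
  blocks <- split(lines[keep], cumsum(!keep)[keep])
  pairs <- lapply(blocks, function(block) {
    fields <- strsplit(block, " +")
    nam <- vapply(fields, `[`, character(1), 1)
    # Number of values on each row (0 on the first row, 1 on the second, ...).
    n_vals <- lengths(fields) - 1
    data.frame(
      Var1  = rep(nam, times = n_vals),              # name of the current row
      Var2  = nam[unlist(lapply(n_vals, seq_len))],  # names of the earlier rows
      value = as.numeric(unlist(lapply(fields, `[`, -1))),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pairs)
  rownames(out) <- NULL
  out
}

# Usage on a shortened version of the example data.
example_lines <- c("1|Gene1_PRT1",
                   "2|Gene2_PRT1 0",
                   "2|Gene3_PRT1 0 0",
                   "",
                   "2|Gene90_PRT1",
                   "1|Gene60_PRT1 1.6242",
                   "2|Gene26454_PRT1 -1 -1")
res <- lower_tri_to_pairs(example_lines)
print(res)
```

The pairs come out row by row rather than column by column as with `melt`, so the ordering differs from the output above, but the set of pairs should be the same. For a real file, the result of `readLines()` on that file can be passed in directly.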