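I have a set of strings whose IDs start with >. I would like each ID to be followed by its string on a single line, rather than split across several lines as it is now. A string may be split over 1, 2, or 3 lines.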
fileName="hairpin"
conn=file(fileName,open="r")
linn=readLines(conn)
for (i in seq_along(linn)) {  # seq_along() is safe even when linn is empty
  print(linn[i])
}
close(conn)
head(linn)
[1] ">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop"
[2] "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC"
[3] "UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA"
[4] ">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop"
[5] "AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCU"
[6] "GGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU"
Desired output:
[1] ">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop" "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAACUAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA"
[4] ">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop" "AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCUGGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU"
I found a solution on another website:
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < file.fa

Answered 2014-09-27 20:16:08
Try this:
g <- cumsum(grepl("^>", Lines)) # equals 1 for first group, 2 for second, etc.
unname(unlist(tapply(Lines, g, function(x) c(x[1], paste(x[-1], collapse = "")))))
giving:
[1] ">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop"
[2] "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAACUAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA"
[3] ">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop"
[4] "AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCUGGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU"
Note: the input Lines is:
Lines <- c(">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop",
           "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC",
           "UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA",
           ">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop",
           "AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCU",
           "GGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU")

https://stackoverflow.com/questions/26078718
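The grouping idea in the answer (a running count of header lines used as a record index) can also be wrapped in a small function. What follows is only a sketch under stated assumptions: `unwrap_fasta` is a hypothetical helper name that does not appear in the original answer, the input is assumed to start with a `>` header line, and empty lines are assumed to be safe to drop. It returns a named character vector (header text as names, joined sequence as values) and then interleaves the two to get the layout the question asks for.

```r
# Hypothetical helper (not part of the original answer): join the wrapped
# sequence lines of each record into a single string.
unwrap_fasta <- function(lines) {
  lines <- lines[nzchar(lines)]          # assumption: empty lines carry no data
  g <- cumsum(grepl("^>", lines))        # 1 for the first record, 2 for the second, ...
  stopifnot(length(lines) > 0, g[1] == 1)  # assumption: input starts with a header
  records <- split(lines, g)             # one character vector per record
  # Everything after the header line is sequence; paste it together.
  seqs <- vapply(records, function(x) paste(x[-1], collapse = ""), character(1))
  # Use the header (without the leading ">") as the name of each sequence.
  names(seqs) <- vapply(records, function(x) sub("^>", "", x[1]), character(1))
  seqs
}

# Small made-up input, just to show the shape of the result.
Lines <- c(">id1 first", "ACGU", "GGCC", ">id2 second", "UUAA")
seqs <- unwrap_fasta(Lines)

# Interleave headers and sequences: header, sequence, header, sequence, ...
out <- as.vector(rbind(paste0(">", names(seqs)), seqs))
print(out)
```

For the file in the question, something like `unwrap_fasta(readLines("hairpin"))` should work the same way, and `writeLines()` can write the interleaved vector back to disk. Keeping the sequences in a named vector makes it easy to look one up by its header, at the cost of one extra step when the two-lines-per-record text is needed.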