Combining strings that are split across multiple lines

Stack Overflow user
Asked 2014-09-27 20:05:03
Answers: 1, Views: 90, Followers: 0, Votes: 1

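I have a set of strings whose IDs start with >. I want the string that follows each ID on a single line, rather than split across several lines as it is now. A string can be split across 1, 2, or 3 lines.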

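fileName <- "hairpin"
conn <- file(fileName, open = "r")
linn <- readLines(conn)        # one element per line of the file
close(conn)                    # the connection is no longer needed once the lines are read
for (i in seq_along(linn)) {   # seq_along() is safe even when the file is empty
  print(linn[i])
}
head(linn)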

[1] ">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop" 
[2] "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC"
[3] "UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA"                     
[4] ">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop" 
[5] "AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCU"
[6] "GGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU"

Desired output:

[1] ">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop"  "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAACUAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA"                     
[4] ">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop"  "AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCUGGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU"

I found a solution on another website:

 awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);}  END {printf("\n");}' < file.fa
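
As a rough illustration of what the awk one-liner does: a line starting with > begins a new record and is printed on its own line, and every other line is appended to the current output line without a newline. A minimal R sketch of the same line-by-line logic might look like this. The sequences are made up and the variable name `out` is only illustrative.

```r
# Made-up example records; each sequence is split over one or more lines.
lines <- c(">id1 desc", "ACGU", "GG", ">id2 desc", "UUAA")

out <- character(0)
for (ln in lines) {
  if (grepl("^>", ln)) {
    # Header line: store it, then open an empty slot for its sequence.
    out <- c(out, ln, "")
  } else if (length(out) > 0) {
    # Sequence line: append it to the slot of the current record.
    out[length(out)] <- paste0(out[length(out)], ln)
  }
}
print(out)
# [1] ">id1 desc" "ACGUGG"    ">id2 desc" "UUAA"
```

One difference: the awk command prints an empty line before the first header, while this sketch produces no leading blank entry.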

1 Answer

Stack Overflow user

Accepted answer

Answered 2014-09-27 20:16:08

Try this:

g <- cumsum(grepl("^>", Lines)) # equals 1 for first group, 2 for second, etc.
unname(unlist(tapply(Lines, g, function(x) c(x[1], paste(x[-1], collapse = "")))))

giving:

[1] ">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop"                                        
[2] "UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAACUAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA"
[3] ">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop"                                        
[4] "AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCUGGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU"     
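
To see why the grouping works, it can help to look at the intermediate values. The sketch below uses short made-up sequences and `split()` plus `lapply()` in place of `tapply()`, which is just one equivalent way to spell the same idea. The names `is_header`, `records`, and `joined` are illustrative only.

```r
# Made-up input: two records, split over 2 and 3 sequence lines.
Lines <- c(">id1 first record", "AAAA", "CC",
           ">id2 second record", "GGGG", "UU", "A")

# grepl() flags the header lines; cumsum() turns the flags into a
# running record number, so every line inherits the number of its header.
is_header <- grepl("^>", Lines)
g <- cumsum(is_header)
print(g)
# [1] 1 1 1 2 2 2 2

# Split the lines by record number, keep each header as is,
# and paste the remaining lines of the record together.
records <- split(Lines, g)
joined <- lapply(records, function(x) c(x[1], paste(x[-1], collapse = "")))
result <- unname(unlist(joined))
print(result)
# [1] ">id1 first record"  "AAAACC"             ">id2 second record" "GGGGUUA"
```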

Note: the input Lines is:

Lines <- c(">cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop",
"UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC",
"UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA",
">cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop",
"AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCU",
"GGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU")
Votes: 1
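
If the goal is to look up each sequence by its ID, a variation on the accepted answer is to return a named character vector instead of alternating header and sequence lines. This is only a sketch under some assumptions: `collapse_fasta()` is a hypothetical helper name (not from any package), the ID is taken to be the text between > and the first whitespace, and empty lines and anything before the first header are ignored.

```r
# collapse_fasta() is a hypothetical helper, not part of any package.
collapse_fasta <- function(lines) {
  lines <- lines[nzchar(lines)]              # drop empty lines
  g <- cumsum(grepl("^>", lines))            # running record number
  keep <- g > 0                              # skip lines before the first header
  parts <- split(lines[keep], g[keep])
  # Join every non-header line of a record into one string.
  seqs <- vapply(parts, function(x) paste(x[-1], collapse = ""), character(1))
  # Assumption: the ID runs from ">" up to the first whitespace character.
  ids <- vapply(parts, function(x) sub("^>([^[:space:]]+).*$", "\\1", x[1]),
                character(1))
  setNames(seqs, ids)
}

# Made-up input, including a stray line, an empty line, and a header with no sequence.
example <- c("junk", ">a desc", "AC", "", "GU", ">b", ">c x", "UUU")
res <- collapse_fasta(example)
print(res)
#      a      b      c 
# "ACGU"     ""  "UUU" 
```

With a vector like this, `res["a"]` gives the full sequence for that ID on one line.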
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/26078718
