首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Fasta文件行问题

Fasta文件行问题
EN

Stack Overflow用户
提问于 2020-03-13 21:01:12
回答 1查看 20关注 0票数 0

我有一个FASTA文件test.fasta,它包含以下信息:

代码语言:javascript
复制
>QWE2J2_DEFR00000200123 DEFR00000560077.11 DEFR00000100333.7 3:444563-33443(-
)
acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatc
tatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtgga
acccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtata
gagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaatt
atttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctcccct
tgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggcc
gtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgat
agttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg

我需要将其转换为以下格式的CSV:

代码语言:javascript
复制
>QWE2J2_DEFR00000200123,DEFR00000560077.11,DEFR00000100333.7,3:444563-33443(-),acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatctatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtggaacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtatagagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaattatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctccccttgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggccgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgatagttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg

我已经在Linux终端上尝试过了:

代码语言:javascript
复制
input_file=test.fasta; vim -c '0,$s/>\(.*\)\n/>\1,/' -c '0,$s/\(.*\)\n\([^>]\)/\1\2/' -c 'w! my-tmp.fasta.csv' -c 'q!'  $input_file; mv my-tmp.fasta.csv $input_file.csv

但是,它给出了错误的输出:

代码语言:javascript
复制
>QWE2J2_DEFR00000200123 DEFR00000560077.11 DEFR00000100333.7 3:444563-33443(-,)acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatctatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtggaacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtatagagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaattatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctccccttgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggccgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgatagttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg

如何创建此CSV文件?

EN

回答 1

Stack Overflow用户

发布于 2020-03-13 21:28:39

RS设置为>的情况下使用awk非常简单:

代码语言:javascript
复制
awk -vRS='>' 'NR>1{
    gsub(/ /, ",")
    sub(/\)\n/, "),")
    gsub("\n", "")
    print RS $0
}' file

带有-z的GNU sed看起来也很简单:

代码语言:javascript
复制
sed -z '
    s/ /,/g
    s/)\n/),/g
    s/\n//g
    s/>/\n>/g
    s/^\n//
' file

下面的sed脚本也应该可以工作:

代码语言:javascript
复制
sed -n '
    # if line does not start with >
    /^>/!{
        # append the line to hold space
        H
        # if its not the end of file, start over
        $!b
    }
    # switch pattern space with hold space
    x
    # add a comma after )
    s/)/),/
    # remove all the newlines
    s/\n//g
    # print it all, if hold space not empty
    /^$/!p
    # switch pattern space with hold space
    x
    # replace spaces with comma
    s/ /,/g
    # hold the line
    h
' file

Scripts written and tested on repl

代码语言:javascript
复制
>QWE2J2_DEFR00000200123,DEFR00000560077.11,DEFR00000100333.7,3:444563-33443(-),acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatcacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtataatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctcccctgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgat

首选sed而不是vim

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/60670954

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档