I have a FASTA file, test.fasta, which contains the following:
```
>QWE2J2_DEFR00000200123 DEFR00000560077.11 DEFR00000100333.7 3:444563-33443(-
)
acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatc
tatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtgga
acccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtata
gagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaatt
atttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctcccct
tgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggcc
gtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgat
agttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg
```

I need to convert it to a CSV in the following format:
```
>QWE2J2_DEFR00000200123,DEFR00000560077.11,DEFR00000100333.7,3:444563-33443(-),acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatctatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtggaacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtatagagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaattatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctccccttgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggccgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgatagttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg
```

I have already tried this in a Linux terminal:
```shell
input_file=test.fasta; vim -c '0,$s/>\(.*\)\n/>\1,/' -c '0,$s/\(.*\)\n\([^>]\)/\1\2/' -c 'w! my-tmp.fasta.csv' -c 'q!' $input_file; mv my-tmp.fasta.csv $input_file.csv
```

However, it produces the wrong output:
```
>QWE2J2_DEFR00000200123 DEFR00000560077.11 DEFR00000100333.7 3:444563-33443(-,)acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatctatgatcactcccaacgggaggtttaagtgcaacaccaggctgtgtctttctatcacggatttccacccggacacgtggaacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtatagagacgtcggacttcacgaaaagacaactggcagtgcagagaaaaggggggggggggggggataaagtcttttgtgaattatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctccccttgccagacgtggttccagaaaaaaaaaaaaacctcgtccagaacgggattcagctgctcaacgggcatgcgccgggggccgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgatagttgggtttgcagcctttgcttacacggtcaagtaggggggggggggggcgcaggagtg
```

How can I create this CSV file?
Posted on 2020-03-13 21:28:39
This is quite simple with awk, with `RS` set to `>`:
```shell
awk -vRS='>' 'NR>1{
    gsub(/ /, ",")
    sub(/\)\n/, "),")
    gsub("\n", "")
    print RS $0
}' file
```

With GNU sed's `-z` option it also looks simple:
```shell
sed -z '
s/ /,/g
s/)\n/),/g
s/\n//g
s/>/\n>/g
s/^\n//
' file
```

The following sed script should also work:
```shell
sed -n '
# if the line does not start with >
/^>/!{
    # append the line to hold space
    H
    # if it is not the end of the file, start over with the next line
    $!b
}
# switch pattern space with hold space
x
# add a comma after the first )
s/)/),/
# remove all the newlines
s/\n//g
# print the record, unless the hold space was empty
/^$/!p
# switch pattern space with hold space
x
# replace spaces with commas
s/ /,/g
# copy the line to hold space
h
' file
```

Scripts written and tested on repl:
```
>QWE2J2_DEFR00000200123,DEFR00000560077.11,DEFR00000100333.7,3:444563-33443(-),acccaaagggagggagagagggctattatcatggaaaactaatttttcccagagaatttcctttcaaacctcccagtatcacccggcctggtctgtctccaccatcctgactgggctcctgagcttcatggtggagaagggccccaccctgggcagtataatttcctgaagtcgtggaggagattaaacaaaaacagaaagcacaagacgaactcagtagcagaccccagactctcccctgtcccaaacctcgcagggctccagcaggccaaccggcaccacggactcctgggtggcgccctggcgaacttgtttgtgat
```

Prefer sed over vim.
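As a quick cross-check, the sketch below wraps each of the three answers in a shell function (the names `fasta_to_csv_awk`, `fasta_to_csv_sed_z` and `fasta_to_csv_sed` are hypothetical) and runs them on a small made-up two-record sample laid out like test.fasta, with each header's `)` wrapped onto its own line. It assumes GNU sed for the `-z` variant. That variant also adds one command that is not in the answer, `s/$/\n/`: because the answer deletes every `\n` and then only inserts newlines before `>`, its output otherwise ends without a final newline.

```shell
#!/bin/sh
# Sketch: the three answers wrapped as functions (hypothetical names),
# each reading FASTA on stdin and writing one CSV line per record.

# awk with RS set to ">"
fasta_to_csv_awk() {
    awk -v RS='>' 'NR > 1 {
        gsub(/ /, ",")      # header fields: spaces -> commas
        sub(/\)\n/, "),")   # the wrapped ")" line closes the header
        gsub("\n", "")      # join the sequence lines
        print RS $0         # restore the ">" consumed as record separator
    }'
}

# GNU sed -z: the whole input is one pattern space.
# The last command (s/$/\n/) is an addition, not part of the answer.
fasta_to_csv_sed_z() {
    sed -z '
        s/ /,/g
        s/)\n/),/g
        s/\n//g
        s/>/\n>/g
        s/^\n//
        s/$/\n/
    '
}

# Hold-space sed script from the answer (comments removed)
fasta_to_csv_sed() {
    sed -n '
        /^>/!{
            H
            $!b
        }
        x
        s/)/),/
        s/\n//g
        /^$/!p
        x
        s/ /,/g
        h
    '
}

# Made-up sample in the same layout as test.fasta
sample='>ID1 A.1 B.2 1:10-20(-
)
acgt
tgca
>ID2 C.3 D.4 2:30-40(+
)
ggcc
'

for f in fasta_to_csv_awk fasta_to_csv_sed_z fasta_to_csv_sed; do
    echo "== $f"
    printf '%s' "$sample" | "$f"
done
```

Each function should print the same two lines, `>ID1,A.1,B.2,1:10-20(-),acgttgca` and `>ID2,C.3,D.4,2:30-40(+),ggcc`. For the real file, something like `fasta_to_csv_sed < test.fasta > test.fasta.csv` would apply the same idea.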
https://stackoverflow.com/questions/60670954