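I am using the BioPython MuscleCommandline wrapper to align sequences in a subprocess. Muscle's input and output are stdin and stdout. This works, but every time Popen calls Muscle I get Muscle's program banner printed to the screen. This slows the program down considerably, because there are millions of calls to the subprocess.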
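# Imports assumed by this snippet (Biopython < 1.78, which still provides Bio.Alphabet).
from subprocess import Popen, PIPE
from Bio import AlignIO, SeqIO
from Bio.Align import AlignInfo
from Bio.Align.Applications import MuscleCommandline
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

mcline = MuscleCommandline()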
read_list = (SeqRecord(Seq(seq, IUPAC.unambiguous_dna), str(index)) for index, seq in enumerate(grouped_reads_list))
muscle = Popen(str(mcline), stdin=PIPE, stdout=PIPE, universal_newlines=True)
SeqIO.write(read_list, muscle.stdin, "fasta") # Send sequences to Muscle in FASTA format.
muscle.stdin.close()
align = AlignIO.read(muscle.stdout, "fasta") # Capture output from muscle and get it into FASTA format in an object.
print(align)
muscle.stdout.close()
exit("Testin Testing")
consensus_read = AlignInfo.SummaryInfo(align).dumb_consensus(threshold=0.6, ambiguous="N", consensus_alpha=IUPAC.ambiguous_dna) # Create consensus from alignment object.
The screen output is:
MUSCLE v3.8.31 by Robert C. Edgar
http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
Posted on 2014-12-13 15:49:19
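I am posting this as an answer rather than editing my question because someone may find it useful. If I have made a mistake, please let me know. The problem appears to lie in using the BioPython MuscleCommandline wrapper in this way. When making the call through subprocess, I was not able to pass any command-line options to Muscle through the wrapper. My modified code is below.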
cmd = ['muscle', "-quiet", "-maxiters", "1", "-diags"]
read_list = (SeqRecord(Seq(seq, IUPAC.unambiguous_dna), str(index)) for index, seq in enumerate(grouped_reads_list))
muscle = Popen(cmd, stdin=PIPE, stdout=PIPE, universal_newlines=True)
SeqIO.write(read_list, muscle.stdin, "fasta") # Send sequences to Muscle in FASTA format.
muscle.stdin.close()
align = AlignIO.read(muscle.stdout, 'fasta') # Capture output from muscle and get it into FASTA format in an object.
muscle.stdout.close()
consensus_read = AlignInfo.SummaryInfo(align).dumb_consensus(threshold=0.6, ambiguous="N", consensus_alpha=IUPAC.ambiguous_dna)
return str(consensus_read)
Posted on 2015-03-12 23:53:54
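I think calling Muscle directly may be the better choice if something unexpected happens when aligning sequences through BioPython. Either way, running an MSA should be easy; going through Biopython may just be a bit more trouble.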
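As a rough illustration of calling Muscle directly, here is a minimal sketch using only the standard library. The helper name `align_fasta` and its `cmd` parameter are hypothetical, introduced just for this example. It assumes Muscle 3.8 reads FASTA from stdin when no input file is given (as in the code above) and that the banner is written to stderr, which the question's setup suggests since stdout is piped yet the banner still reaches the screen. Discarding stderr is an extra safeguard on top of `-quiet`, not something the original code does.

```python
import subprocess


def align_fasta(fasta_text, cmd=("muscle", "-quiet")):
    """Pipe FASTA text through an aligner and return what it writes to stdout.

    `cmd` defaults to a Muscle 3.8 invocation; it is a parameter only so the
    helper can be exercised without Muscle installed (an assumption of this
    sketch, not part of the original code).
    """
    result = subprocess.run(
        list(cmd),
        input=fasta_text,            # sent to the child's stdin
        stdout=subprocess.PIPE,      # aligned FASTA comes back here
        stderr=subprocess.DEVNULL,   # discard any banner or progress text
        universal_newlines=True,     # work with str instead of bytes
        check=True,                  # raise CalledProcessError on failure
    )
    return result.stdout
```

With Muscle on the PATH, `align_fasta(">0\nACGT\n>1\nACGA\n")` should return the aligned FASTA text, which can then be parsed with `AlignIO.read(io.StringIO(text), "fasta")`. Extra options such as `"-maxiters", "1", "-diags"` can be appended to `cmd` in the same way as in the answer above.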
https://stackoverflow.com/questions/27455086