我被要求读取两个文件(左读和右读) Aip02.R1.fastq和Aip02.R2.fastq,并使用zip函数获得一个交错的fasta文件。左边和右边的文件都是fastq文件,但是当我把它们压缩在一起生成一个新的fastq文件时,writer函数就不再起作用了。它给出错误"SeqRecord (id=)有一个无效的序列“。
#!/usr/bin/env python3
# Import Seq, SeqRecord, and SeqIO
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fastq")
rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fastq")
A= zip(leftReads,rightReads)
SeqIO.write(SeqRecord(list(A)), "interleave.fastq", "fastq")发布于 2020-02-17 03:52:42
您的正向和反向序列可能具有相同的ID。因此,使用以下代码为ID添加一个后缀。我在这里使用了/1和/2,但也使用了像.f和.r这样的东西。
from Bio import SeqIO
import itertools
def interleave(iter1, iter2) :
for (forward, reverse) in itertools.izip(iter1, iter2):
assert forward.id == reverse.id
forward.id += "/1"
reverse.id += "/2"
yield forward
yield reverse
leftReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq", "fastq")
rightReads = SeqIO.parse("/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq","fastq")
handle = open("interleave.fastq", "w")
count = SeqIO.write(interleave(leftReads, rightReads), handle, "fastq")
handle.close()
print("{} records written to interleave.fastq".format(count))对于大型fastq文件,此代码可能会变得很慢。例如,查看here,他们报告创建一个2 2GB的fastq文件需要14分钟。所以他们给出了这个改进的方法:
from Bio.SeqIO.QualityIO import FastqGeneralIterator
import itertools
file_f = "/scratch/AiptasiaMiSeq/fastq/Aip02.R1.fastq"
file_r = "/scratch/AiptasiaMiSeq/fastq/Aip02.R2.fastq"
handle = open("interleave.fastq", "w")
count = 0
f_iter = FastqGeneralIterator(open(file_f,"rU"))
r_iter = FastqGeneralIterator(open(file_r,"rU"))
for (f_id, f_seq, f_q), (r_id, r_seq, r_q) in itertools.izip(f_iter,r_iter):
assert f_id == r_id
count += 2
#Write out both reads with "/1" and "/2" suffix on ID
handle.write("@%s/1n%sn+n%sn@%s/2n%sn+n%sn"
% (f_id, f_seq, f_q, r_id, r_seq, r_q))
handle.close()
print("{} records written to interleave.fastq".format(count)https://stackoverflow.com/questions/60235969
复制相似问题