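My task is to count the mutations that occur in several proteins after treatment. The sequences appear in the same order in both files. I opened both files with Biopython's FASTA parser (SeqIO.parse) and got all the proteins listed (isolated before and after treatment).
My problem is: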
from Bio import SeqIO
for normal_samples in SeqIO.parse("/data/statistic/normal_samples", "fasta"):
    print(normal_samples.id)
    print(repr(normal_samples.seq))
    print(len(normal_samples))
for treated_samples in SeqIO.parse("/data/statistic/with_treatment", "fasta"):
    print(normal_samples.id)
    print(repr(normal_samples.seq))
    print(len(normal_samples))
dict_n_t = dict(zip(normal_samples & treated_samples))
Posted on 2018-02-01 20:25:35
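Assuming both files list the sequences in the same order and each pair of sequences has the same length, you can use the following code: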
from Bio import SeqIO
normal_samples = SeqIO.parse("/data/statistic/normal_samples", "fasta")
treated_samples = SeqIO.parse("/data/statistic/with_treatment", "fasta")
for normal, treated in zip(normal_samples, treated_samples):
    if normal.id == treated.id:
        mutations = sum(1 for n, t in zip(str(normal.seq), str(treated.seq)) if n != t)
        print(f"Found {mutations} mutation(s) for id {normal.id}")
https://stackoverflow.com/questions/48546664
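If you also need to know where the mutations are, the same pairwise comparison can record positions. The following is only a minimal sketch: the function name find_mutations and the example sequences are made up for illustration, and it works on plain strings, so with Biopython you would pass str(record.seq). It assumes substitutions only. Insertions or deletions would shift the comparison and would need a proper alignment first.

```python
def find_mutations(normal_seq, treated_seq):
    """Return a list of (position, normal_residue, treated_residue) tuples.

    Positions are 1-based. A ValueError is raised when the lengths differ,
    because a plain zip() would silently ignore the extra residues.
    """
    if len(normal_seq) != len(treated_seq):
        raise ValueError(
            f"Length mismatch: {len(normal_seq)} vs {len(treated_seq)}"
        )
    return [
        (pos, n, t)
        for pos, (n, t) in enumerate(zip(normal_seq, treated_seq), start=1)
        if n != t
    ]


if __name__ == "__main__":
    # Hypothetical sequences, not taken from the files in the question.
    normal = "MKTAYIAKQR"
    treated = "MKTAYLAKQK"
    for pos, n, t in find_mutations(normal, treated):
        print(f"{n}{pos}{t}")  # prints I6L, then R10K
```

The number of mutations is then len(find_mutations(...)), which matches the count computed in the loop above for sequences of equal length.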