Q: Print a list of dictionary keys within a molecular weight interval

Stack Overflow user

Asked 2021-12-14 15:32:36

Answers: 1 | Views: 191 | Followers: 0 | Votes: 0

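For an assignment, I need to build on a previous function to write a new function that, given a FASTA file and a minimum and maximum molecular weight, returns a list of the sequence IDs of the sequences whose molecular weight falls within the given interval.

This is my previous function: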

from itertools import product

from Bio import SeqIO, SeqUtils
from Bio.Data import IUPACData

def Dict_MW(file_name):
    with open(file_name) as seq_file:
        seq_dict = {}
        for record in SeqIO.parse(seq_file, 'fasta'):
            # Map each IUPAC letter to the plain bases it can stand for,
            # then expand the record into every unambiguous sequence.
            d = IUPACData.ambiguous_dna_values
            ambiguous_dna = list(map("".join, product(*map(d.get, record))))
            mol_weight = []
            for seq in ambiguous_dna:
                mol_weight.append(SeqUtils.molecular_weight(seq))
            # Ambiguous records get a (min, max) tuple, unambiguous ones a single weight.
            if min(mol_weight) != max(mol_weight):
                seq_dict[record.id] = (min(mol_weight), max(mol_weight))
            else:
                seq_dict[record.id] = min(mol_weight)
        print(seq_dict)

This function prints a dictionary with the sequence IDs as keys and the molecular weights as values.

This is the new function:
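
To make the expansion step concrete, here is a minimal sketch that runs without Biopython. The `IUPAC_DNA` table is a hand-typed copy of the standard IUPAC ambiguity codes, standing in for `IUPACData.ambiguous_dna_values`, and `expand` is a hypothetical helper name that does not appear in the original code.

```python
from itertools import product

# Hand-typed IUPAC ambiguity codes (an assumption standing in for
# Bio.Data.IUPACData.ambiguous_dna_values).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def expand(sequence):
    """Return every unambiguous sequence that `sequence` can stand for."""
    choices = [IUPAC_DNA[letter] for letter in sequence]
    return ["".join(combo) for combo in product(*choices)]

variants = expand("GAGCTGVTATST")  # seq_418 from the FASTA file in this question
print(len(variants))  # V has 3 options and S has 2, so 3 * 2 = 6 variants
```

The number of variants is the product of the options per letter, so records with many ambiguous letters expand into long lists.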

def List(file_name, mw_min, mw_max):
    with open(file_name) as seq_file:
        seq_dict = {}
        ID = []
        for record in SeqIO.parse(seq_file, 'fasta'):
            d = IUPACData.ambiguous_dna_values
            ambiguous_dna = list(map("".join, product(*map(d.get, record))))
            mol_weight = []
            for seq in ambiguous_dna:
                mol_weight.append(SeqUtils.molecular_weight(seq))
            tuple = (min(mol_weight),max(mol_weight))
            if min(mol_weight) != max(mol_weight):
                seq_dict[record.id] = (min(mol_weight), max(mol_weight))
            else:
                seq_dict[record.id] = min(mol_weight)
            for values in mol_weight:
                if mw_min <= values <= mw_max:
                    ID.append(seq_dict.keys())
        print(ID)

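It runs, but the output is not correct. It returns all of the IDs, not just the IDs whose molecular weight lies within the given interval.

The FASTA file I am using: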
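
The likely cause is `ID.append(seq_dict.keys())`: it appends a view of all keys collected so far, once for every expanded weight that falls in the interval, instead of the current `record.id`. A minimal sketch of the difference, using made-up weights rather than real Biopython output (the IDs and numbers are placeholders, not computed values):

```python
# Placeholder data: ID -> weights of all expanded variants (made-up numbers).
weights_by_id = {
    "seq_a": [3000.0, 3050.0],
    "seq_b": [4100.0],
    "seq_c": [5200.0, 5300.0],
}
mw_min, mw_max = 4000.0, 4500.0

seq_dict = {}
buggy, fixed = [], []
for record_id, mol_weight in weights_by_id.items():
    seq_dict[record_id] = (min(mol_weight), max(mol_weight))
    for value in mol_weight:
        if mw_min <= value <= mw_max:
            # Original behaviour: stores a view of *every* key in seq_dict.
            buggy.append(seq_dict.keys())
    # One possible fix: test the record once and append only its own ID.
    if any(mw_min <= value <= mw_max for value in mol_weight):
        fixed.append(record_id)

print([list(view) for view in buggy])  # each entry lists all IDs in seq_dict
print(fixed)                           # only the IDs that matched
```

Whether a record should count when any variant matches, or only when all of them do, depends on the assignment; `any` can be swapped for `all` in the second case.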

>seq_7009 random sequence
DGRGGGWAVCVAACGTTGAT
>seq_418 random sequence
GAGCTGVTATST
>seq_9143_unamb random sequence
ACCGTTAAGCCTTAG
>seq_2888 random sequence
RVCCWDGARATAGBCGC
>seq_1101 random sequence
CSAATGYGATNBTA
>seq_107 random sequence
WGDGHGCDCTYANGTTWCA
>seq_6946 random sequence
TCVMBRAGRSGTCCAWA
>seq_6162 random sequence
YWBGCKTGCCAAGCGCDG
>seq_504 random sequence
ADDTAACCCTCTTKA
>seq_3535 random sequence
KKGTACACCAG
>seq_4077 random sequence
SRWSCRTTRVAGDCC
> seq_1626_unamb random sequence
GGATATTACCTA

Tags: biopython, fasta, python, dictionary, sequence

1 Answer

Stack Overflow user

Answered 2021-12-19 23:05:10

This is how I tried to solve the problem. I assume we have the same Python assignment.

from Bio import SeqIO
from Bio import SeqUtils

def unamb_MW(filename):
    """Return {record ID: molecular weight} for the unambiguous records in a FASTA file."""
    mol_weight_dict = dict()
    # Plain nucleotides; any other letter (such as N) marks the record as ambiguous.
    nucleotides = {'A', 'T', 'C', 'G'}
    with open(filename) as file:
        # SeqIO.parse yields the records of the FASTA file one by one.
        for record in SeqIO.parse(file, "fasta"):
            # Walk over the letters of this record's sequence.
            for nucl in record:
                if nucl in nucleotides:
                    continue
                else:
                    # Skip ambiguous records: SeqUtils.molecular_weight only
                    # accepts unambiguous sequences and raises an error otherwise.
                    # Printing the ID lets you check the FASTA file by hand.
                    print(str(record.id) + ": is ambiguous")
                    break
            else:
                # for/else: reached only when no ambiguous letter was found.
                mol_weight = SeqUtils.molecular_weight(record.seq)
                print(str(record.id) + ": is unambiguous & molecular weight = " + str(mol_weight))
                mol_weight_dict[str(record.id)] = mol_weight
    return mol_weight_dict

def MW_list(filename, min_MW, max_MW):
    mol_weight = unamb_MW(filename)
    for record in mol_weight:
        # Print each ID whose weight lies strictly between the two bounds.
        if min_MW < mol_weight[record] < max_MW:
            print('\n', [record])

If you're taking the computational biology course, then we have the same assignment.
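
For the full task, including ambiguous records, one option is to filter the dictionary that `Dict_MW` builds. The sketch below assumes `Dict_MW` is changed to return `seq_dict` instead of printing it, and that a record counts as inside the interval only when its whole (min, max) range fits. `ids_in_interval` is a hypothetical name and the weights are placeholders, not real results.

```python
def ids_in_interval(seq_dict, mw_min, mw_max):
    """Return the IDs whose weight (or whole weight range) lies in [mw_min, mw_max]."""
    ids = []
    for seq_id, weight in seq_dict.items():
        # Dict_MW stores a (min, max) tuple for ambiguous records and a
        # single number for unambiguous ones; normalise both to a range.
        low, high = weight if isinstance(weight, tuple) else (weight, weight)
        if mw_min <= low and high <= mw_max:
            ids.append(seq_id)
    return ids

# Placeholder weights, not real Biopython results.
example = {"seq_a": (3000.0, 3050.0), "seq_b": 4100.0, "seq_c": (4400.0, 5300.0)}
print(ids_in_interval(example, 2900.0, 4500.0))
```

If a partial overlap should also count, the condition can be relaxed to `low <= mw_max and high >= mw_min`.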

Votes: -1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/70351467
