Print a list of dictionary keys within a molecular weight interval

Stack Overflow user
Asked 2021-12-14 15:32:36
1 answer, 191 views, 0 followers, 0 votes

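For an assignment, I need to build my previous function into a new function that, given a FASTA file and a min and max molecular weight, returns a list of the sequence IDs of the sequences whose molecular weight lies within the given interval.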

This is my previous function:

from itertools import product

from Bio import SeqIO, SeqUtils
from Bio.Data import IUPACData

def Dict_MW(file_name):
    with open(file_name) as seq_file:
        seq_dict = {}
        for record in SeqIO.parse(seq_file, 'fasta'):
            # Map each IUPAC code to the bases it can stand for
            d = IUPACData.ambiguous_dna_values
            # Expand the record into every unambiguous sequence it can represent
            ambiguous_dna = list(map("".join, product(*map(d.get, record))))
            mol_weight = []
            for seq in ambiguous_dna:
                mol_weight.append(SeqUtils.molecular_weight(seq))
            # Store a (min, max) range for ambiguous records, a single weight otherwise
            if min(mol_weight) != max(mol_weight):
                seq_dict[record.id] = (min(mol_weight), max(mol_weight))
            else:
                seq_dict[record.id] = min(mol_weight)
        print(seq_dict)

This function prints a dictionary with the sequence IDs as keys and the molecular weights as values.
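For readers unfamiliar with the `product(*map(d.get, record))` line, here is a minimal sketch of what the expansion does, using only the standard library. The `IUPAC_DNA` table is a hand-written copy of the standard IUPAC nucleotide codes and `expand_ambiguous` is a hypothetical helper name; neither comes from the code above, which reads the mapping from `IUPACData.ambiguous_dna_values` instead.

```python
from itertools import product

# Standard IUPAC nucleotide codes, written out by hand for this sketch
# (an assumption standing in for IUPACData.ambiguous_dna_values).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def expand_ambiguous(seq):
    """Return every unambiguous sequence that `seq` can stand for."""
    # One string of candidate bases per position, e.g. "R" -> "AG"
    choices = [IUPAC_DNA[base] for base in seq]
    # The Cartesian product picks one candidate per position
    return ["".join(combo) for combo in product(*choices)]

print(expand_ambiguous("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

The number of expansions is the product of the number of candidates at each position, so a sequence with many ambiguity codes can produce a very long list.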

This is the new function:

def List(file_name, mw_min, mw_max):
    with open(file_name) as seq_file:
        seq_dict = {}
        ID = []
        for record in SeqIO.parse(seq_file, 'fasta'):
            d = IUPACData.ambiguous_dna_values
            ambiguous_dna = list(map("".join, product(*map(d.get, record))))
            mol_weight = []
            for seq in ambiguous_dna:
                mol_weight.append(SeqUtils.molecular_weight(seq))
            tuple = (min(mol_weight),max(mol_weight))
            if min(mol_weight) != max(mol_weight):
                seq_dict[record.id] = (min(mol_weight), max(mol_weight))
            else:
                seq_dict[record.id] = min(mol_weight)
            for values in mol_weight:
                if mw_min <= values <= mw_max:
                    ID.append(seq_dict.keys())
        print(ID)

It runs, but the output is not correct: it gives all of the IDs, not only the IDs within the given molecular weight interval.
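The likely cause is the line `ID.append(seq_dict.keys())`: it appends a view of all keys in the dictionary rather than the current `record.id`, and it does so once for every expanded sequence whose weight matches. Because a keys view is live, every appended entry shows all IDs by the time the list is printed. Below is a minimal sketch of the filtering step on its own, assuming the weights are stored the way `Dict_MW` stores them (a single number, or a `(min, max)` tuple for ambiguous sequences). The name `ids_in_range` is hypothetical, and requiring the whole `(min, max)` range to lie inside the interval is an assumption; the assignment may intend something else for ambiguous sequences.

```python
def ids_in_range(weights, mw_min, mw_max):
    """Return the IDs whose weight (or whole weight range) lies in [mw_min, mw_max]."""
    selected = []
    for seq_id, value in weights.items():
        # A bare number is treated as a range whose min and max coincide
        low, high = value if isinstance(value, tuple) else (value, value)
        # Assumption: an ambiguous sequence counts only if its whole range fits
        if mw_min <= low and high <= mw_max:
            selected.append(seq_id)  # append this ID only, once per record
    return selected

# Toy numbers for illustration only, not real molecular weights
weights = {"seq_a": 3000.0, "seq_b": (3500.0, 3800.0), "seq_c": (4900.0, 5200.0)}
print(ids_in_range(weights, 2900, 4000))  # ['seq_a', 'seq_b']
```

For this to plug into the existing code, `Dict_MW` would need to return `seq_dict` instead of only printing it.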

The FASTA file I used:

>seq_7009 random sequence
DGRGGGWAVCVAACGTTGAT
>seq_418 random sequence
GAGCTGVTATST
>seq_9143_unamb random sequence
ACCGTTAAGCCTTAG
>seq_2888 random sequence
RVCCWDGARATAGBCGC
>seq_1101 random sequence
CSAATGYGATNBTA
>seq_107 random sequence
WGDGHGCDCTYANGTTWCA
>seq_6946 random sequence
TCVMBRAGRSGTCCAWA
>seq_6162 random sequence
YWBGCKTGCCAAGCGCDG
>seq_504 random sequence
ADDTAACCCTCTTKA
>seq_3535 random sequence
KKGTACACCAG
>seq_4077 random sequence
SRWSCRTTRVAGDCC
> seq_1626_unamb random sequence
GGATATTACCTA

1 Answer

Stack Overflow user

Answered 2021-12-19 23:05:10

This is how I tried to solve the problem; I assume we have the same Python assignment.

from Bio import SeqIO
from Bio import SeqUtils

## function description

def unamb_MW(filename):
    mol_weight_dict = dict()
    nucleotides = {'A', 'T', 'C', 'G'}  # the unambiguous nucleotides, so that codes such as N in an ambiguous sequence are recognised as non-nucleotides
    with open(filename) as file:  # open the FASTA file so that SeqIO can read it
        for record in SeqIO.parse(file, "fasta"):  # SeqIO.parse yields the records of the FASTA file one by one
            for nucl in record:  # walk through every letter of the sequence
                if nucl in nucleotides:  # separate ambiguous from unambiguous sequences; this avoids an error, because SeqUtils.molecular_weight only accepts unambiguous sequences
                    continue
                else:
                    print(str(record.id)+": is ambiguous")  # print the ambiguous sequences so you can open the FASTA file and check that your code really leaves them out
                    break
            else:
                mol_weight = SeqUtils.molecular_weight(record.seq)  # Biopython function that computes the molecular weight
                print(str(record.id)+": is unambiguous & molecular weight = "+str(mol_weight))
                mol_weight_dict[str(record.id)] = mol_weight
    #print(mol_weight_dict)
    return mol_weight_dict

def MW_list(filename, min_MW, max_MW):
    mol_weight = unamb_MW(filename)
    for record in mol_weight:
        # print each ID whose weight lies strictly between the two bounds
        if min_MW < mol_weight[record] < max_MW:
            print('\n', [record])
#If you're taking the course computational biology then we have the same assignment.
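As a side note, the for/else loop that separates ambiguous from unambiguous records can also be written as a single set comparison. This is only a sketch of an alternative, not part of the answer above: `is_unambiguous` and `UNAMBIGUOUS` are hypothetical names, and the upper-casing step is an added assumption that the original loop does not make.

```python
UNAMBIGUOUS = frozenset("ACGT")

def is_unambiguous(seq):
    """True if every letter of `seq` is one of A, C, G or T."""
    # str() lets this accept a plain string or a Biopython Seq object;
    # upper() is an assumption so that lower-case input is accepted too
    return set(str(seq).upper()) <= UNAMBIGUOUS

# Two sequences taken from the FASTA file in the question
print(is_unambiguous("ACCGTTAAGCCTTAG"))  # True  (seq_9143_unamb)
print(is_unambiguous("GAGCTGVTATST"))     # False (seq_418 contains V and S)
```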
Votes: -1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/70351467
