Q: Biopython: weight intervals and dictionaries

Stack Overflow user

Asked 2022-01-08 15:18:22

Answers: 1 · Views: 47 · Followers: 0 · Votes: -1

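I'm hoping someone familiar with Biopython can help me with a problem. I have a function that takes a FASTA file (a DNA sequence file) and creates a dictionary with the sequence IDs as keys and the molecular weights of the sequences as values. Because sequences can be ambiguous, I also have a function that expands an ambiguous sequence into all of the possible real sequences it represents. I integrated that function into the dictionary-building function I just described, so that for an ambiguous sequence it produces the minimum and maximum molecular weight over all of the possible real sequences.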

from Bio import SeqIO, SeqUtils

def seq_ID_and_weight(file_name):
    ID_weight = {}  # maps sequence ID -> [weight] or [weight_min, weight_max]
    with open(file_name) as file:
        for sequence in SeqIO.parse(file, 'fasta'):
            # ambiguous_to_unambiguous is my own helper (not shown);
            # call it once and store the result to improve performance
            all_poss_sequences = ambiguous_to_unambiguous(sequence.seq)
            if len(all_poss_sequences) != 1:  # a length of 1 would mean the sequence is unambiguous
                # compute each weight once, then take both extremes; this avoids a fixed
                # starting minimum of 10000 and an elif that could leave the maximum at 0
                weights = [SeqUtils.molecular_weight(possib) for possib in all_poss_sequences]
                ID_weight[sequence.id] = [min(weights), max(weights)]
            else:
                ID_weight[sequence.id] = [SeqUtils.molecular_weight(sequence.seq)]
    return ID_weight

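The function outputs something like the following, where each value is either the single molecular weight of the sequence (if the sequence is unambiguous) or the minimum and maximum possible molecular weights (if the sequence is ambiguous):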

{'seq_7009': [6236.9764, 6367.049999999999], 'seq_418': [3716.3642000000004, 3796.4124000000006], 'seq_9143_unamb': [4631.958999999999]}
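The `ambiguous_to_unambiguous` helper is called in the code above but not shown in the question. For readers following along, here is a minimal sketch of what such a helper might look like. It is an assumption, not the asker's actual implementation: it expands each IUPAC ambiguity code with `itertools.product` and uses a hand-written code table (`IUPAC_DNA`, a hypothetical name) so that it runs without Biopython.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes, written out by hand so this
# sketch has no dependencies. IUPAC_DNA is an illustrative name.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def ambiguous_to_unambiguous(seq):
    """Return every concrete DNA sequence that `seq` could stand for."""
    # One string of candidate bases per position in the input.
    choices = [IUPAC_DNA[base] for base in str(seq).upper()]
    # The Cartesian product picks one candidate per position.
    return ["".join(combo) for combo in product(*choices)]

print(ambiguous_to_unambiguous("ARG"))  # ['AAG', 'AGG']
```

An unambiguous input yields a list of length 1, which is what the `len(all_poss_sequences) != 1` check relies on. Note that the number of expansions grows multiplicatively with each ambiguous position, so this approach is only practical for sequences with a few ambiguity codes.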

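Now, however, I need to build on the dictionary-building function above to create a new function that does something slightly different. The new function takes a FASTA file name plus a minimum and a maximum molecular weight as input, and returns a list of the IDs of the sequences whose molecular weight falls within that interval. For ambiguous sequences, it should return the ID whenever the sequence's weight interval overlaps the interval you specify.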

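My approach is as follows:

1. Initialize a dictionary containing the output of the previous function, like the example I gave above.
2. Iterate over the dictionary and check whether each key has a single value or multiple values (a list).

a. If there is only one value, check whether it lies within the given range. If it does, print that sequence ID. If not, break (do nothing).

b. If there are multiple values, check whether the first or the second value lies within the given range (because if so, there is some overlap). If it does, print that sequence ID. If not, break.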

How would I implement this? This is all I have so far; I have really only created the dictionary:

def find_sequence(file_name, min_weight, max_weight):
    with open (file_name) as file:
        dictionary = {}
        dictionary.update(seq_ID_and_weight(file_name))
        for key in dictionary:

Now I need to check how many values each key has, but I don't know how to do that. Any ideas?

python

dictionary

biopython

1 Answer

Stack Overflow user

Answered 2022-01-08 15:32:50

You just need to iterate over the dictionary and check the first two values.

Here is a solution.

def find_sequence(file_name, min_weight, max_weight):
    li = []  # list to store matching sequence IDs

    # seq_ID_and_weight opens the file itself, so no open() is needed here
    dictionary = seq_ID_and_weight(file_name)
    for k, v in dictionary.items():  # traverse the dictionary
        for i in range(min(2, len(v))):  # check both values if there are two, else just the one
            if min_weight < v[i] < max_weight:  # value lies strictly inside the given range
                li.append(k)  # record the sequence ID
                break  # stop at the first match so each ID is added only once

    return li
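One case is not covered by checking only the two endpoints: if the requested range lies entirely inside a sequence's weight interval, neither endpoint falls within the range even though the intervals overlap. A minimal sketch of a general overlap test follows. It is an alternative to the answer's check, not part of it; the names `overlaps` and `find_sequence_ids` are illustrative, it takes the dictionary directly rather than a file name, and it assumes inclusive bounds (the answer uses strict comparisons).

```python
def overlaps(weights, min_weight, max_weight):
    """True if a sequence's weight(s) intersect [min_weight, max_weight]."""
    # For a single-weight list, low and high are the same value.
    low, high = weights[0], weights[-1]
    # Two closed intervals overlap when each starts before the other ends.
    return low <= max_weight and high >= min_weight

def find_sequence_ids(id_weight, min_weight, max_weight):
    """Return IDs whose weight or weight interval overlaps the query range."""
    return [seq_id for seq_id, w in id_weight.items()
            if overlaps(w, min_weight, max_weight)]

# Sample data copied from the example output in the question.
example = {
    "seq_7009": [6236.9764, 6367.049999999999],
    "seq_418": [3716.3642000000004, 3796.4124000000006],
    "seq_9143_unamb": [4631.958999999999],
}

# The query range 6300-6350 sits inside seq_7009's interval, so an
# endpoint-only check would miss it.
print(find_sequence_ids(example, 6300, 6350))  # ['seq_7009']
```

To use this with a FASTA file, the dictionary returned by `seq_ID_and_weight(file_name)` could be passed in as `id_weight`.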

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/70633715
