Python GC Counter - Rosalind

Stack Overflow user
Asked 2016-02-01 02:58:43
2 answers, 2.1K views, 0 followers, score 3

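I am trying to write a program that calculates the GC content of each sequence in a set of sequences (given in FASTA format) and then returns the name of the sequence with the highest percentage along with its GC percentage, as described in this Rosalind problem.

I have finally stopped getting error messages, but my code does not seem to do anything. Does anyone know why that might be?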

#Define functions
#Calculate GC percentage 
def Percent(sequence):
    G_count = sequence.count ('G')
    C_count = sequence.count ('C')
    Total_count = len(sequence)
    GC_Sum = int(G_count) + int(C_count)
    Percent_GC = GC_Sum / Total_count
    Per_GC = (Percent_GC)*100
    return Per_GC

Input = input ("Input Sequence")

#Fasta file into dictionary
fasta_dictionary = {}
sequence_name = ""
for line in Input:
    line = line.strip()
    if not line:
        continue
    if line.startswith(">"):
        sequence_name = line[1:]
        if sequence_name not in fasta_dictionary:
            fasta_dictionary[sequence_name] = []
        continue
    sequence = line
    fasta_dictionary[sequence_name].append(sequence)

#Put GC values for each sequence into dictionary
dictionary = dict()
for sequence_name in fasta_dictionary:
    dictionary[sequence_name] = float(Percent(sequence))

#Find highest
for sequence_name, sequence in fasta_dictionary.items():
    inverse = [(sequence, sequence_name) for sequence_name, sequence in dictionary.items()]
    highest_GC = max(inverse)[1]  

#Find sequence name
for sequence_name, sequence in fasta_dictionary.items():
        if sequence == highest_GC:
            print ((sequence_name) + ' ' + (highest_GC))

2 Answers

Stack Overflow user

Answered 2016-02-01 06:17:23

So Pier Paolo is correct: change the first line to a with open() and indent the rest of the code beneath it, like so.

with open('/path/to/your/fasta.fasta', 'r') as Input:
   fasta_dictionary = {}

He is also right about the division, which should help your Percent function: Percent_GC = float(GC_Sum) / Total_count
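
As a minimal sketch of that fix (the name gc_percent and the empty-sequence guard are illustrative additions, not part of the original code), the function below forces floating-point division so it behaves the same under Python 2 and Python 3. In Python 3 the / operator already performs true division, so the float() cast mainly matters under Python 2.

```python
def gc_percent(sequence):
    """Return the GC content of a DNA string as a percentage (0-100)."""
    if not sequence:
        # Assumption: treat an empty sequence as 0% instead of dividing by zero.
        return 0.0
    gc_count = sequence.count('G') + sequence.count('C')
    # Multiplying by 100.0 first makes the division floating-point in Python 2 too.
    return 100.0 * gc_count / len(sequence)

print(gc_percent('ATGC'))
```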

There is no need to append; just assign sequence as a string.

sequence = line
fasta_dictionary[sequence_name] = sequence
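
One caveat: plain assignment keeps only the last line of each record. That is fine when every sequence sits on a single line, but FASTA files, including the Rosalind datasets, often wrap a sequence over several lines. A hedged sketch that handles both cases by collecting the lines and joining them at the end (the function name parse_fasta and the sample input are assumptions for illustration):

```python
def parse_fasta(lines):
    """Map each record name to its full sequence, joining wrapped lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            name = line[1:]      # header line: start a new record
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence line: add to the current record
    # Join the collected pieces so each value is a single string.
    return {rec_name: ''.join(parts) for rec_name, parts in records.items()}

# Made-up sample input; seq1 is wrapped over two lines.
sample = ['>seq1', 'ATGC', 'GGCC', '>seq2', 'ATAT']
print(parse_fasta(sample))
```

Because the function accepts any iterable of lines, an open file handle from with open(...) can be passed in directly.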

You are storing them in a dictionary called fasta_dictionary, so change this piece of code.

for sequence_name in fasta_dictionary:
        dictionary[sequence_name] = float(Percent(fasta_dictionary[sequence_name]))

Finally, you are checking if sequence == highest_GC:. This is what you are currently checking:

for sequence_name, sequence in fasta_dictionary.items():
    print(sequence)

This prints the actual sequence data as a str:

'ATTGCGCTANANAGCTANANCGATAGANCACGATNGAGATAGACTATAGC'

highest_GC, on the other hand, is the "name" of the sequence:

>sequence1

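Change it to if sequence_name == highest_GC

When I run the code with the changes above, it always prints the name of the sequence with the highest GC content %. There are still many unnecessary steps and repeated code, but hopefully this gets you started. Good luck!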
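
One way to trim those extra steps: the inverted list and the second lookup loop can be replaced by max with a key function. A minimal sketch, assuming a dictionary that maps sequence names to GC percentages, like the one called dictionary in the question (the name gc_by_name and the values are made up for illustration):

```python
# Illustrative values only; in practice this is filled from the parsed FASTA records.
gc_by_name = {'seq1': 48.5, 'seq2': 60.25, 'seq3': 51.0}

# max() iterates over the keys and compares them by their GC percentage.
best_name = max(gc_by_name, key=gc_by_name.get)

# Print the name followed by its percentage.
print('{0} {1:.6f}'.format(best_name, gc_by_name[best_name]))
```

This prints both the name and the percentage, which is what the Rosalind problem asks for, and avoids comparing sequence strings to names altogether.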

Score: 1

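Stack Overflow user

Answered 2016-06-10 19:59:18

Another solution to the GC problem is to use the Counter high-level data structure in Python. It tallies your nucleotides for you automatically, so you can ask for the counts directly and do the calculation as follows: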

from collections import Counter

#set a var to hold your dna
myDna = ''
#open your Dna fasta
with open('myFasta', 'r') as data:
     for line in data:
          if '>' in line:
               continue
          myDna += line.strip()

#Now count your dna
myNucleotideCounts = Counter(myDna)

#calculate GC content
myGC = (myNucleotideCounts['G'] + myNucleotideCounts['C']) / float(len(myDna))

print('Dna GC Content = {0}'.format(myGC))
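
Note that the snippet above pools every record into one string, so it reports a single overall GC fraction (between 0 and 1) rather than one value per sequence. A sketch of a per-record variant using Counter, assuming the records are already available as a name-to-sequence dictionary (the names records, gc_percent_counter, and gc_by_name, and the sample data, are made up for illustration):

```python
from collections import Counter

# Made-up records; in practice these come from parsing the FASTA file.
records = {'seq1': 'ATGCGGCC', 'seq2': 'ATATGC'}

def gc_percent_counter(sequence):
    """GC percentage of one sequence, counted with collections.Counter."""
    counts = Counter(sequence)
    # Counter returns 0 for missing keys, so sequences without G or C are safe.
    return 100.0 * (counts['G'] + counts['C']) / len(sequence)

gc_by_name = {name: gc_percent_counter(seq) for name, seq in records.items()}
best = max(gc_by_name, key=gc_by_name.get)
print(best, gc_by_name[best])
```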
Score: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/35117978
