文章/答案/技术大牛

发布

社区首页 >问答首页 >Python GC计数器- Rosalind

问Python GC计数器- Rosalind
EN

Stack Overflow用户

提问于 2016-02-01 02:58:43

回答 2查看 2.1K关注 0票数 3

我正在尝试编写一个程序，它将计算一系列序列(以fasta格式输入)中的每个序列的GC含量，然后返回具有最高百分比的序列的名称及其GC百分比。根据this Rosalind problem.

我终于不再收到错误消息了，但是我的代码似乎什么也做不了。有人知道为什么会这样吗？

#Define functions
#Calculate GC percentage 
def Percent(sequence):
G_count = sequence.count ('G')
C_count = sequence.count ('C')
Total_count = len(sequence)
GC_Sum = int(G_count) + int(C_count)
Percent_GC = GC_Sum / Total_count
Per_GC = (Percent_GC)*100
return Per_GC

Input = input ("Input Sequence")

#Fasta file into dictionary
fasta_dictionary = {}
sequence_name = ""
for line in Input:
    line = line.strip()
    if not line:
        continue
    if line.startswith(">"):
        sequence_name = line[1:]
        if sequence_name not in fasta_dictionary:
            fasta_dictionary[sequence_name] = []
        continue
    sequence = line
    fasta_dictionary[sequence_name].append(sequence)

#Put GC values for each sequence into dictionary
dictionary = dict()
for sequence_name in fasta_dictionary:
dictionary[sequence_name] = float(Percent(sequence))

#Find highest
for sequence_name, sequence in fasta_dictionary.items():
    inverse = [(sequence, sequence_name) for sequence_name, sequence in dictionary.items()]
    highest_GC = max(inverse)[1]  

#Find sequence name
for sequence_name, sequence in fasta_dictionary.items():
        if sequence == highest_GC:
            print ((sequence_name) + ' ' + (highest_GC))

dna-sequence

python

bioinformatics

回答 2

Stack Overflow用户

发布于 2016-02-01 06:17:23

因此，Pier Paolo是正确的，将第一行更改为with open()，并将其下的其余代码缩进，如下所示。

with open('/path/to/your/fasta.fasta', 'r') as Input:
   fasta_dictionary = {}

他在除法上也是正确的--这应该有助于您的Percent函数。Percent_GC = float(GC_Sum) / Total_count

不需要追加，只需将sequence指定为字符串即可。

sequence = line
fasta_dictionary[sequence_name] = sequence

您将它们存储在名为fasta_dictionary的字典中，因此请更改这段代码。

for sequence_name in fasta_dictionary:
        dictionary[sequence_name] = float(Percent(fasta_dictionary[sequence_name]))

最后，您将检查if sequence == highest_GC:。这是您当前正在检查的内容：

for sequence_name, sequence in fasta_dictionary.items():
            print sequence

打印实际序列数据的str。

'ATTGCGCTANANAGCTANANCGATAGANCACGATNGAGATAGACTATAGC'

highest_GC是序列的“名称”

>sequence1

将其更改为if sequence_name == highest_GC

运行具有上述更改的代码时，总是打印具有最高GC含量%的序列的名称。还有很多其他不必要的步骤和重复的代码，但希望这能让你入门。祝好运!

票数 1

Stack Overflow用户

发布于 2016-06-10 19:59:18

GC问题的另一个解决方案是使用python中的计数器高阶数据结构。它可以自动为你设置和计算你的核苷酸，这样你就可以直接要求数字计算如下：

from collections import Counter

#set a var to hold your dna
myDna = ''
#open your Dna fasta
with open('myFasta', 'r') as data:
     for line in data:
          if '>' in line:
               continue
          myDna += line.strip()

#Now count your dna
myNucleotideCounts = Counter(myDna)

#calculate GC content
myGC = (myNucleotideCounts['G'] + myNucleotideCounts['C']) / float(len(myDna))

print('Dna GC Content = {0}'.format(myGC))

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/35117978

复制

相似问题

问Python GC计数器- Rosalind
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Python GC计数器- RosalindEN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Python GC计数器- Rosalind
EN