我正在尝试编写一个程序,它将计算一系列序列(以fasta格式输入)中的每个序列的GC含量,然后返回具有最高百分比的序列的名称及其GC百分比。根据this Rosalind problem.
我终于不再收到错误消息了,但是我的代码似乎什么也做不了。有人知道为什么会这样吗?
#Define functions
#Calculate GC percentage
def Percent(sequence):
G_count = sequence.count ('G')
C_count = sequence.count ('C')
Total_count = len(sequence)
GC_Sum = int(G_count) + int(C_count)
Percent_GC = GC_Sum / Total_count
Per_GC = (Percent_GC)*100
return Per_GC
Input = input ("Input Sequence")
#Fasta file into dictionary
fasta_dictionary = {}
sequence_name = ""
for line in Input:
line = line.strip()
if not line:
continue
if line.startswith(">"):
sequence_name = line[1:]
if sequence_name not in fasta_dictionary:
fasta_dictionary[sequence_name] = []
continue
sequence = line
fasta_dictionary[sequence_name].append(sequence)
#Put GC values for each sequence into dictionary
dictionary = dict()
for sequence_name in fasta_dictionary:
dictionary[sequence_name] = float(Percent(sequence))
#Find highest
for sequence_name, sequence in fasta_dictionary.items():
inverse = [(sequence, sequence_name) for sequence_name, sequence in dictionary.items()]
highest_GC = max(inverse)[1]
#Find sequence name
for sequence_name, sequence in fasta_dictionary.items():
if sequence == highest_GC:
print ((sequence_name) + ' ' + (highest_GC))发布于 2016-02-01 06:17:23
因此,Pier Paolo是正确的,将第一行更改为with open(),并将其下的其余代码缩进,如下所示。
with open('/path/to/your/fasta.fasta', 'r') as Input:
fasta_dictionary = {}他在除法上也是正确的--这应该有助于您的Percent函数。Percent_GC = float(GC_Sum) / Total_count
不需要追加,只需将sequence指定为字符串即可。
sequence = line
fasta_dictionary[sequence_name] = sequence您将它们存储在名为fasta_dictionary的字典中,因此请更改这段代码。
for sequence_name in fasta_dictionary:
dictionary[sequence_name] = float(Percent(fasta_dictionary[sequence_name]))最后,您将检查if sequence == highest_GC:。这是您当前正在检查的内容:
for sequence_name, sequence in fasta_dictionary.items():
print sequence打印实际序列数据的str。
'ATTGCGCTANANAGCTANANCGATAGANCACGATNGAGATAGACTATAGC'highest_GC是序列的“名称”
>sequence1将其更改为if sequence_name == highest_GC
运行具有上述更改的代码时,总是打印具有最高GC含量%的序列的名称。还有很多其他不必要的步骤和重复的代码,但希望这能让你入门。祝好运!
发布于 2016-06-10 19:59:18
GC问题的另一个解决方案是使用python中的计数器高阶数据结构。它可以自动为你设置和计算你的核苷酸,这样你就可以直接要求数字计算如下:
from collections import Counter
#set a var to hold your dna
myDna = ''
#open your Dna fasta
with open('myFasta', 'r') as data:
for line in data:
if '>' in line:
continue
myDna += line.strip()
#Now count your dna
myNucleotideCounts = Counter(myDna)
#calculate GC content
myGC = (myNucleotideCounts['G'] + myNucleotideCounts['C']) / float(len(myDna))
print('Dna GC Content = {0}'.format(myGC))https://stackoverflow.com/questions/35117978
复制相似问题