I tried writing the following program:

import numpy as np  # import package for scientific computing
dna1 = str(np.load('dna1.npy'))

def count(dna1, repeat):
    i = 0
    for s in range(len(dna1)):
        if (s == 'repeat'):
            i += 1
        s += dna1[0:1]
    return i

repeat = 'TTTT'
n = count(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=n))

I want to extract every possible run of 4 letters from my list and check whether it equals 'TTTT'. But I don't know how to advance one position in the list so that s moves along while I still read 4 letters at a time.

Posted on 2016-10-22 18:09:31

I agree that trying a regex is probably the easiest initial approach:

import numpy as np  # import package for scientific computing
import re

dna1 = str(np.load('dna1.npy'))

def count(dna1, repeat):
    regex = re.compile(repeat)
    # findall returns non-overlapping matches, scanning left to right
    result = regex.findall(dna1)
    return len(result)

repeat = 'TTTT'
n = count(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=n))

Edit:
Here is a simple way to do it without the regex module. You could certainly optimize it, using the result of each iteration to skip further ahead:

def count(dna1, repeat):
    repeat_length = len(repeat)
    total = 0
    idx = 0
    while idx < len(dna1):
        # compare the window starting at idx against the pattern
        substr = dna1[idx:idx+repeat_length]
        if substr == repeat:
            total += 1
            idx += repeat_length  # skip ahead to avoid repeat counting
        else:
            idx += 1
    return total

Posted on 2016-10-22 18:24:15

The best and most customizable way is the following:
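
import numpy as np  # import package for scientific computing

dna1 = str(np.load('dna1.npy'))
repeat = 'TTTT'

def get_num_of_repeats(dna, repeat):
    repeats = 0
    # slide a window of len(repeat) over dna one position at a time;
    # overlapping matches are each counted
    for i in range(len(dna) - len(repeat) + 1):
        if dna[i:i+len(repeat)] == repeat:
            repeats += 1
    return repeats

repeats = get_num_of_repeats(dna1, repeat)
print('{repeat} occurs {n} times in dna1'.format(repeat=repeat, n=repeats))

I simply created a function, get_num_of_repeats, that takes the dna string and the pattern to look for and returns the number of occurrences. Depending on the behavior you want from the algorithm, things can get tricky when you look for a pattern like 'TTTT' and part of the dna contains 'TTTTT'. I can follow up to help you define the behavior you want.
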
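To make the 'TTTT' versus 'TTTTT' point concrete, here is a minimal sketch of the two counting behaviors side by side. The function names and the sample string are made up for illustration and are not from the posts above. The overlapping version uses a zero-width regex lookahead, and the non-overlapping version relies on the built-in str.count.

```python
import re

def count_overlapping(dna, repeat):
    """Count every start position where repeat occurs; matches may overlap."""
    # A zero-width lookahead matches at a position without consuming characters,
    # so the scan advances one character at a time.
    pattern = '(?={})'.format(re.escape(repeat))
    return len(re.findall(pattern, dna))

def count_non_overlapping(dna, repeat):
    """Count occurrences left to right, skipping past each match."""
    return dna.count(repeat)

sample = 'ACTTTTTGA'  # hypothetical test string containing 'TTTTT'
print(count_overlapping(sample, 'TTTT'))      # 2
print(count_non_overlapping(sample, 'TTTT'))  # 1
```

Which of the two is right depends on what a "repeat" should mean for the data; the loop in this answer behaves like the overlapping version, while the regex and while-loop versions in the earlier answer behave like the non-overlapping one.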
https://stackoverflow.com/questions/40195097