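I am writing a program, and my Python experience is limited. The program reads a CSV file and builds a list from it. A txt file supplies a string of characters, which is also turned into a list. I then have to compare one list against the entries in the other; the goal is to find a DNA match. The program works, with one problem. One of my CSV files has three different DNA strings to compare and another has eight. I can get either one to work, but only by adjusting the lines that create the lists chdnalist and subsequence. As currently written, with [[],[],[]], it only runs with the smaller list of 3 strings. I could add [] entries for a total of 8, but then of course it cannot handle the set of 3 strings.
Here is a copy of my code: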
import csv
import sys


def main():

    # TODO: Check for command-line usage
    if len(sys.argv) != 3:
        print("Missing Files!")
        sys.exit(1)

    # TODO: Read database file into a variable
    dnalist = []
    with open(sys.argv[1], "r", newline = '') as subject_file:
        dnalst = csv.reader(subject_file)
        for row_list in dnalst:
            dnalist.append(row_list)

    # TODO: Read DNA sequence file into a variable
    with open(sys.argv[2], "r") as f:
        sequence = f.read()

    # TODO: Find longest match of each STR in DNA sequence
    chdnalist = [[],[],[]]
    subsequence = [[],[],[]]
    for q in range(1, len(dnalist[0])):
        subsequence[q - 1] = dnalist[0][q]
    for t in range (len(subsequence)):
        chdnalist[t] = longest_match(sequence, subsequence[t])

    # TODO: Check database for matching profiles
    compare(dnalist, chdnalist)
    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length - 1):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


def compare(dnalist, chdnalist):
    for name, *dna in dnalist[1:]:
        dna = list(map(int, dna))
        if dna == chdnalist:
            print(name)
            return
    print("No match")


main()

Here is a copy of the small CSV file:
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5

Here is a copy of the large CSV file:
name,AGATC,TTTTTTCT,AATG,TCTAG,GATA,TATC,GAAA,TCTG
Albus,15,49,38,5,14,44,14,12
Cedric,31,21,41,28,30,9,36,44
Draco,9,13,8,26,15,25,41,39
Fred,37,40,10,6,5,10,28,8
Ginny,37,47,10,23,5,48,28,23
Hagrid,25,38,45,49,39,18,42,30
Harry,46,49,48,29,15,5,28,40
Hermione,43,31,18,25,26,47,31,36
James,46,41,38,29,15,5,48,22
Kingsley,7,11,18,33,39,31,23,14
Lavender,22,33,43,12,26,18,47,41
Lily,42,47,48,18,35,46,48,50
Lucius,9,13,33,26,45,11,36,39
Luna,18,23,35,13,11,19,14,24
Minerva,17,49,18,7,6,18,17,30
Neville,14,44,28,27,19,7,25,20
Petunia,29,29,40,31,45,20,40,35
Remus,6,18,5,42,39,28,44,22
Ron,37,47,13,25,17,6,13,35
Severus,29,27,32,41,6,27,8,34
Sirius,31,11,28,26,35,19,33,6
Vernon,26,45,34,50,44,30,32,28
Zacharias,29,50,18,23,38,24,22,9

Here is a copy of a text file that, when run against the small CSV file, makes the program return a match for Bob:
AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG

This file returns a match for Luna from the large CSV file:
/*
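 * Note: this line of code is too long, so the system has commented it out automatically and does not highlight it. One-click copy removes the system comment.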
* TCTATTCTTTGAGGATACGCTCGGCCTAGGCGGGGCTAATGGAAGCCAGGCTAATCCGATGTTGCGGTGCACCTCGATACCGTTCTAAAATATCACATCAACGCGCTCCAGTTGTGTGCCAAGGCCCGCTGAAGAGCAATGGAGCACCTACCCGGCCTTCTAACGCTGTCTAAAACTCCAAGCGAATTGCAGATTTTGGTTAGGACCCGTTTAATCTGTGGGCTTTGGTACTATGCAACCAATGGAACCGGTCGGACTCTGATCAGTCCCGACTGACAGGTCTCAAGTAGTTTGCTTACACGTTCTGACCCCCGTGCGCACCGTTGGGCGTACAGCGGTTCGGTCTATGGAATCAAGGAAAATCATTCGTATGGGGACGTAGTCACATAACAGCTGCAGGGAACTATGGAGATGACGAGGGGTCGTTTAGTGGAACGTCAAATGTCCTAACTGGTTCTGAGCTGTCTGGAACGTTGCAGTCAACGTCTACGATCTGGATTCTACAGTCTAGGCGTTCCAAGGGGCACCAGTAAGCTAAGTTGTTTAAATATGGCGGGTGTCGAAATGACGTCCAAAATCGCAAATAAGACAGATAGCAGGGGTGCAACTTAGGTATCTAAGGTAACTCTGACATACCTCATACAACTATCGAACAGTGGATTCCTTGTCGTCCTGTTGTAAACAGTTCAAGTCGGTACATGTTAGCGGGTGGTTTGGACGAGTATACAGGACCTGGCCTACACGGAATGTTTTAGATTCTATGTCCGGCGGGGACATCGCGTGCCGCTAGGATATAATTGGATTGTGGGAAGAATTTGGCCGGATTTTTGGCCTAGACTCGCGCTTCAGACCATACCGTGCGATCAGCACGATTGCTGACAAGCGTCGGTATTAAAGCAGGCTCCTTCCCAGCCAAACTAACCCAACGAAGACATCATGTTTCGCCGAAGTATCTTTGGGAGATGGGCGAATTAATCGCTTAGCGTGGCCGACTTGGGGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGGTTTAAGGGACTTATCCGACCAGAGGGGCAGTTACTTGTGGCGGTCACACGCCAGGACGAGTCTGTTCTTGCTGTGCGTAGATTAGGCTTGATCTGTGACTACAGGCGAATAGTAGGTGTGGGAAACAGAGGGGGGAGCAATGTGATCCCGGGGGGAGTGCTTCCTATACCTCGGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGGATTATCCCCACCCAATGATCCGATCGCAAGCCTTAATACCATGGCACACCTCTCAACCTACTGATCTTCCATCCGTTTAACCCAGCACTAAGCTGCTCAGTGGTCACACTATGTTCAAGCTTCCGTGACGTGGGATCCTGGGGTCTTCGCAAGGCTAGTTTTGACCATTATCGACGACCGTCACCCTGTGACTGGTTCCCAACAAGGTGTCAAGTTCTAGCCCGTACCTGCAATCGGGAACCTCCGGTGCTTCATGAACCATGGATATAGGAATTATTGGTCTCCTCTCGCGTAGGTAGCGCGAATACCCCCAAGATGACACACTGTGGTGAACTTTGAGGACTCCCAGAAGGGTGACGGGTTATGTGGTTACGCGAAGTCGGCGTATCCACCGCCTAATTTTAAATTCAGCTCGAGCGACACGCGCGCTTCCTGGAAACGTT
AGACGGGAAAAACCCCGCCCGAGAATGCGGGTTCCGCGGCCCACTAGGGGGCCCCCCAAGGATCTGACCGCGTATAAGCAATGCACAGCTGTACCATTTCAAATAGGACAGATAGTACCCCCACCGTGACTCGGCCTCAGATAATGGAATACGACCTGGTGACGGCGGTAGGGGTTCTATCTCAGGTATTCAGAGGGTGCATCCAGGTGATTCGTCACGTCCCGATTTCGACCCCACCACAGGATTTGTGCGATGGTAGTCTTGATGCTGTTTGCAGGCGGCCAAGCATCTAGGAGATGCCTCACTGCGCGAGATGAACCGGCGTTTCACAAGGGGACGCCAGGCCTTGCCGTCTCCATAAACCACGAGAAGGTATCGAACGTCAAACGGATAAATGCCGCGATACCGCTCGTTTCGAAGCGGCACTTCGATGGAAATGAGTAGTATGGCCTCGCCACACGACTACTCATCGGCTTGCGCTGACATCAATCCTGGCTGGCTTGAGGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGCTCCATAGGAAGGTGCGGGATAGCGGACAGCTAATCGGACAGAAGGGCCAGCTTGCACTCTCCTATAATTAGCAAGCGCCATACAATTGTAATCACGTATAAAATACAGCTACGTAAGTAATAGAGAGGCTCCCGGACTGTCCGGCGTCCCGCCAGTCTCGTACCAGGAGGTGGGATGGTAGGCAAACGAGCCTACTAGAATTGGGCCACCCTGTGAATAATATGCAGAGGCAACTACAGACGTCCGTCACCTGCCTAGAATCGAGTTCATTGACGGTGGGATATGCTCCGTTACCTGACTGTAGTTCGACTTTGTGGTGCGCACATAACGAGTGTCTACGATGCACAAAGTGTGAGCAAATTAGGAGTGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTGCCGAGATGTTGGCGGGAAGTGTACGGCTTTGCGTCGTCGAGTGCTACGCAGTGTGCTACACTCCCGCAGCTGAGGCTAGGGCCCGAAACTAGACATTTTTTCTTTTGGCACTTCGTTCCGTATAATGAGTTCCCTCAATTCCCCGTCCGCAAGCCTCAGGATTACAATTAATTATACGGTTAAAGTTGGCTGCCAAGCCCGTTATTGACCGGTACCTGAGTCGAGGGGGGGTTGGGGATAGGCAATTATAGTATTCACTCACAGGACGCTCAGTAATGCCGCCGTTGTACTTCACGTAAGGGCCACAGTTTTTCTACCACAGAGGATGATCTGAGGACAGCGGTGCGTGAAGCCCGCTATTCAGGACACCCTCGAAACCCGTGGTTCACAGACAAAAAATTCGCCGCGGAAGCTGTTGCCCCTATGCCCCGGGTCAGCAAGGAGTCTGGATTTTATTCCAAGACTGCGTCTTTATTTTCTGGTGAGTATGAAATGACTCTGAGAAAATGGTCGAACCACGAGCTAGCTACAGCCACAGTCCGCTCAACTAACTTACCTCTACTCTAACAGTTACACGGCTTCCCGTTTTATGGGAAGAAGCACCTGTTCCTTTCCCAAGCCCCTTATAGCAGAGGTTGGTATTCGGTTGATTTGGAATAGTTAAACAGCGGCTATTTTGTAATCACTTTCCAGTCGGTAAGACATTCGAACCTCGTTTTGACGCTGCTCGCCATCGCGTTCGACTAGGAGTATTCCACTTTTCGGAGAGATGATTACTCATGACGCGGGGAACTCCATGGCTGTCATGCAGGATCTGGGCTAAATAAGATTAGATGTTCAACTGTCGTATACTTACTGCTACCAGCGGTGCTAGGCCCAGGACCCGCCATACCTGGCTATTGATCACTCTACCAGATGTCTCTTGACGAGTTACGAATTGCTGGGTGCTCTTGGAGACGAGTTGAGTCCGTAGTCGTGGCTGGGGAACGGGCGAGTTCGTAC
GTACCGTTTCAAAGCCCCACGAACCCAACCTCTTAGCCTTAACCCCACATTAGATACCCAAGTTGCATGACGCATTATGCGAGTACGACACTGGTATCGGCTGATCCGTCACTGCTCAAAGTCCAGTGGTTTCCTTATCTCGGGCTGGAAAGTGTAGCTTGTTCCAAACCTTCGAGAGGTTGATCGATGACCGGTTCTCACACACATCTTGCGGAGGGATGCTTGCGATGTGGCTTTACGTCCACCGACGGGCCGACTAGCTGGAAATCACAAACCCCTGCTCCGATAAGGTATTCTCGTTGACTTAGGGTAAACAAAATGCCCGTTACGTCCTAACCGAGTTTCCGGGCCTTCACTACCCGCGAGGGATGTGTAGTGGGGCCATTTACCTAAGCAGATGTACACCGAGTTACGATAGTCACATGGCCATTCAAAGCGTCTCACATAATCGATCGATAGATGATGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCCCGGAGGAAGCTGCGATTGGAATGCGGCTAACTTCGCTCTGCAACATTCTTGGCAGACGGCCCCAATGGCGTAATTTAGGCGTGTGTACCTAAAGTGGTCTACTCCTATGAACCGAATCGCGGGATAAATCGAGTTGGGACTGCTTTGCCTTAATTACATTCACTGATTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGGTCGAGCACGGCTGGCACGTCCGGCTCCATCGCGTCGTAATCCATCCCTATTCGACCAACAAAACCTCAGGGGACGGGATGTGAGTGGGTATCGATCATTATCGAACGCCCATAAGTACTCCACTCATCTGTCTGAAAAGTTTGTCGAGTGCCGCTCTCTGAAGAGTACGATAACTTACTCCAAACACTCTACGCCTAGTGGTCGAAAACACTAAAGGGAAAATACTCACTGACTTACTCTGTCGCTCTACGATTGCCGCGATACCTTAATAAGACACGTATCGGCTGTCGCAGCGATGGATTCCTTAAGCGATACAACTAAGATCAATCGGTGCCGGGCCTACAGCCTGGGCCCTAGCTCCAAAAGTGATAATGGATAGTCGGTTCAAGCGAATTTACACCAGACTGATCCTTTACGGTCATTCCGACCGCCGCATGATACATGCCAAAAGACACTTGTCTTCTTTCCTCTAAAAGACAGACCTTGTTTGCAAGGAGAGCCCAATCGGCACGACCCAAAGGGATTATCAACTGAACTATTATTGCATACTACTAAGCAGACGGACCGTATAGCATCATTGATACCTATTATATTTCCATACACCAACTCCATACGCGATGGGTCGAAACTACAAGCTTCACTTACGTGTACAGCCGCAGGACCCACTCTCTAATCTAGCCAATGACACTACTAATTTGAACATTCCCCAGCGATGAACAGGCACATGAGCGGTCCTCGTACCCACCACGGCCCGCTCAACTGCAAGGGGCCGCTCGGATCAAAGTTTTTCACTAACTCATGTCGAGCAGATCGGCATGCTCAAGATAGTATTTTAGGAGG
 */

Posted on 2022-08-06 15:12:00
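There is no need to initialize empty lists and then fill them in by index. A list comprehension, for example, is a more concise and less error-prone way to get the same result, because the resulting list is always exactly as long as the data it is built from.
The construction of subsequence and chdnalist can be fixed like this: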
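To make the failure mode concrete, here is a minimal sketch. The helper names `strs_fixed` and `strs_comprehension` are hypothetical and only for illustration; the two header rows are the ones from the CSV files in the question, written as `csv.reader` would return them.

```python
# Header rows as csv.reader would return them for the two database files.
small_header = ["name", "AGATC", "AATG", "TATC"]
large_header = ["name", "AGATC", "TTTTTTCT", "AATG", "TCTAG",
                "GATA", "TATC", "GAAA", "TCTG"]


def strs_fixed(header, slots):
    """The question's approach: fill a pre-sized list by index."""
    subsequence = [[] for _ in range(slots)]
    for q in range(1, len(header)):
        subsequence[q - 1] = header[q]  # IndexError when slots is too small
    return subsequence


def strs_comprehension(header):
    """Comprehension approach: the list is as long as the header requires."""
    return [name for name in header[1:]]


# Three slots cannot hold eight STR names.
try:
    strs_fixed(large_header, 3)
    overflowed = False
except IndexError:
    overflowed = True

# Eight slots leave five empty placeholders when the small file is used.
padded = strs_fixed(small_header, 8)

print(overflowed)                             # True
print(padded.count([]))                       # 5
print(strs_comprehension(small_header))       # ['AGATC', 'AATG', 'TATC']
print(len(strs_comprehension(large_header)))  # 8
```

With eight slots and the small file, the leftover placeholders would most likely each contribute an extra count, so the eight-element result could never equal a three-element row from the database.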
# TODO: Find longest match of each STR in DNA sequence
subsequence = [x for x in dnalist[0][1:]]
chdnalist = [longest_match(sequence, s) for s in subsequence]

https://stackoverflow.com/questions/73260760
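As a rough end-to-end check of the comprehension-based approach, the sketch below runs the whole pipeline on in-memory data instead of files from `sys.argv`. It makes several assumptions: `find_match`, `SMALL_CSV`, and `SEQUENCE` are hypothetical names, the sequence is made up for the example rather than taken from the question, `longest_match` is a compact rewrite with the same intent as the question's version, and the function returns the name instead of printing it.

```python
import csv
import io

# Small database from the question, held in memory instead of read from a file.
SMALL_CSV = """name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
"""

# Made-up sequence (an assumption for this sketch): 4 x AGATC, 1 x AATG, 5 x TATC.
SEQUENCE = "AGATC" * 4 + "TTAATGG" + "TATC" * 5


def longest_match(sequence, subsequence):
    """Return the longest run of back-to-back copies of subsequence."""
    step = len(subsequence)
    if step == 0:
        return 0  # guard: an empty pattern would otherwise loop forever
    longest_run = 0
    for i in range(len(sequence)):
        count = 0
        # Count consecutive copies starting at position i.
        while sequence[i + count * step:i + (count + 1) * step] == subsequence:
            count += 1
        longest_run = max(longest_run, count)
    return longest_run


def find_match(csv_text, sequence):
    """Return the first name whose STR counts equal those found, else None."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, people = rows[0], rows[1:]
    # One count per STR column, however many columns the file has.
    counts = [longest_match(sequence, s) for s in header[1:]]
    for name, *dna in people:
        if [int(n) for n in dna] == counts:
            return name
    return None


print(find_match(SMALL_CSV, SEQUENCE))  # Bob
```

Because `counts` is derived from the header row, the same function should handle the eight-column file without any change to the code.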