I have this program that generates N random sequences and finds the GC content.
import random

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])

N = 2
longest_seq = ""
shortest_seq = randseq("ATCG", 10)
for i in range(N):
    print(f'Sequence {i +1}):')
    seq = randseq("ATCG", 10)
    if len(seq) > len(longest_seq):
        longest_seq = seq
    if len(seq) < len(shortest_seq):
        shortest_seq = seq
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent = totalG + totalC
    print(seq)
print("The GC content is:", GCcontent)

This is the output:

Sequence 1):
TCGGTG
Sequence 2):
GCATCGTCAA
The GC content is: 5

When I print the GC content, it does not make sense. The content should be: Cs = 4 + Gs = 5, Total = 9. What's wrong with the code? Also, how can I show the counts for each sequence in a dictionary? For example: Sequence 1: {A:0, T:2, C:1, G:3}

Posted on 2021-10-10 23:12:05
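Corrected code, including the requested count output. The original assigns GCcontent = totalG + totalC, so the value printed at the end reflects only the last sequence (2 Gs + 3 Cs = 5). Starting from 0 and accumulating with += gives the total across all sequences.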
import random
from collections import Counter

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])

N = 2
GCcontent = 0       # running total of G and C over all sequences
sequences = []      # keep every sequence so min/max can be found afterwards
for i in range(N):
    print(f'Sequence {i +1}):')
    seq = randseq("ATCG", 10)
    sequences.append(seq)
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent += totalG + totalC    # accumulate instead of overwriting
    print(f'\tSequence: {seq}')
    print(f'\tCounts: {Counter(seq)}')
    print()

shortest_seq = min(sequences, key = len)
longest_seq = max(sequences, key = len)
print(f"The GC content is: {GCcontent}")
print(f"Longest sequence is sequence number: {sequences.index(longest_seq) + 1}")
print(f"Shortest sequence is sequence number: {sequences.index(shortest_seq) + 1}")

Example run

Sequence 1):
    Sequence: GCAGATAGC
    Counts: Counter({'G': 3, 'A': 3, 'C': 2, 'T': 1})

Sequence 2):
    Sequence: ACT
    Counts: Counter({'A': 1, 'C': 1, 'T': 1})

The GC content is: 6
Longest sequence is sequence number: 1
Shortest sequence is sequence number: 2

Code refactoring

The code above can be rewritten more concisely, as follows.
import random
from collections import Counter

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])

N = 2
sequences = [randseq("ATCG", 10) for _ in range(N)]    # N sequences
counts = [Counter(seq) for seq in sequences]           # count of letters of all sequences
for i, seq in enumerate(sequences, start = 1):
    print(f'\nSequence {i}):\n\tSequence: {seq}\n\tCounts: {Counter(seq)}')

shortest_seq = min(sequences, key = len)
longest_seq = max(sequences, key = len)
GCcontent = sum(cnt['G'] + cnt['C'] for cnt in counts)
print(f"\nThe GC content is: {GCcontent}")
print(f"Longest sequence is sequence number: {sequences.index(longest_seq) + 1}")
print(f"Shortest sequence is sequence number: {sequences.index(shortest_seq) + 1}")

https://stackoverflow.com/questions/69519543
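
One caveat about the Counter output: a Counter only lists bases that actually occur, so the example run's sequence ACT has no 'G' entry, while the question asked for a dictionary that also shows zeros (e.g. A:0). A minimal sketch of one way to get that, assuming a plain dict built with str.count is acceptable. The helper name base_counts is hypothetical, and the two sequences from the question's output are hard-coded so the result is reproducible:

```python
def base_counts(seq, alphabet="ATCG"):
    """Return a plain dict with one entry per base, including zero counts."""
    return {base: seq.count(base) for base in alphabet}

# The two sequences from the question's output, used here as fixed inputs
# instead of random ones.
sequences = ["TCGGTG", "GCATCGTCAA"]

counts = [base_counts(seq) for seq in sequences]
for i, cnt in enumerate(counts, start=1):
    print(f"Sequence {i}: {cnt}")

# Total number of G and C bases across all sequences.
total_gc = sum(cnt["G"] + cnt["C"] for cnt in counts)
print(f"The GC content is: {total_gc}")
```

With these inputs the first line printed is Sequence 1: {'A': 0, 'T': 2, 'C': 1, 'G': 3} and the total is 9, matching the values expected in the question.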