GC content

Stack Overflow user
Asked 2021-10-10 22:50:53
Answers: 1, Views: 79, Following: 0, Votes: 1

I have this program that generates N random sequences and finds the GC content.

import random

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])
N = 2
longest_seq = ""
shortest_seq = randseq("ATCG", 10)
for i in range(N):
    print(f'Sequence {i +1}):')
    seq = randseq("ATCG", 10)
    if len(seq) > len(longest_seq):
        longest_seq = seq
    if len(seq) < len(shortest_seq):
        shortest_seq = seq
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent = totalG + totalC
    print(seq)

print("The GC content is:", GCcontent)

This is the output:

Sequence 1):
TCGGTG
Sequence 2):
GCATCGTCAA
The GC content is: 5

When I print the GC content, it does not make sense. The content should be: Cs = 4 + Gs = 5, Total = 9. What's wrong with the code? Also, how can I show the result for each sequence as a dictionary? For example: Sequence 1: {A:0, T:2, C:1, G:3}

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-10-10 23:12:05

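Corrected code, with the requested count output. In the original, GCcontent was reassigned on every loop iteration, so the final print showed only the last sequence's count; here it starts at 0 and is accumulated with +=.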
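
The difference between the two behaviours can be checked on the two sequences from the question's output. This is a minimal sketch, not part of the original answer; the names `last_only` and `total` are introduced here for illustration only.

```python
# The two sequences shown in the question's output.
sequences = ["TCGGTG", "GCATCGTCAA"]

# Question's behaviour: plain assignment overwrites the value each iteration,
# so only the last sequence's G + C count survives the loop.
for seq in sequences:
    last_only = seq.count("G") + seq.count("C")

# Corrected behaviour: start at zero and accumulate with +=.
total = 0
for seq in sequences:
    total += seq.count("G") + seq.count("C")

print(last_only)  # 5, the value the question reported
print(total)      # 9, the value the asker expected
```

The answer's full corrected program follows.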

import random
from collections import Counter

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])
N = 2

GCcontent = 0
sequences = []
for i in range(N):
    print(f'Sequence {i +1}):')
    seq = randseq("ATCG", 10)
    sequences.append(seq)
    
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent += totalG + totalC
    print(f'\tSequence: {seq}')
    print(f'\tCounts: {Counter(seq)}')
    print()
    

shortest_seq = min(sequences, key = len)
longest_seq = max(sequences, key = len)
print(f"The GC content is: {GCcontent}")
print(f"Longest sequence is sequence number: {sequences.index(longest_seq) + 1}")
print(f"Shortest sequence is sequence number: {sequences.index(shortest_seq) + 1}")

Sample run

Sequence 1):
    Sequence: GCAGATAGC
    Counts: Counter({'G': 3, 'A': 3, 'C': 2, 'T': 1})

Sequence 2):
    Sequence: ACT
    Counts: Counter({'A': 1, 'C': 1, 'T': 1})

The GC content is: 6
Longest sequence is sequence number: 1
Shortest sequence is sequence number: 2
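
Note that Counter only lists letters that actually occur, so sequence 2 above has no 'G' entry, whereas the question's example shows A:0. If a plain dictionary with all four bases in a fixed order is wanted, one possible sketch is the following; `base_counts` and its `alphabet` parameter are hypothetical names, not part of the original answer.

```python
from collections import Counter

def base_counts(seq, alphabet="ATCG"):
    """Return a plain dict with one entry per base, including zero counts."""
    counts = Counter(seq)
    # Indexing a Counter with a missing key gives 0, so absent bases appear as 0.
    return {base: counts[base] for base in alphabet}

# First sequence from the question's output.
print(f"Sequence 1: {base_counts('TCGGTG')}")
# Sequence 1: {'A': 0, 'T': 2, 'C': 1, 'G': 3}
```

Building the dict from `alphabet` rather than from the sequence keeps the key order the same for every sequence, which makes the printed rows easier to compare.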

Code refactoring

The code above can be rewritten more concisely, as follows.

import random
from collections import Counter

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])

N = 2
sequences = [randseq("ATCG", 10) for _ in range(N)]   # N sequences
counts = [Counter(seq) for seq in sequences]          # count of letters of all sequences

for i, (seq, cnt) in enumerate(zip(sequences, counts), start=1):
    print(f'\nSequence {i}):\n\tSequence: {seq}\n\tCounts: {cnt}')

shortest_seq = min(sequences, key = len)
longest_seq = max(sequences, key = len)
GCcontent = sum(cnt['G'] + cnt['C'] for cnt in counts)
print(f"\nThe GC content is: {GCcontent}")
print(f"Longest sequence is sequence number: {sequences.index(longest_seq) + 1}")
print(f"Shortest sequence is sequence number: {sequences.index(shortest_seq) + 1}")
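
Both versions report GC content as a raw count of G and C letters summed over all sequences. If a per-sequence fraction is wanted instead (an assumption about intent that the question does not state), a minimal sketch could look like this; `gc_fraction` is a hypothetical helper name.

```python
def gc_fraction(seq):
    """Fraction of letters in seq that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

# The two sequences from the sample run above.
for seq in ["GCAGATAGC", "ACT"]:
    print(f"{seq}: {gc_fraction(seq):.1%}")
```

The empty-sequence guard is defensive only: `randseq` as written always returns at least one letter.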

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/69519543
