
Got 'TypeError: string indices must be integers' when comparing protein sequences

Stack Overflow user
Asked 2019-03-25 05:24:39
1 answer, 39 views, 0 followers, 0 votes

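How do I return a list in alphabetical order?

I have a sequence translator and Python code that can read DNA and protein sequences. The code reads a DNA sequence and translates it into a protein sequence, reads a second protein sequence, compares it with the translated one, and should print the list of amino acids that also occur in the protein sequence that was read. How can I print the list of amino acids that exist in both of them?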

def translate_codon(cod):
    """Translates a codon into an aminoacid using an internal dictionary with the standard genetic code."""
    tc = {"GCT":"A", "GCC":"A", "GCA":"A", "GCG":"A",
          "TGT":"C", "TGC":"C",
          "GAT":"D", "GAC":"D",
          "GAA":"E", "GAG":"E",
          "TTT":"F", "TTC":"F",
          "GGT":"G", "GGC":"G", "GGA":"G", "GGG":"G",
          "CAT":"H", "CAC":"H",
          "ATA":"I", "ATT":"I", "ATC":"I",
          "AAA":"K", "AAG":"K",
          "TTA":"L", "TTG":"L", "CTT":"L", "CTC":"L", "CTA":"L", "CTG":"L",
          "ATG":"M", "AAT":"N", "AAC":"N",
          "CCT":"P", "CCC":"P", "CCA":"P", "CCG":"P",
          "CAA":"Q", "CAG":"Q",
          "CGT":"R", "CGC":"R", "CGA":"R", "CGG":"R", "AGA":"R", "AGG":"R",
          "TCT":"S", "TCC":"S", "TCA":"S", "TCG":"S", "AGT":"S", "AGC":"S",
          "ACT":"T", "ACC":"T", "ACA":"T", "ACG":"T",
          "GTT":"V", "GTC":"V", "GTA":"V", "GTG":"V",
          "TGG":"W",
          "TAT":"Y", "TAC":"Y",
          "TAA":"_", "TAG":"_", "TGA":"_"}
    if cod in tc:
        return tc[cod]
    else:
        return '-1'


def seq_prot(dna_seq, ab):
    seqm = dna_seq.upper()
    prot = ab.upper()
    seq_aa = ''
    for pos in range(0, len(seqm)-2,3):
        cod = seqm[pos:pos+3]
        seq_aa += translate_codon(cod)
    for p in seq_aa:
        if p in prot:
            seq_aa[p] += 1
        else:
            seq_aa = p

    return seq_aa

dna_seq = "ACCCCTGTGACATACCTTTATGTTGCCTCGGCGGATCAGCCCGCGCCCC"
ab = 'TLYPAP'

print("The protein sequence are :",seq_prot(dna_seq, ab))

The protein sequence are : TYPP


1 Answer

Stack Overflow user

Answered 2019-03-25 10:03:32

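Your code is broken because it treats seq_aa as both a str and a dict. The loop variable p is a one-character string, so seq_aa[p] indexes a string with a string, which is what raises the TypeError.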
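To see where the error comes from, here is a minimal reproduction. It is only a sketch: the value "TPV" is an illustrative placeholder, not data from the question. Iterating over a str yields one-character strings, and using one of those as an index fails.

```python
seq_aa = "TPV"  # illustrative stand-in for a translated protein string

error_message = None
for p in seq_aa:
    try:
        # p is a one-character str such as "T", not an integer position,
        # so using it as an index raises TypeError.
        seq_aa[p]
    except TypeError as exc:
        error_message = str(exc)
        break

print(error_message)
```

Strings are also immutable, so an in-place update through an index would not work even with an integer index. The rewritten seq_prot that follows avoids both problems by keeping the counts in a separate dict.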

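def seq_prot(dna_seq, ab):
    """Count each translated amino acid that also occurs in the protein ab."""
    sequence = dna_seq.upper()
    protein = ab.upper()
    matches = {}  # amino acid letter -> number of occurrences in the translation

    # Walk the DNA one codon (three bases) at a time.
    for position in range(0, len(sequence), 3):
        codon = sequence[position: position + 3]
        aa = translate_codon(codon)  # '-1' for unknown or incomplete codons

        # Only count amino acids that also occur in the given protein.
        if aa in protein:
            if aa in matches:
                matches[aa] += 1
            else:
                matches[aa] = 1

    return matches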

dna_seq = "ACCCCTGTGACATACCTTTATGTTGCCTCGGCGGATCAGCCCGCGCCCC"
ab = 'TLYPAP'

print("The protein sequence matches are :", seq_prot(dna_seq, ab))

Output

The protein sequence matches are : {'T': 2, 'P': 3, 'Y': 2, 'L': 1, 'A': 3}
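As an alternative sketch, the same tally can be written with collections.Counter from the standard library. The helper name count_shared is hypothetical, and to keep the example self-contained it takes an already translated string; the value below was translated by hand from the question's DNA string using its codon table, dropping the trailing incomplete codon.

```python
from collections import Counter


def count_shared(translated, ab):
    """Tally residues of `translated` that also occur in `ab` (hypothetical helper)."""
    allowed = set(ab.upper())
    return Counter(aa for aa in translated.upper() if aa in allowed)


# Hand-translated from the question's DNA string (assumption: standard table as posted)
translated = "TPVTYLYVASADQPAP"
counts = count_shared(translated, "TLYPAP")
print(dict(counts))  # {'T': 2, 'P': 3, 'Y': 2, 'L': 1, 'A': 3}
```

Counter is a dict subclass, so anything that works on the plain dict returned by seq_prot works on it as well.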

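You can extract the amino acids by calling .keys() on the returned dict. If you want to expand each letter by its count, you can use multiplication (*) as a repetition operator. However, any sense of order has been lost: we are only dealing with presence. If you want to maintain order, we will have to go about it differently.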
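These options might look as follows. This is a sketch rather than the answerer's own code: the dict is copied from the output shown earlier, the translated string was derived by hand from the question's DNA, and the order-preserving variant is only one possible reading of "maintaining order", since the answer does not spell out an approach.

```python
# Counts copied from the output printed by seq_prot
matches = {'T': 2, 'P': 3, 'Y': 2, 'L': 1, 'A': 3}

# Amino acids present in both sequences, in first-seen order
shared = list(matches.keys())      # ['T', 'P', 'Y', 'L', 'A']

# The same letters in alphabetical order (sorted() iterates over the keys)
alphabetical = sorted(matches)     # ['A', 'L', 'P', 'T', 'Y']

# Repeat each letter by its count with the * operator
expanded = "".join(aa * n for aa, n in matches.items())  # 'TTPPPYYLAAA'

# One way to keep the original order: filter the translated string directly
# instead of counting (translated by hand from the question's DNA string).
translated = "TPVTYLYVASADQPAP"
ab = "TLYPAP"
in_order = "".join(aa for aa in translated if aa in ab)  # 'TPTYLYAAPAP'

print(shared, alphabetical, expanded, in_order)
```

Filtering the string keeps every matching residue at its position in the translation, at the cost of no longer having per-letter counts at hand.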

0 votes
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/55328730
