
Rosalind Profile and Consensus: writing long strings to one line in Python (formatting)

Stack Overflow user

Asked on 2016-08-06 15:16:58

1 answer, 410 views, 0 followers, 0 votes

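I'm trying to solve a problem on Rosalind: given a FASTA file containing at most 10 DNA strings of at most 1 kbp each, I need to output the consensus sequence and the profile (how many times each nucleotide occurs at each position across all the sequences). As far as formatting the response goes, the code I have works for small sequences (verified).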
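As background, the profile is a 4 × n table counting how often each nucleotide appears in each column, and the consensus takes the most frequent nucleotide per column. The following is a minimal sketch of that computation, assuming the sequences are already parsed into a non-empty list of equal-length strings. `profile_and_consensus` is a hypothetical helper, not part of the question's code; ties are broken in A, C, G, T order, which is the same order `comp.index(max(comp))` produces in the question's code.

```python
from collections import Counter

BASES = "ACGT"


def profile_and_consensus(seqs):
    """Return (profile, consensus) for a non-empty list of equal-length DNA strings.

    profile maps each base in BASES to a list of per-column counts;
    consensus picks the most frequent base in each column, breaking
    ties in A, C, G, T order (max() keeps the first maximal item).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    profile = {base: [0] * length for base in BASES}
    # zip(*seqs) yields one tuple per column, e.g. ('A', 'A', 'G')
    for col, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        for base in BASES:
            profile[base][col] = counts[base]
    consensus = "".join(
        max(BASES, key=lambda b: profile[b][col]) for col in range(length)
    )
    return profile, consensus


profile, consensus = profile_and_consensus(["ATCC", "ATGC", "GTGA"])
print(consensus)     # ATGC
print(profile["A"])  # [2, 0, 0, 1]
```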

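However, I run into trouble formatting the output for large sequences. Regardless of length, what I want to output is:

consensus sequence
A: one-line string of numbers without commas
C: one-line string of numbers without commas
G: one-line string of numbers without commas
T: one-line string of numbers without commas

with the rows aligned with one another, each on its own line, or at least some formatting that lets me carry this block forward as a unit so the alignment stays intact.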
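Python's `write()` emits exactly the characters it is given and does not insert line breaks on its own, so one practical way to keep the five lines together as a unit is to build the whole block as one string and write it in a single call. This is a minimal sketch, assuming the profile is a dict mapping each base to a list of integer counts (a hypothetical shape, not the nested list used in the question's code); `format_profile` and `write_profile` are illustrative names. The round-trip at the end writes a deliberately long hypothetical profile and reads it back to confirm the file still has exactly five lines.

```python
import os
import tempfile


def format_profile(consensus, profile, bases="ACGT"):
    """Return the consensus line plus one 'X: n n n ...' row per base.

    Counts are separated by single spaces (no commas or brackets), and
    every row, however long, ends with exactly one newline.
    """
    lines = [consensus]
    for base in bases:
        lines.append(base + ": " + " ".join(str(n) for n in profile[base]))
    return "\n".join(lines) + "\n"


def write_profile(path, consensus, profile):
    """Write the formatted block to path in a single write() call."""
    with open(path, "w") as fh:
        fh.write(format_profile(consensus, profile))


# Round-trip check: a hypothetical 100,000-column profile (one all-A sequence)
length = 100_000
profile = {"A": [1] * length, "C": [0] * length, "G": [0] * length, "T": [0] * length}
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    write_profile(path, "A" * length, profile)
    with open(path) as fh:
        lines = fh.read().splitlines()
finally:
    os.remove(path)
print(len(lines))  # 5
```

If a file written this way looks wrapped when opened, the wrapping is most likely the editor's soft wrap rather than newline characters in the file.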

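But when I run my code on a large sequence, I get the individual strings below the consensus sequence, each separated by a newline, presumably because the strings themselves are too long. I've been trying to find a way around this, but my searches have turned up nothing. I'm considering some iterative writing algorithm that simply writes out the whole desired block above, but any help would be greatly appreciated. For completeness, my full code is attached below with block comments where needed, although the main part is this: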

```python
import os


def cons(file):
    # returns consensus sequence and profile of a FASTA file
    path = os.path.abspath(os.path.expanduser(file))

    with open(path, "r") as D:
        F = D.readlines()

    # initialize list of sequences, list of all strings, and a temporary storage
    # list, respectively
    SEQS = []
    mystrings = []
    temp_seq = []

    # get a list of strings from the file, stripping the trailing newline
    # (and any "\r" left over from Windows line endings)
    for x in F:
        mystrings.append(x.strip())

    # if the string in question is a nucleotide sequence (without ">")
    # I'll store that string into a temporary variable until I run into a string
    # with a ">", in which case I'll join all the strings in my temporary
    # sequence list and append to my list of sequences SEQS.
    # Index 0 is skipped because the first line is assumed to be a header.
    for i in range(1, len(mystrings)):
        if ">" not in mystrings[i]:
            temp_seq.append(mystrings[i])
        else:
            SEQS.append("".join(temp_seq))
            temp_seq = []
    SEQS.append("".join(temp_seq))

    # set up list of nucleotide counts for A, C, G and T, in that order
    ACGT = [[0] * len(SEQS[0]) for _ in range(4)]

    # assumed to be equal length sequences. Counting amount of shared nucleotides
    # in each column; range(len(...)) covers every column, including the last
    for i in range(len(SEQS[0])):
        for j in range(len(SEQS)):
            if SEQS[j][i] == "A":
                ACGT[0][i] += 1
            elif SEQS[j][i] == "C":
                ACGT[1][i] += 1
            elif SEQS[j][i] == "G":
                ACGT[2][i] += 1
            elif SEQS[j][i] == "T":
                ACGT[3][i] += 1

    # consensus: for each column pick the base with the highest count
    # (index() returns the first maximum, so ties resolve in A, C, G, T order)
    ancstr = ""
    TR_ACGT = list(zip(*ACGT))
    acgt = ["A: ", "C: ", "G: ", "T: "]
    for i in range(len(TR_ACGT)):
        comp = TR_ACGT[i]
        if comp.index(max(comp)) == 0:
            ancstr += "A"
        elif comp.index(max(comp)) == 1:
            ancstr += "C"
        elif comp.index(max(comp)) == 2:
            ancstr += "G"
        elif comp.index(max(comp)) == 3:
            ancstr += "T"

    # writing to file... trying to get it to write as
    #   consensus sequence
    #   A: blah(1line)
    #   C: blah(1line)
    #   G: blah(1line)
    #   T: blah(1line)
    # which works for small sequences. but for larger sequences
    # python keeps adding newlines if the string in question is very long...

    myfile = "myconsensus.txt"
    writing_strings=[acgt[i]+' '.join(str(n) for n in ACGT[i] for i in range(0,len(ACGT))) for i in range(0,len(acgt))]
    with open(myfile, "w") as D:
        D.write(ancstr)
        D.write("\n")
        for i in range(len(writing_strings)):
            D.write(writing_strings[i])
            D.write("\n")


cons("rosalind_cons.txt")
```
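The block-comment logic in the code above (collect sequence lines until the next `>` header, then join them) can also be written as a standalone helper. This is a minimal sketch; `parse_fasta` is a hypothetical name. Unlike the question's loop, it takes the file's contents as a string rather than a path, detects headers with `startswith(">")`, does not assume the first line is a header, and skips blank lines and records with no sequence lines.

```python
def parse_fasta(text):
    """Return the sequences in a FASTA-formatted string, in file order.

    A line starting with '>' begins a new record; the sequence lines
    that follow it are concatenated. Blank lines are ignored, and a
    header with no sequence lines produces no entry.
    """
    seqs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()  # also drops any stray '\r'
        if not line:
            continue
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


print(parse_fasta(">Rosalind_1\nATCC\nAG\n>Rosalind_2\nGGGC\nAA\n"))
# ['ATCCAG', 'GGGCAA']
```

Inside the question's `with open(path, "r") as D:` block, the sequences could then be read as `SEQS = parse_fasta(D.read())`.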

Tags: python, python-3.x, rosalind

1 answer

Stack Overflow user

Accepted answer

Answered on 2016-08-06 16:45:25

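Your code is fine apart from this one line:

```python
writing_strings=[acgt[i]+' '.join(str(n) for n in ACGT[i] for i in range(0,len(ACGT))) for i in range(0,len(acgt))]
```

You accidentally duplicated your data. Try replacing it with:

```python
writing_strings=[acgt[i] + str(ACGT[i])[1:-1].replace(",", "") for i in range(0,len(ACGT))]
```

and then write each entry to the output file inside your existing loop:

```python
D.write(writing_strings[i])
```

Slicing `[1:-1]` off `str(ACGT[i])` is a lazy way to drop the list's brackets, and `.replace(",", "")` removes the commas, so each row comes out as space-separated counts such as `A: 5 1 0 0`.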
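To see the data duplication described in the answer, compare the question's `writing_strings` expression with a single-`for` version on a tiny hypothetical profile. The generator `str(n) for n in ACGT[i] for i in range(0,len(ACGT))` has two `for` clauses, so every count `n` is emitted once per value of the inner `i`, that is, four times per count. The `fixed` line is one possible correction, not the only one.

```python
# Hypothetical 4-column profile in the question's layout: rows are A, C, G, T
ACGT = [[2, 0, 0, 1], [0, 0, 1, 2], [1, 0, 2, 0], [0, 3, 0, 0]]
acgt = ["A: ", "C: ", "G: ", "T: "]

# Original expression: the second `for` clause repeats every count
# once per value of the inner i (four times here)
buggy = [acgt[i] + ' '.join(str(n) for n in ACGT[i] for i in range(0, len(ACGT)))
         for i in range(0, len(acgt))]

# A single `for` clause: each count appears once, space-separated,
# with no brackets or commas
fixed = [acgt[i] + ' '.join(str(n) for n in ACGT[i]) for i in range(len(acgt))]

print(buggy[0])  # A: 2 2 2 2 0 0 0 0 0 0 0 0 1 1 1 1
print(fixed[0])  # A: 2 0 0 1
```

With 1 kbp sequences, each buggy row holds 4,000 numbers instead of 1,000, which is easy to mistake for unwanted line breaks when an editor soft-wraps it.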

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/38805770

