I have a series of strings in a file, in the following format:
>HEADER_Text1
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text3
Information here, yada yada yada
Some more information here, yada yada yada
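Even some more information here, yada yada yada

I'm trying to find a regular expression pattern that removes the newlines between the lines that follow each > character, up to the next > character. The end result would look like this: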
>HEADER_Text1
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text3
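Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada

Does anyone know how I could come up with a regular expression pattern to do this?
P.S. This is FASTA format, which is common in computational science.
Thanks!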
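Posted on 2013-02-11 04:04:25
As pointed out in the comments, your best bet is to use an existing FASTA parser. Why not?
Here is how I would join the lines based on the leading greater-than sign: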
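def joinup(f):
    buf = []
    for line in f:
        if line.startswith('>'):
            # A new header: flush the body lines collected so far.
            if buf:
                yield " ".join(buf)
            yield line.rstrip()
            buf = []
        else:
            buf.append(line.rstrip())
    # Flush the body of the last record, if there is one.
    if buf:
        yield " ".join(buf)

with open("...") as f:
    for joined_line in joinup(f):
        print(joined_line)  # blah blah...

Posted on 2013-02-11 02:27:43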
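Assuming > is always the first character on a new line, match each line that does not start with > and replace the newline after it with a space, unless the next line starts with >. In multiline mode:
"^([^>\n].*)\n(?!>)" with "\1 "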
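As a rough sketch of the regex approach in Python (the pattern, the `join_records` name, and the sample text are illustrative assumptions, not taken from the answer): match a line that does not start with `>`, and replace the newline after it with a space unless the next line is a header or the text ends there.

```python
import re

# Assumed pattern: a line that does not start with '>', followed by a newline
# that is not followed by '>' or by the end of the text.
JOIN_RE = re.compile(r"^([^>\n].*)\n(?!>|\Z)", re.MULTILINE)


def join_records(text):
    """Join the body lines under each '>' header with single spaces."""
    return JOIN_RE.sub(r"\1 ", text)


sample = (
    ">HEADER_Text1\n"
    "Information here, yada yada yada\n"
    "Some more information here, yada yada yada\n"
    "Even some more information here, yada yada yada\n"
    ">HEADER_Text2\n"
    "Information here, yada yada yada\n"
    "Some more information here, yada yada yada\n"
)

print(join_records(sample))
```

Because `.` does not match a newline and `^` is anchored per line under `re.MULTILINE`, header lines are never merged with the body that follows them. This reads the whole file into memory, so for large FASTA files a line-by-line approach or a dedicated parser is likely the better choice.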
Posted on 2013-02-11 02:54:50
You don't have to use a regex:
[x if x.startswith('>') else x.replace('\n', '') for x in f.readlines()]
should work.
In [43]: f=open('test.txt')
In [44]: contents=[x if x.startswith('>') else x.replace('\n', '') for x in f.readlines()]
In [45]: contents
Out[45]:
['>HEADER_Text1\n',
'Information here, yada yada yada',
'Some more information here, yada yada yada',
'Even some more information here, yada yada yada',
'>HEADER_Text2\n',
'Information here, yada yada yada',
'Some more information here, yada yada yada',
'Even some more information here, yada yada yada',
'>HEADER_Text3\n',
'Information here, yada yada yada',
'Some more information here, yada yada yada',
'Even some more information here, yada yada yada']

https://stackoverflow.com/questions/14800970