I am trying to match the data below so that I can extract the text between the timecodes.
subs='''
1
00:00:00,130 --> 00:00:01,640
where you there when it
happened?
Who else saw you?
2
00:00:01,640 --> 00:00:03,414
This might be your last chance to
come clean. Take it or leave it.
'''
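Regex=re.compile(r'(\d\d:\d\d\:\d\d,\d\d\d) --> (\d\d:\d\d\:\d\d,\d\d\d)(\n.+)((\n)?).+')
My regex matches the timecode and the first line of text, but it returns only a few characters from the second line instead of the whole line. How can I make it match everything between one timecode and the next?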
Posted on 2019-11-06 04:55:09
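One possible problem with your current approach is that you are not using DOTALL mode when trying to capture everything between the timestamps. I got re.search working in DOTALL mode: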
subs="""
1 00:00:00,130 --> 00:00:01,640
where you there when it happened?
Who else saw you?
2 00:00:01,640 --> 00:00:03,414
This might be your last chance to
come clean. Take it or leave it. """
match = re.search(r'\d+ \d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+\s*(.*)\d+ \d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+', subs, flags=re.DOTALL)
if match:
print match.group(1)
else:
print 'NO MATCH'这些指纹:
where you there when it happened?
Who else saw you?
Posted on 2019-11-06 06:35:51
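I'm not sure, but I think the following solution suits your case better.
With it you can not only extract the text between the timecodes, but also associate each piece of text with its timecode.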
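import re

multiline_text = """
1 00:00:00,130 --> 00:00:01,640
where you there when it happened?
Who else saw you?
2 00:00:01,640 --> 00:00:03,414
This might be your last chance to
come clean. Take it or leave it.
"""
lines = multiline_text.split('\n')
# Maps each "start --> end" timecode string to the text that follows it
# (named subtitles rather than dict to avoid shadowing the built-in)
subtitles = {}
current_key = None
for line in lines:
    # A line containing a timecode starts a new entry
    is_key_match_obj = re.search(r'([\d:,]{12})(\s-->\s)([\d:,]{12})', line)
    if is_key_match_obj:
        current_key = is_key_match_obj.group()
        continue
    # Skip blank lines and any text that appears before the first timecode
    if current_key and line:
        if current_key in subtitles:
            # Join continuation lines with a space so words don't run together
            subtitles[current_key] += ' ' + line
        else:
            subtitles[current_key] = line
print(subtitles)
Posted on 2019-11-06 15:46:14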
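You can also get the matches without using DOTALL.
Match the timecode, then capture in group 1 all following lines that do not start with a timecode, using a negative lookahead.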
^\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+((?:\r?\n(?!\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d).*)*)
In parts:
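- `^` Start of a line (requires the re.MULTILINE flag; without it, only the start of the string)
- `\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+` Match the timecode pattern
- `(` Capture group 1
  - `(?:` Non-capturing group
    - `\r?\n` Match a newline
    - `(?!\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d)` Negative lookahead, assert that what follows is not a timecode
    - `.*` Match any character except a newline 0+ times
  - `)*` Close the non-capturing group and repeat 0+ times
- `)` Close capture group 1
https://stackoverflow.com/questions/58723190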
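A minimal sketch of how the lookahead pattern from the last answer might be used on the data exactly as laid out in the question (subtitle number and timecode on separate lines). The pattern is compiled unchanged with re.MULTILINE so that `^` matches at every line start. The helper name `extract_subtitles` is hypothetical, and dropping digit-only lines is an assumption: group 1 also picks up the next block's number (here "2"), since that line is not a timecode.

```python
import re

# Pattern from the answer above, unchanged; ^ needs re.MULTILINE to match at each line start.
TIMECODE_BLOCK = re.compile(
    r'^\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+'
    r'((?:\r?\n(?!\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d).*)*)',
    re.MULTILINE,
)

subs = '''
1
00:00:00,130 --> 00:00:01,640
where you there when it
happened?
Who else saw you?
2
00:00:01,640 --> 00:00:03,414
This might be your last chance to
come clean. Take it or leave it.
'''


def extract_subtitles(text):
    """Hypothetical helper: return (timecode, text) pairs, one per subtitle block."""
    results = []
    for m in TIMECODE_BLOCK.finditer(text):
        # Everything before group 1 is the "start --> end" timecode itself
        timecode = text[m.start():m.start(1)]
        # Group 1 also captures the next block's number line, so drop empty and digit-only lines
        lines = [line.strip() for line in m.group(1).splitlines()]
        body = ' '.join(line for line in lines if line and not line.isdigit())
        results.append((timecode, body))
    return results


for timecode, body in extract_subtitles(subs):
    print(timecode, '|', body)
```

Without re.MULTILINE, `^` only matches at position 0 of the string, which here is a newline, so nothing is found. Note that this filter would also drop a real subtitle line that consists only of digits.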