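I'm trying to compile a regular expression that accumulates a sequence of hashtags (r'#\w+') from a tweet. I'd like to compile two such regexes: one that matches from the start of the tweet and one that matches from the end. I'm using Python 2.7.2, and my code looks like this: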
HASHTAG_SEQ_REGEX_PATTERN = r"""
( #Outermost grouping to match overall regex
#\w+ #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)* #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) #Closing parenthesis of outermost grouping to match overall regex
"""
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.VERBOSE | re.IGNORECASE)

When the line that compiles the regex is executed, I get the following error:
sre_constants.error: unbalanced parenthesis

I don't understand why this happens, because I can't see any unbalanced parentheses in my regex pattern.
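To see the failure in isolation, here is a minimal sketch that reproduces it, assuming Python 3, where the exception is available as `re.error` and its message may be worded differently from Python 2.7's "unbalanced parenthesis". The `compile_error` variable is only for this illustration.

```python
import re

# The question's pattern, unchanged: none of the '#' characters are escaped.
HASHTAG_SEQ_REGEX_PATTERN = r"""
( #Outermost grouping to match overall regex
#\w+ #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)* #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) #Closing parenthesis of outermost grouping to match overall regex
"""

compile_error = None
try:
    re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.VERBOSE | re.IGNORECASE)
except re.error as exc:
    # Under re.VERBOSE every unescaped '#' starts a comment, so the engine
    # only sees '^', '(', '(', '[:\s,]*' and a single ')': one group is never closed.
    compile_error = exc

print(compile_error)
```

Note that simply dropping `re.VERBOSE` makes this particular string compile, but then the comment text itself becomes part of the regex, so that alone is not a fix.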
Posted on 2013-03-08 06:17:26
Everything on this line after the first # is treated as a comment:
v----comment starts here
([:\s,]*#\w+)* ...

Escape it:
([:\s,]*\#\w+)*

The same goes for this line, although it isn't the one causing the unbalanced parentheses :)
v----escape me
#\w+ #The hashtag matching ...

The corrected pattern:

HASHTAG_SEQ_REGEX_PATTERN = r"""
( # Outermost grouping to match overall regex
\#\w+ # The hashtag matching. It's a valid combination of \w+
([:\s,]*\#\w+)* # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) # Closing parenthesis of outermost grouping to match overall regex
"""

Posted on 2013-03-08 06:19:27
You have some unescaped hashes that you want to match literally, but VERBOSE is tripping you up:
\#\w+
([:\s,]*\#\w+)* #reported issue caused by this hash

Posted on 2013-03-08 06:29:55
If you write the pattern like this, you won't have this problem:
# Compile this WITHOUT re.VERBOSE: under VERBOSE the '#' would start a comment again.
HASHTAG_SEQ_REGEX_PATTERN = (
    r'('               #Outermost grouping to match overall regex
    r'#\w+'            #The hashtag matching. It's a valid combination of \w+
    r'([:\s,]*#\w+)*'  #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
    r')'               #Closing parenthesis of outermost grouping to match overall regex
)

Personally, I never use re.VERBOSE; I can never remember its rules about whitespace and the like.
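The fixes from the answers above can be checked side by side. This is a minimal sketch assuming Python 3; the tweet text and the `LEFT_*`/`RIGHT_*` names are made up for illustration, and the `RIGHT_*` regexes (anchored with `$`) are one assumed way to cover the "from the end of the tweet" case in the question.

```python
import re

# Fix from the first two answers: keep re.VERBOSE but escape every literal '#'.
ESCAPED_VERBOSE_PATTERN = r"""
(                   # outermost group: the whole hashtag sequence
\#\w+               # first hashtag; the escaped hash is not a comment
([:\s,]*\#\w+)*     # zero or more further hashtags separated by [:\s,]*
)                   # end of outermost group
"""

# Fix from the third answer: adjacent string literals, compiled WITHOUT re.VERBOSE.
PLAIN_PATTERN = (
    r'('                # outermost group
    r'#\w+'             # first hashtag
    r'([:\s,]*#\w+)*'   # further hashtags
    r')'
)

VERBOSE_FLAGS = re.VERBOSE | re.IGNORECASE
PLAIN_FLAGS = re.IGNORECASE

# Hypothetical names: one regex anchored at the start, one at the end.
LEFT_ESCAPED = re.compile('^' + ESCAPED_VERBOSE_PATTERN, VERBOSE_FLAGS)
RIGHT_ESCAPED = re.compile(ESCAPED_VERBOSE_PATTERN + '$', VERBOSE_FLAGS)
LEFT_PLAIN = re.compile('^' + PLAIN_PATTERN, PLAIN_FLAGS)
RIGHT_PLAIN = re.compile(PLAIN_PATTERN + '$', PLAIN_FLAGS)

tweet = "#python #regex: why does this fail? #help, #re"

print(LEFT_PLAIN.match(tweet).group(1))     # → #python #regex
print(RIGHT_PLAIN.search(tweet).group(1))   # → #help, #re

# The third answer's pattern must not be combined with re.VERBOSE:
# everything after its first '#' would be a comment, leaving a lone '('.
plain_fails_under_verbose = False
try:
    re.compile(PLAIN_PATTERN, re.VERBOSE)
except re.error:
    plain_fails_under_verbose = True
```

Both variants match the same text; the choice is mainly whether you prefer inline `re.VERBOSE` comments (and remembering to escape `#` and literal whitespace) or ordinary Python comments next to concatenated raw strings.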
https://stackoverflow.com/questions/15282815