Each piece is easy to extract on its own:
re.findall(r"\((\w+)\)", "It's Jane's cat Jack (male)") #1
re.findall(r"(?<=\()\w+(?=\))", "It's Jane's cat Jack (male)") #2
# ['male']
re.findall(r"\w+(?='s)", "It's Jane's cat Jack (male)")
# ['It', 'Jane']
re.findall(r"\S+", "It's Jane's cat Jack (male)")
# ["It's", "Jane's", 'cat', 'Jack', '(male)']
But it puzzles me why
re.findall(r"\((\w+)\)|\w+(?='s)|\S+", "It's Jane's cat Jack (male)") #1
re.findall(r"(?<=\()\w+(?=\))|\w+(?='s)|\S+", "It's Jane's cat Jack (male)") #2
# ['It', "'s", 'Jane', "'s", 'cat', 'Jack', '(male)']
never produces:
# ['It', 'Jane', 'cat', 'Jack', 'male']
By the way, which is better, #1 or #2? They produce the same result.
Thank you for your comments and replies.
Posted on 2015-11-15 03:08:42
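You can try the pattern below. In your version, \S+ matches one or more non-whitespace characters, so it also matches the leftover 's. As for the two methods you gave, you must use the second one: in the combined pattern, the first returns the string male plus many empty strings, because the regex contains a capturing group and re.findall then returns the group's content instead of the whole match.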
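As a quick check of the two points above, here is a minimal sketch that runs both combined patterns from the question on the same input. The variable names `text`, `with_group`, and `without_group` are only illustrative and do not come from the original post.

```python
import re

text = "It's Jane's cat Jack (male)"

# Combined pattern #1: it contains a capturing group, so findall returns
# the group's content for every match, and '' whenever the match came
# from an alternative in which the group did not participate.
with_group = re.findall(r"\((\w+)\)|\w+(?='s)|\S+", text)
print(with_group)
# ['', '', '', '', '', '', 'male']

# Combined pattern #2: no capturing group, so findall returns whole matches.
# The trailing \S+ alternative picks up the leftover "'s" and "(male)".
without_group = re.findall(r"(?<=\()\w+(?=\))|\w+(?='s)|\S+", text)
print(without_group)
# ['It', "'s", 'Jane', "'s", 'cat', 'Jack', '(male)']
```

So the two combined patterns only look equivalent when each alternative is tested on its own; once they are joined with |, the capturing group changes what findall returns.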
>>> re.findall(r"(?<=\()\w+(?=\))|\w+(?='s)|(?<!\S)\w+(?!\S)", "It's Jane's cat Jack (male)")
['It', 'Jane', 'cat', 'Jack', 'male']
or
>>> [i for i in re.split(r"\s*(?:[()]|'s|\s)\s*", "It's Jane's cat Jack (male)") if i]
['It', 'Jane', 'cat', 'Jack', 'male']
https://stackoverflow.com/questions/33715904