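I'm trying to match strings in Python.
For example, if my phrase is "long string",
I want it to match "long string", "Long StrInG", and "long!!!string", but not "Long strings" or "stringlong". Within any text, I want to match every instance of the phrase: in order, ignoring case, and without matching substrings.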
i.e., when I do
string = "hello"
strings = "hellos"
string in strings evaluates to True, but I don't want this to be true.
I would also like the phrase to match any instance inside a longer sentence where the words are separated by whitespace or punctuation:
i.e., string = "long string" should match
"hello ~~~!!!!! long !@#!@#!@ string"
Whitespace also matters. I don't want to match
string = "longstringlongstring" or "longs trying"
Here is what I have tried so far:
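# text: the text we check for an instance of the phrase
# phrase: the phrase to look for in text
# (wrapped in a function here only so that the return statements are valid)
import string

def contains_phrase(text, phrase):
    cleaned_text = ""
    for char in text:
        if char in string.punctuation:
            # replace punctuation with a space
            cleaned_text += " "
        else:
            cleaned_text += char.lower()
    # collapse runs of whitespace into single spaces
    cleaned_string = " ".join(cleaned_text.split())
    counter = 0
    for char in cleaned_string:
        for char2 in phrase:
            if char == char2:
                counter += 1
    if counter == len(phrase):
        return True
    return False
I realize I can't use a list, because order doesn't matter there. I'd really appreciate your suggestions!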
Posted on 2021-06-26 17:41:28
With a regular expression:
import re
import string
# given phrase
phrase = "long string"
# this says what can go between two words of the phrase above
between = "[" + r"\s" + re.escape(string.punctuation) + "]+"
# the pattern
pat = r"\b" + between.join(phrase.split()) + r"\b"
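reg = re.compile(pat, flags=re.I)
Here, between consists of whitespace (\s) and all the punctuation characters from string.punctuation, matched one or more times (because of the []+ around it). We re.escape the punctuation because it contains regex metacharacters that we need to match literally (e.g., $). The words of the phrase are then joined with between via join, and finally word boundaries (\b) are placed at both ends of the phrase to keep the match exact, e.g., to prevent long stringS from matching. re.I tells the regex to ignore case when it is compiled.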
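The steps described above could be wrapped in a small helper, roughly as sketched below. The name compile_phrase is hypothetical, and escaping each word with re.escape is an assumption that goes beyond the answer's code; it only matters if the phrase itself contains regex metacharacters.

```python
import re
import string


def compile_phrase(phrase):
    """Hypothetical helper: compile a whole-phrase, case-insensitive regex."""
    # One or more whitespace or punctuation characters may separate the words.
    between = "[" + r"\s" + re.escape(string.punctuation) + "]+"
    # Assumption (not in the answer's code): escape each word so that
    # metacharacters inside the phrase are matched literally.
    words = [re.escape(word) for word in phrase.split()]
    # Word boundaries on both ends keep "long strings" from matching.
    return re.compile(r"\b" + between.join(words) + r"\b", flags=re.I)


reg = compile_phrase("long string")
print(reg.search("hello ~~~!!!!! long !@#!@#!@ string") is not None)  # True
print(reg.search("Long strings") is not None)  # False
```

Compiling once and reusing the returned object avoids rebuilding the pattern for every text that gets checked.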
For this phrase, pat looks like
\blong[\s!"\#\$%\&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`\{\|\}\~]+string\b
If you have a one-word phrase, e.g., phrase = "this", then it is
\bthis\b
i.e., there is no punctuation or whitespace class in between, because there is only one word.
Lastly, for a three-word phrase, e.g., phrase = "no escape needed", it is
\bno[\s!"\#\$%\&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`\{\|\}\~]+escape[\s!"\#\$%\&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`\{\|\}\~]+needed\b
i.e., it forms the regex dynamically.
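To see the dynamic construction at work, the following sketch builds the pattern for phrases of one, two, and three words and counts how many separator classes end up in each. The helper name phrase_pattern is hypothetical; the construction itself is the one from the answer.

```python
import re
import string

# Character class allowed between two consecutive words of the phrase.
between = "[" + r"\s" + re.escape(string.punctuation) + "]+"


def phrase_pattern(phrase):
    """Hypothetical helper: return the pattern string for a phrase."""
    return r"\b" + between.join(phrase.split()) + r"\b"


for phrase in ("this", "long string", "no escape needed"):
    pat = phrase_pattern(phrase)
    # A phrase of n words contains n - 1 separator classes.
    print(phrase, "->", pat.count(between), "separator(s)")
```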
Sample runs for testing (if the result is not None, there is a match):
>>> re.search(reg, "long string") is not None
True
>>> re.search(reg, "Long StrInG") is not None
True
>>> re.search(reg, "long!!!string") is not None
True
>>> re.search(reg, "Long strings") is not None
False
>>> re.search(reg, "stringlong") is not None
False
>>> re.search(reg, "hello ~~~!!!!! long !@#!@#!@ string") is not None
True
>>> re.search(reg, "longstring") is not None
False
You can refer to the regex here.
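Since the question asks for every instance in a text rather than a single yes/no answer, one possible extension is to use finditer on the compiled pattern. This is a minimal sketch; find_all is a hypothetical helper name and the sample text is made up for illustration.

```python
import re
import string

phrase = "long string"
between = "[" + r"\s" + re.escape(string.punctuation) + "]+"
reg = re.compile(r"\b" + between.join(phrase.split()) + r"\b", flags=re.I)


def find_all(text):
    """Hypothetical helper: return (start index, matched text) pairs."""
    return [(m.start(), m.group()) for m in reg.finditer(text)]


# Made-up sample: two valid instances, followed by one non-match.
text = "Long string, long---STRING and longstring"
print(find_all(text))  # [(0, 'Long string'), (13, 'long---STRING')]
```

The number of instances is then simply len(find_all(text)).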
https://stackoverflow.com/questions/68144464