I'm trying to emulate the SAS function INDEXW() in Python for a similar task.
# Sample
Col1                 Col2
FIG AVE              FIG AVE
LAKE HATCHINEHA RD   HATCHINEHA RD
MERLE CIR            MERLE CIR
ARCH ST              W ARCH ST
WESTVIEW DR          CLAYMORE CT

import re

def INDEXW(source, excerpt):
    delimiters = " "
    regexPattern = '|'.join(map(re.escape, delimiters))
    # Compares the excerpt against each single word of the source
    return any([str(excerpt).strip() == word for word in [x.strip() for x in re.split(regexPattern, str(source))]])

Sample["RESULT1"] = Sample[["Col1","Col2"]].apply(lambda x: INDEXW(*x), axis=1)
Sample["RESULT2"] = Sample[["Col2","Col1"]].apply(lambda x: INDEXW(*x), axis=1)

The code above returns False for every row, which is not correct.
# Answer
Col1                 Col2            RESULT1   RESULT2
FIG AVE              FIG AVE         True      True
LAKE HATCHINEHA RD   HATCHINEHA RD   False     True
MERLE CIR            MERLE CIR       True      True
ARCH ST              W ARCH ST       True      False
WESTVIEW DR          CLAYMORE CT     False     False

I know we can use find() to get the correct result (find() works fine).
I'm just curious how I should modify INDEXW() to get this result. Thanks.
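As a minimal sketch of why the INDEXW() in the question returns False for these rows: splitting the source on a space yields single words only, so an excerpt made of several words can never equal any one of them. The variable names below are illustrative and use one row from the sample.

```python
import re

source = "LAKE HATCHINEHA RD"
excerpt = "HATCHINEHA RD"

# Same split as in the question: one pattern per delimiter character
words = re.split(re.escape(" "), source)

print(words)             # ['LAKE', 'HATCHINEHA', 'RD']
print(excerpt in words)  # False: no single word equals the two-word excerpt
```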
Posted on 2021-09-30 01:02:00
I solved the problem:
import re
from itertools import combinations

def INDEXW(source, excerpt):
    delimiters = " "
    regexPattern = '|'.join(map(re.escape, delimiters))
    features = [x.strip() for x in re.split(regexPattern, str(source))]
    # Collect every combination of 1..n words from the source.
    # Note: combinations() keeps word order but also yields
    # non-adjacent groups such as "LAKE RD".
    tmp = []
    for i in range(len(features)):
        oc = combinations(features, i + 1)
        for c in oc:
            tmp.append(list(c))
    # True if the excerpt equals any combination joined by spaces
    return any([str(excerpt).strip() == word for word in [' '.join(x) for x in tmp]])

# TEST
INDEXW("LAKE HATCHINEHA RD", "HATCHINEHA RD")
> True
INDEXW("ARCH ST", "W ARCH ST")
> False
INDEXW("W ARCH ST", "ARCH ST")
> True

https://stackoverflow.com/questions/69383253
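The combinations-based version above also accepts words that are not next to each other in the source (for example "LAKE RD" against "LAKE HATCHINEHA RD"), and the number of combinations grows quickly with the word count. If only adjacent whole words should match, one possible variant is sketched below. The name indexw_words and the delimiters parameter are hypothetical, not part of the original post, and this is not claimed to reproduce every detail of SAS INDEXW (which returns a position rather than a boolean).

```python
import re

def indexw_words(source, excerpt, delimiters=" "):
    """Return True if the words of excerpt appear as adjacent whole words in source.

    Hypothetical helper: delimiters is a string of single delimiter
    characters and is assumed to be non-empty.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    src = [w for w in re.split(pattern, str(source)) if w]
    exc = [w for w in re.split(pattern, str(excerpt)) if w]
    if not exc:
        return False
    n = len(exc)
    # Slide a window of n words over the source and compare word lists
    return any(src[i:i + n] == exc for i in range(len(src) - n + 1))

print(indexw_words("LAKE HATCHINEHA RD", "HATCHINEHA RD"))  # True
print(indexw_words("ARCH ST", "W ARCH ST"))                 # False
print(indexw_words("W ARCH ST", "ARCH ST"))                 # True
print(indexw_words("LAKE HATCHINEHA RD", "LAKE RD"))        # False
```

The three calls from the TEST section give the same results as the posted solution; only the last call, with non-adjacent words, differs.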