
Q: Caching fnmatch.filter calls

Stack Overflow user

Asked 2017-03-09 21:55:53

1 answer · 328 views · 0 followers · 1 vote

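I have searched, but have not found a topic or answer that specifically matches what I am looking for.

I want to implement a (bad/profanity) word filter that wildcard-matches the words of a string against a word list and, if a match is found, returns it.

It is not as simple as "is the word in the string", because some words may be naughty on their own but acceptable at the start, middle and/or end of a longer word. "Scunthorpe", for example!

My concern is that (from what I can tell) there is a lot of repetition/iteration over a relatively long string (up to 2048 characters), and the pattern list is processed on every call. Is there any way of caching it?

In a chat application this function could be called very often, with a bad-word list of 300+ words, so efficiency is key.

This is what I have so far, with examples of the different matches. It works perfectly, but as a Python novice I have no idea whether it is efficient, so I was hoping an expert could provide some insight.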

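import fnmatch

def badWordMatch(string):
    # Shell-style patterns: "*" matches any run of characters, "?" exactly one.
    bad_words = ["poo", "wee", "barsteward*", "?orrible"]
    data = string.split()
    for each in bad_words:
        # fnmatch.filter returns the words in data that match the pattern.
        l = fnmatch.filter(data, each)
        if l:
            # Report the pattern with its wildcards stripped.
            return each.replace("?","").replace("*","")
    return None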

string_input = "Please do not wee in the swimming pool you 'orrible naughty barstewards!" # Matched: "wee"
#string_input = "Please do not dive in the swimming pool you 'orrible naughty barstewards!" # Matched: "barsteward"
#string_input = "Please do not dive in the swimming pool you 'orrible naughty kids!" # Matched: "orrible"
#string_input = "Please do not dive in the swimming pool you horrible naughty kids!" # Matched: "orrible"
#string_input = "Please do not dive in the swimming pool you naughty kids!" # No match!

isbadword = badWordMatch(string_input)

if isbadword is not None:
    print("Matched: %s" % (isbadword))
else:
    print("No match, string is clean!")

Update: regex version:

import re

bad_words = ["poo$", "wee$", "barsteward.*", ".orrible"]

string_input = "Please do not poo & wee in the swimming pool you horrible naughty barstewards! Shouldn't match: week, xbarsteward xhorrible"

strings = string_input.split()

def test3():
    r = re.compile('|'.join('(?:%s)' % p for p in bad_words))
    for s in strings:
        t = r.match(s)
        if t:
            print("Matched! " + t.group())

test3()

Result:

Matched! poo
Matched! wee
Matched! horrible
Matched! barstewards!

python

regex

string

1 Answer

Stack Overflow user

Answered 2017-03-09 23:18:06

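In Python 3.2+, the pattern compilation behind fnmatch.filter is wrapped in an LRU cache decorator (at the time of writing, this means the 256 most recently used patterns are kept). Beyond that, fnmatch does not do much caching. Internally, however, each of your patterns is translated to a regex, and it is this compiled regex that gets cached automatically.

Building a regex from your bad-word list is still the better option because, as this answer shows, one (explicitly compiled) regex is much faster than the hundreds of (implicitly compiled) regexes in your example.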
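To make the pattern-to-regex step concrete, here is a minimal sketch using `fnmatch.translate`, the public standard-library function that converts a shell-style pattern into regex source text. The helper name `matches` is made up for illustration, and the exact string that `translate` returns differs between Python versions, so it is not shown here.

```python
import fnmatch
import re

# fnmatch.translate turns a shell-style pattern into regex source text.
# The returned string is version-dependent, so only its behaviour is used.
pattern = "?orrible"
regex_source = fnmatch.translate(pattern)

# Compiling explicitly, once, means no cache lookup is needed per call.
compiled = re.compile(regex_source)

def matches(word):
    """Hypothetical helper: True if word matches the wildcard pattern."""
    return compiled.match(word) is not None
```

With this in place, `matches("horrible")` and `matches("'orrible")` are true, while `matches("orrible")` is false because `?` requires exactly one character.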
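One way the single-regex suggestion could look is sketched below. It is not the answerer's code: the names `BAD_WORDS`, `_wildcard_to_regex` and `bad_word_match` are invented for illustration, and the translation step assumes that only the `*` and `?` wildcards are needed (no `[seq]` character sets). Unlike fnmatch, matching here is always case-sensitive.

```python
import re

# Word list copied from the question; a real list would be loaded elsewhere.
BAD_WORDS = ["poo", "wee", "barsteward*", "?orrible"]

def _wildcard_to_regex(pattern):
    """Translate only the '*' and '?' wildcards; everything else is literal."""
    escaped = re.escape(pattern)
    return escaped.replace(r"\*", ".*").replace(r"\?", ".")

# Compiled once at import time, with one capturing group per bad word.
_BAD_WORD_RE = re.compile(
    "|".join("(%s)" % _wildcard_to_regex(p) for p in BAD_WORDS),
    re.DOTALL,
)

def bad_word_match(text):
    """Return the bad word (wildcards stripped) for the first offending
    word in text, or None if the text is clean."""
    for word in text.split():
        m = _BAD_WORD_RE.fullmatch(word)
        if m:
            # lastindex is the 1-based number of the group that matched.
            return BAD_WORDS[m.lastindex - 1].replace("?", "").replace("*", "")
    return None
```

Note one behavioural difference from the function in the question: this sketch reports the first offending word in text order, whereas the original reports the first pattern in list order that matches anything. If list order matters, the loop would need adjusting.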

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/42706205
