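I have a working routine that determines which category a news item belongs to. The routine works when the title, category, subcategory and search words (as regular expressions) are assigned directly in Python.
However, when these values are retrieved from PostgreSQL as strings, I get no errors, but I also get no results from the same routine.
I checked the data types: they are all Python strings.
How can I fix this?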
```python
import re

# set the text to be analyzed
title = "next week there will be a presentation. The location will be aat"
# these could be the categories
category = "presentation"
subcategory = "scientific"
# these are the regular expressions
main_category_search_words = r'\bpresentation\b'
# re.X ignores unescaped whitespace (so the spaces around | are fine),
# but the space inside "asm microbe" must be escaped to match the phrase
sub_category_search_words = r'\basm\ microbe\b | \basco\b | \baat\b'
category_final = ''
subcategory_final = ''
# identify main category
r = re.compile(main_category_search_words, flags=re.I | re.X)
result = r.findall(title)
if len(result) == 1:
    category_final = category
# identify sub category
r2 = re.compile(sub_category_search_words, flags=re.I | re.X)
result2 = r2.findall(title)
if len(result2) > 0:
    subcategory_final = subcategory
print("analysis result:", category_final, subcategory_final)
```

Posted on 2018-06-11 09:15:58
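I'm fairly sure that what you get from PostgreSQL is not the equivalent of a raw string literal, so your regex is not the one you think it is. You have to escape the backslashes in the pattern explicitly in the DB.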
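A minimal sketch of the round trip, under stated assumptions: the table `categories` and its columns are hypothetical, and the stdlib `sqlite3` module stands in for PostgreSQL so the example runs anywhere (with psycopg2 the placeholder is `%s` instead of `?`). Passing the pattern as a query parameter stores exactly the Python string you hand over, so a raw literal keeps its backslashes, while a normal literal has already turned `\b` into a backspace before the database ever sees it. If you write the pattern directly into SQL instead, a standard PostgreSQL string `'\bpresentation\b'` keeps the backslashes when `standard_conforming_strings` is on (the default since 9.1), whereas an escape string `E'...'` needs `E'\\bpresentation\\b'`.

```python
import re
import sqlite3

# sqlite3 is only a stand-in for PostgreSQL here; psycopg2 would use %s placeholders
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (name TEXT, pattern TEXT)")  # hypothetical schema

# Raw literal: the stored text contains backslash + b (a regex word boundary)
conn.execute("INSERT INTO categories VALUES (?, ?)",
             ("presentation", r"\bpresentation\b"))
# Normal literal: Python already turned each \b into a backspace character
conn.execute("INSERT INTO categories VALUES (?, ?)",
             ("broken", "\bpresentation\b"))

title = "next week there will be a presentation. The location will be aat"

results = {}
for name, pattern in conn.execute("SELECT name, pattern FROM categories"):
    r = re.compile(pattern, flags=re.I | re.X)
    results[name] = r.findall(title)

conn.close()
print(results)
```

The `presentation` row finds `['presentation']`, while the `broken` row returns an empty list without raising any error, which matches the symptom in the question.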
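```python
print(r"\basm\b")   # raw string: the backslashes are kept
print("\basm\b")    # normal string: each \b is a backspace character
print("\\basm\\b")  # normal string with escaped backslashes
```

Output:

```
\basm\b
asm
\basm\b
```

The middle line is really `'\x08asm\x08'`: the backspace characters are invisible or rendered oddly depending on the terminal, which is why such a pattern silently matches nothing.

https://stackoverflow.com/questions/50793945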
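Because the printed output hides the backspace characters, `repr()` is a more reliable way to inspect a pattern you fetched from the database. The sketch below uses a hypothetical helper `check_pattern` and a made-up sample text; it only distinguishes the two forms `\b` can take in a Python string and is not a general pattern validator.

```python
import re

def check_pattern(pattern: str) -> str:
    """Hypothetical helper: classify how \\b ended up in a pattern string,
    e.g. one just fetched from a database row."""
    if "\x08" in pattern:
        # a normal literal turned "\b" into ASCII backspace (0x08)
        return "backspace"
    if "\\b" in pattern:
        # backslash followed by b: re treats this as a word boundary
        return "word-boundary"
    return "none"

text = "ASM Microbe 2018"  # hypothetical sample text

for pattern in (r"\basm\b", "\basm\b", "\\basm\\b"):
    matched = re.search(pattern, text, flags=re.I) is not None
    print(repr(pattern), check_pattern(pattern), matched)
```

This prints `'\\basm\\b' word-boundary True`, then `'\x08asm\x08' backspace False`, then `'\\basm\\b' word-boundary True`; running `print(repr(pattern))` on your DB values should show which case you are in.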