So, I have the following list of phrases:
list = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
What I want to do is remove a phrase from the list if its third word has more than three characters. So the final list would be:
new_list = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks; 3','cyber attack. Our']
This is what I have so far, but it also keeps the phrases whose last word has more than three characters:
import re

new_list = []
for phrase in list:
    max_three_char = re.match(r'cyber\s\w{1,}(\.|,|;|\)|\/|:|"|])\s\w{,3}', phrase)
    if max_three_char:
        new_list.append(phrase)
Posted on 2022-05-03 15:04:44
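The pattern in the question keeps every phrase because re.match only anchors at the start of the string, and \w{,3} (zero to three word characters) is satisfied by the first letters of a longer word such as "intrusions". A minimal sketch of the same regex idea, assuming every phrase has the shape "cyber", a word, one punctuation mark, a space and the third word, uses re.fullmatch so the short word must also end the phrase:

```python
import re

phrases = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions',
           'cyber attacks; 3', 'cyber attack. Our', 'cyber intrusions, data']

# Same punctuation set as the question's alternation, written as a character class.
# fullmatch requires the whole phrase to match, so \w{1,3} cannot stop
# partway through a longer word; {1,3} also demands at least one character.
pattern = re.compile(r'cyber\s\w+[.,;)/:"\]]\s\w{1,3}')

new_list = [phrase for phrase in phrases if pattern.fullmatch(phrase)]
print(new_list)
```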
I would do it like this:
import re
li = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
>>> [s for s in li if re.search(r'(?<=\W)\w{1,3}$', s)]
['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks; 3', 'cyber attack. Our']
Or, if you can count on a space as the delimiter:
>>> [s for s in li if len(s.split()[-1])<=3]
# same
Posted on 2022-05-03 14:59:48
You could use a list comprehension, such as
import re
lst = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
pattern = re.compile(r'[, ]+')
# The one-element inner loop binds the split result to `splitted` for each item
new_lst = [item
           for item in lst
           for splitted in [pattern.split(item)]
           if not (len(splitted) > 2 and len(splitted[2]) > 3)]
print(new_lst)
which yields
['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks; 3', 'cyber attack. Our']
Don't name variables after built-ins such as list.
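The `for splitted in [pattern.split(item)]` clause exists only to give the split result a name inside the comprehension. On Python 3.8 or later, an assignment expression (`:=`) does the same job; a sketch with the same list and pattern:

```python
import re

lst = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions',
       'cyber attacks; 3', 'cyber attack. Our', 'cyber intrusions, data']
pattern = re.compile(r'[, ]+')

# The assignment expression binds each item's split result to `parts`,
# replacing the single-element inner loop. Requires Python 3.8+.
new_lst = [item for item in lst
           if not (len(parts := pattern.split(item)) > 2 and len(parts[2]) > 3)]
print(new_lst)
```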
Posted on 2022-05-03 15:00:00
No need for regex; you can use str.split():
if len(my_phrase.split()[2]) <= 3:
    # process my_phrase
    ...
This works because the words are separated by spaces.
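One caveat with indexing split()[2] directly: a phrase with fewer than three words raises IndexError. A minimal sketch with a guard, assuming such short phrases should be kept (the question does not say); the helper name third_word_is_short and the extra two-word entry are hypothetical, added only for illustration:

```python
phrases = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions',
           'cyber attacks; 3', 'cyber attack. Our', 'cyber intrusions, data',
           'cyber attacks']  # hypothetical two-word entry to exercise the guard

def third_word_is_short(phrase, max_len=3):
    """True if the phrase has no third word, or its third word has at most max_len characters."""
    words = phrase.split()  # split on runs of whitespace
    # Assumption: phrases with fewer than three words are kept.
    return len(words) < 3 or len(words[2]) <= max_len

new_list = [p for p in phrases if third_word_is_short(p)]
print(new_list)
```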
https://stackoverflow.com/questions/72101240