I must be missing something obvious.
I have a list of tuples that are (phrase, number) pairs. I want to remove every tuple whose phrase contains any word from my stopwords list.
stopwords = ['for', 'with', 'and', 'in', 'on', 'down']
tup_list = [('faucet', 5185), ('kitchen', 2719), ('faucets', 2628),
('kitchen faucet', 1511), ('shower', 1471), ('bathroom', 1131),
('handle', 1048), ('for', 1035), ('cheap', 960), ('bronze', 807),
('tub', 797), ('sale', 771), ('sink', 762), ('with', 696),
('single', 620), ('kitchen faucets', 615), ('stainless faucet', 613),
('pull', 603), ('and', 477), ('in', 447), ('single handle', 430),
('for sale', 406), ('bathroom faucet', 392), ('on', 369),
('down', 363), ('head', 359), ('pull down', 357), ('wall', 351),
('faucet with', 350)]
for p, n in tup_list:
    print('p', p, p.split(), any(phrase in stopwords for phrase in p.split()))
print(len(tup_list))
for p, n in tup_list:
    if any(phrase in stopwords for phrase in p.split()):
        tup_list.remove((p, n))
        print('Removing', p)
print(len(tup_list))
print([item for item in tup_list if item[0] == 'in'])
When I run the above, I get the following output:
p faucet ['faucet'] False
p kitchen ['kitchen'] False
p faucets ['faucets'] False
p kitchen faucet ['kitchen', 'faucet'] False
p shower ['shower'] False
p bathroom ['bathroom'] False
p handle ['handle'] False
p for ['for'] True
p cheap ['cheap'] False
p bronze ['bronze'] False
p tub ['tub'] False
p sale ['sale'] False
p sink ['sink'] False
p with ['with'] True
p single ['single'] False
p kitchen faucets ['kitchen', 'faucets'] False
p stainless faucet ['stainless', 'faucet'] False
p pull ['pull'] False
p and ['and'] True
p in ['in'] True
p single handle ['single', 'handle'] False
p for sale ['for', 'sale'] True
p bathroom faucet ['bathroom', 'faucet'] False
p on ['on'] True
p down ['down'] True
p head ['head'] False
p pull down ['pull', 'down'] True
p wall ['wall'] False
p faucet with ['faucet', 'with'] True
29
Removing for
Removing with
Removing and
Removing for sale
Removing on
Removing pull down
Removing faucet with
22
[('in', 447)]
My question: why isn't the tuple ('in', 447) removed? The output line p in ['in'] True shows that 'in' is in the stopwords list, so why doesn't tup_list.remove((p,n)) remove it?
Posted on 2018-02-21 00:29:44
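When you remove an item from a list, every item after it shifts one index to the left. The for loop keeps advancing its own internal index regardless, so the item immediately after each removed one is skipped. After ('and', 477) is removed, ('in', 447) moves into its slot and the loop moves on to ('single handle', 430); the same thing happens to ('down', 363) after ('on', 369) is removed. Iterating over a list while you modify it gives unexpected results like this.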
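To see the skipping directly, here is a small sketch that records which elements the loop actually visits while removing from the list it iterates over. The shortened list `items` and the name `visited` are made up for illustration.

```python
stopwords = {'and', 'in', 'on', 'down'}
items = [('pull', 603), ('and', 477), ('in', 447), ('single handle', 430),
         ('on', 369), ('down', 363), ('head', 359)]

visited = []
for p, n in items:              # iterating over the same list we mutate
    visited.append(p)
    if any(word in stopwords for word in p.split()):
        items.remove((p, n))    # shifts every later item one slot left

print(visited)  # 'in' and 'down' never appear: they were skipped
print(items)    # so both are still in the list
```

Running it prints `['pull', 'and', 'single handle', 'on', 'head']` for the visited phrases, and `items` still contains `('in', 447)` and `('down', 363)`.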
Here is one workaround. It isn't the most efficient, but it may suit your needs.
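# First pass: record the indices of tuples whose phrase contains a stopword.
remove_indices = []
for i, (p, n) in enumerate(tup_list):
    if any(phrase in stopwords for phrase in p.split()):
        remove_indices.append(i)
        print('Removing', p)
# Second pass: rebuild the list without the recorded indices.
tup_list = [item for j, item in enumerate(tup_list) if j not in remove_indices]
https://stackoverflow.com/questions/48895855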
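Two other common fixes, sketched here as alternatives to the index-based approach above, are to build a new list with a comprehension or to keep `remove()` but loop over a copy. In this sketch, `has_stopword` is a hypothetical helper name and the list is a shortened copy of the question's data. Making `stopwords` a set also makes each membership test constant time on average.

```python
stopwords = {'for', 'with', 'and', 'in', 'on', 'down'}
tup_list = [('faucet', 5185), ('for', 1035), ('and', 477), ('in', 447),
            ('for sale', 406), ('on', 369), ('down', 363), ('pull down', 357),
            ('wall', 351)]

def has_stopword(phrase):
    """Return True if any whitespace-separated word of phrase is a stopword."""
    return any(word in stopwords for word in phrase.split())

# Option 1: build a new list containing only the tuples to keep.
kept = [(p, n) for p, n in tup_list if not has_stopword(p)]

# Option 2: keep the remove() loop, but iterate over a shallow copy
# (tup_list[:]) so removals no longer shift the sequence being iterated.
for p, n in tup_list[:]:
    if has_stopword(p):
        tup_list.remove((p, n))

print(kept)      # only ('faucet', 5185) and ('wall', 351) remain
print(tup_list)  # same result, modified in place
```

The comprehension is usually the better choice: it makes a single pass, whereas each `list.remove()` call scans the list again. If other code holds a reference to the original list, assign with `tup_list[:] = kept` to update it in place.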