My code:
lis2 = []
lis1 = []
for cm in comments:
    sp = cm.split()
    for s in sp:
        for tf in tfidf:
            if tf == s:
                lis2.append(tf)
            else:
                continue
    lis1.append(lis2)
print(lis1)
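data = pd.DataFrame(lis1)
This code works with two lists:
comments: a list of sentences
tfidf: a list of words
I want to iterate over each sentence in comments, find any words that also appear in the tfidf list, and append them to a new list, lis2.
When a sentence is finished, lis2 should be appended to lis1 before moving on to the next sentence.
But my code only returns the words like this: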
[['custom', 'servic', 'portfolio', 'time', 'custom', 'servic', 'custom', 'servic', 'support', 'ticket', 'custom', 'servic', 'experi', 'platform', 'user', 'experi', 'account', 'portfolio', 'experi', 'user', 'experi', 'user', 'platform', 'account', 'time', 'fast', 'platform', 'custom', 'custom', 'account', 'time', 'fast', 'time', 'time', 'account', 'custom', 'servic', 'servic', 'account', 'user', 'custom', 'custom', 'account', 'time', 'account', 'user', 'time', 'account']
Answered on 2022-08-14 10:57:12
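A likely cause of the flat result, assuming the question's code is indented the way it reads: lis2 is created once, before the loop, so every sentence appends to the same list object and lis1 only ever collects references to that one list. The sketch below reproduces the effect with hypothetical sample data; the names `shared` and `buggy` are illustrative and not from the question.

```python
# Hypothetical sample data, not the asker's real comments.
comments = ['a1 a2 a3', 'b1 b2 b3']
tfidf = ['a2', 'b1', 'b3']

shared = []   # created once, outside the loop (the suspected bug)
buggy = []
for sentence in comments:
    for word in sentence.split():
        if word in tfidf:
            shared.append(word)
    # This appends a reference to the same list object on every pass.
    buggy.append(shared)

print(buggy)                 # every entry holds the matches of all sentences
print(buggy[0] is buggy[1])  # both entries are one and the same list
```

Creating the inner list inside the outer loop gives each sentence its own list, which avoids this aliasing.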
comments = ['a1 a2 a3', 'b1 b2 b3', 'c1 c2 c3']
tfidf = ['a2', 'b1', 'b3']
lis_1 = []
for sentence in comments:
    lis_2 = []  # start a fresh list for every sentence
    words = sentence.split()
    for word in words:
        if word in tfidf:
            lis_2.append(word)
    # After all words of a sentence are processed:
    lis_1.append(lis_2)
print(lis_1)
Output:
[['a2'], ['b1', 'b3'], []]
Alternatively, as a one-liner (going through a set drops duplicate matches and does not guarantee word order):
lis_1 = [list(set(sentence.split()).intersection(tfidf)) for sentence in comments]
Answered on 2022-08-14 12:38:08
With pandas.Series.str.findall you can avoid the explicit loops.
import pandas as pd

comments = ["customer service emergency for service department",
            "client portfolio information about client spends lots of time on searching the website",
            "supporting customer to buy campaign ticket",
            "free bitcoin faucet",
            "experienced trading manager recruiter john cena",
            "user platform account deletion",
            "Fast response platform with rock",
            "Login time consuming",
            "apple juice discount"]
comments
###
['customer service emergency for service department',
'client portfolio information about client spends lots of time on searching the website',
'supporting customer to buy campaign ticket',
'free bitcoin faucet',
'experienced trading manager recruiter john cena',
'user platform account deletion',
'Fast response platform with rock',
'Login time consuming',
'apple juice discount']
tfidf = ["customer", "account", "service", "user", "time"]
re_pat = "|".join(tfidf)
df = pd.DataFrame({'comments': comments})
output = df['comments'].str.findall(re_pat).tolist()
output
###
[['customer', 'service', 'service'],
['time'],
['customer'],
[],
[],
['user', 'account'],
[],
['time'],
[]]
https://stackoverflow.com/questions/73350972
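One caveat about the findall approach in the second answer: a pattern built with a plain "|".join matches substrings, so "customer" would also be found inside "customers". If whole-word matches are wanted, a word-boundary pattern is one option. The sketch below uses only the standard re module with hypothetical sample sentences; the names `plain`, `bounded` and `sample` are illustrative. Since Series.str.findall takes a regular expression pattern, the same pattern string should work there too.

```python
import re

# Hypothetical sample sentences chosen to show substring matches.
sample = ["customers love the timely service", "user account deletion"]
tfidf = ["customer", "account", "service", "user", "time"]

# Plain alternation: matches anywhere, including inside longer words.
plain = "|".join(tfidf)

# Word-bounded alternation: re.escape guards against regex metacharacters,
# and the non-capturing group keeps findall returning whole matches.
bounded = r"\b(?:" + "|".join(re.escape(t) for t in tfidf) + r")\b"

plain_matches = [re.findall(plain, s) for s in sample]
bounded_matches = [re.findall(bounded, s) for s in sample]

print(plain_matches)    # includes 'customer' and 'time' from longer words
print(bounded_matches)  # only whole-word hits remain
```

Which behaviour is right depends on the data: with stemmed tokens such as 'custom' or 'servic', substring matching may be exactly what is intended.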