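I have two lists, and I want to determine which elements they have in common (identical or similar in meaning or context). Which NLP algorithm should we use?
list-1= [US, Apple, Trump, Biden, Mango, French, German]
list-2= [State, iphone, ipad, ipod, president, person, Fruit, Language, Country]
Posted on 2022-02-09 12:58:51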
The simplest implementation uses the following steps:
Step 1: Iterate through both lists.
Step 2: Calculate the cosine similarity between each word in list_1 and each word in list_2.
Step 3: Decide on a threshold for the cosine similarity. A higher threshold means stricter matching.
The code is as follows:
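# Download the package and model:
from gensim.models import Word2Vec

list_1 = ["US", "Apple", "Trump", "Biden", "Mango", "French", "German"]
list_2 = ["State", "iphone", "ipad", "ipod", "president", "person", "Fruit", "Language", "Country"]

# Load the trained model once, outside the loops (replace the placeholder path).
model = Word2Vec.load("path/to/your/model")

similarity_dict = {}
for word_list1 in list_1:
    for word_list2 in list_2:
        # Raises KeyError if either word is missing from the model's vocabulary.
        cosine_similarity = model.wv.similarity(word_list1, word_list2)
        similarity_dict[(word_list1, word_list2)] = cosine_similarity
Pros:
Cons: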
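The snippet above computes the similarities but does not apply the threshold from Step 3. The following is only a minimal sketch of that step, not the answer's own code. To keep it runnable without a trained model, it uses a hand-written cosine similarity and a small dictionary of made-up 2-dimensional vectors. The names `related_pairs` and `toy_vectors` and the threshold value 0.5 are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_pairs(words_1, words_2, vectors, threshold):
    """Return (word_1, word_2, score) for every pair scoring >= threshold."""
    pairs = []
    for w1 in words_1:
        for w2 in words_2:
            # Skip words that have no vector instead of raising KeyError.
            if w1 not in vectors or w2 not in vectors:
                continue
            score = cosine_similarity(vectors[w1], vectors[w2])
            if score >= threshold:
                pairs.append((w1, w2, score))
    return pairs


# Hypothetical vectors, invented for this example; real ones would
# come from a trained embedding model.
toy_vectors = {
    "Apple": [1.0, 0.0],
    "iphone": [0.9, 0.1],
    "Fruit": [0.6, 0.8],
    "Country": [0.0, 1.0],
}

matches = related_pairs(
    ["Apple"], ["iphone", "Fruit", "Country"], toy_vectors, threshold=0.5
)
for w1, w2, score in matches:
    print(w1, w2, round(score, 3))
```

With a real model, the helper could presumably be swapped for `model.wv.similarity`, and a suitable threshold would need to be tuned on the actual data, since raising it keeps fewer, more closely related pairs.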
https://datascience.stackexchange.com/questions/108016