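I am trying to use a pretrained sentence-transformers model to find the similarity between sentences. I am following the code here: https://www.sbert.net/docs/usage/paraphrase
In trial 1, I run two nested for loops to compute the similarity between each sentence and every other sentence. Here is the code: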
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
# Single list of sentences
sentences = ['The cat sits outside',
'A man is playing guitar',
'The new movie is awesome',
'Do you like pizza?']
#Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)
#Compute cosine-similarities for each sentence with each other sentence
cosine_scores = util.pytorch_cos_sim(embeddings, embeddings)
#Find the pairs with the highest cosine similarity scores
pairs = []
for i in range(len(cosine_scores)-1):
    for j in range(i+1, len(cosine_scores)):
        pairs.append({'index': [i, j], 'score': cosine_scores[i][j]})
#Sort scores in decreasing order
pairs = sorted(pairs, key=lambda x: x['score'], reverse=True)
print(len(pairs))
6
for pair in pairs[0:10]:
    i, j = pair['index']
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], pair['score']))
A man is playing guitar Do you like pizza? Score: 0.1080
The new movie is awesome Do you like pizza? Score: 0.0829
A man is playing guitar The new movie is awesome Score: 0.0652
The cat sits outside Do you like pizza? Score: 0.0523
The cat sits outside The new movie is awesome Score: -0.0270
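The cat sits outside A man is playing guitar Score: -0.0530
This is expected, because 4 sentences give 6 possible pairs and therefore 6 similarity scores. The documentation page notes that this approach does not scale well because of its quadratic complexity, and recommends the paraphrase_mining() method instead.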
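As a quick sanity check on the count of 6, here is a minimal sketch that enumerates the same unordered index pairs the nested loops visit and compares the result with the closed form n(n-1)/2. It uses only the standard library; the name index_pairs is introduced here purely for illustration.

```python
from itertools import combinations

sentences = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome',
             'Do you like pizza?']

# Every unordered pair (i, j) with i < j, matching the nested for loops above.
index_pairs = list(combinations(range(len(sentences)), 2))

n = len(sentences)
print(len(index_pairs))   # number of pairs found by enumeration
print(n * (n - 1) // 2)   # closed-form count for n items
```

Both prints give 6 for four sentences, which is the number of scores I expect from any method that covers all pairs.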
But when I try that method, I get only 5 pairs instead of 6. Why is that?
Here is the sample code where I use paraphrase_mining():
# Single list of sentences
sentences = ['The cat sits outside',
'A man is playing guitar',
'The new movie is awesome',
'Do you like pizza?']
paraphrases = util.paraphrase_mining(model, sentences)
print(len(paraphrases))
5
k = 0
for paraphrase in paraphrases:
    print(k)
    score, i, j = paraphrase
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], score))
    print()
    k = k + 1
0
A man is playing guitar Do you like pizza? Score: 0.1080
1
The new movie is awesome Do you like pizza? Score: 0.0829
2
A man is playing guitar The new movie is awesome Score: 0.0652
3
The cat sits outside Do you like pizza? Score: 0.0523
4
The cat sits outside The new movie is awesome Score: -0.0270
Does paraphrase_mining() work differently?
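Answered 2020-11-18 16:08:43
Thanks for pointing this out.
There was a small bug in the paraphrase_mining function when the list of sentences is very small. Instead of computing all combinations, it computed only n-1 combinations for each sentence. For larger lists of sentences this is not a problem, but in your specific example it dropped the least similar pair and returned fewer pairs than expected.
It has been fixed in the repository and will be part of the next release.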
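Until you upgrade, one possible workaround for very small lists is to score every pair directly, as your first trial does. The following is only a minimal sketch of that idea, not the library's implementation: all_pairs_by_similarity and cosine are hypothetical helper names, and toy_vectors are made-up stand-ins for the embeddings you would get from model.encode(sentences).

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def all_pairs_by_similarity(vectors):
    """Return (score, i, j) for every pair i < j, highest score first."""
    pairs = [(cosine(vectors[i], vectors[j]), i, j)
             for i, j in combinations(range(len(vectors)), 2)]
    return sorted(pairs, key=lambda p: p[0], reverse=True)

# Made-up 2-D vectors standing in for real sentence embeddings (assumption).
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]

pairs = all_pairs_by_similarity(toy_vectors)
print(len(pairs))  # 4 vectors give 4 * 3 / 2 = 6 pairs
```

This is quadratic in the number of sentences, so it only makes sense for short lists; for large collections paraphrase_mining remains the better choice.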
PS: You can also post your questions on GitHub: https://github.com/UKPLab/sentence-transformers/issues
I get email notifications there and can respond faster.
https://stackoverflow.com/questions/64779234