
How to use paraphrase_mining with a pre-trained sentence-transformers model

Stack Overflow user
Asked on 2020-11-11 09:34:02

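I am trying to use a pre-trained sentence-transformers model to find the similarity between sentences. I am following the code here: https://www.sbert.net/docs/usage/paraphrase

In my first attempt, I used two nested for loops to compute the similarity between each sentence and every other sentence. Here is the code: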

from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')


# Single list of sentences
sentences = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome',
             'Do you like pizza?']

#Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

#Compute cosine-similarities for each sentence with each other sentence
cosine_scores = util.pytorch_cos_sim(embeddings, embeddings)

#Find the pairs with the highest cosine similarity scores
pairs = []
for i in range(len(cosine_scores)-1):
    for j in range(i+1, len(cosine_scores)):
        pairs.append({'index': [i, j], 'score': cosine_scores[i][j]})

#Sort scores in decreasing order
pairs = sorted(pairs, key=lambda x: x['score'], reverse=True)

print(len(pairs))
# Output: 6

for pair in pairs[0:10]:
    i, j = pair['index']
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], pair['score']))

A man is playing guitar          Do you like pizza?          Score: 0.1080
The new movie is awesome         Do you like pizza?          Score: 0.0829
A man is playing guitar          The new movie is awesome        Score: 0.0652
The cat sits outside         Do you like pizza?          Score: 0.0523
The cat sits outside         The new movie is awesome        Score: -0.0270
The cat sits outside         A man is playing guitar         Score: -0.0530

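This is expected, because 4 sentences form 6 distinct pairs and therefore 6 similarity scores. The documentation notes that this approach does not scale well because of its quadratic complexity, and recommends the paraphrase_mining() method instead.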
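The pair count above can be checked without loading any model. The following is a minimal sketch using only the standard library; the helper name `pair_count` is introduced here for illustration and is not part of sentence-transformers.

```python
from itertools import combinations

sentences = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome',
             'Do you like pizza?']

# Every unordered pair (i, j) with i < j, the same pairs the nested loops visit.
pairs = list(combinations(range(len(sentences)), 2))
print(len(pairs))  # 6


def pair_count(n):
    """Number of unordered pairs among n sentences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# The count grows quadratically with the number of sentences.
for n in (4, 1000, 100000):
    print(n, pair_count(n))
```

For 4 sentences this gives 6 pairs, but for 100000 sentences it is already close to 5 billion, which is why comparing every pair explicitly stops being practical.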

But when I use that method, I get only 5 pairs instead of 6. Why is that?

Here is the sample code I used with paraphrase_mining():

# Single list of sentences
sentences = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome',
             'Do you like pizza?']


paraphrases = util.paraphrase_mining(model, sentences)
print(len(paraphrases))
# Output: 5

# enumerate() supplies the running index k
for k, paraphrase in enumerate(paraphrases):
    print(k)
    score, i, j = paraphrase
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], score))
    print()

0
A man is playing guitar          Do you like pizza?          Score: 0.1080

1
The new movie is awesome         Do you like pizza?          Score: 0.0829

2
A man is playing guitar          The new movie is awesome        Score: 0.0652

3
The cat sits outside         Do you like pizza?          Score: 0.0523

4
The cat sits outside         The new movie is awesome        Score: -0.0270

Does paraphrase_mining() work differently?


1 Answer

Stack Overflow user

Answered on 2020-11-18 16:08:43

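Thanks for pointing this out.

There is a small bug in the paraphrase_mining function when the list of sentences is very small. Instead of computing all combinations, it computes only n-1 combinations for each sentence. For larger lists of sentences this is not a problem, but in your specific example it drops the least related combination and returns one fewer pair than expected.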
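The effect can be reproduced with a small pure-Python simulation built from the scores in the question. This is an illustrative sketch, not the library's implementation: it assumes that the n-1 candidates kept per sentence include the sentence's match with itself (score 1.0), which the answer does not state explicitly. The names `score` and `mine_pairs` are hypothetical helpers.

```python
# Pairwise scores copied from the question's output (sentence indices 0..3).
scores = {
    (1, 3): 0.1080,
    (2, 3): 0.0829,
    (1, 2): 0.0652,
    (0, 3): 0.0523,
    (0, 2): -0.0270,
    (0, 1): -0.0530,
}
n = 4


def score(i, j):
    """Similarity of sentences i and j; a sentence matches itself with 1.0."""
    if i == j:
        return 1.0
    return scores[(min(i, j), max(i, j))]


def mine_pairs(candidates_per_sentence):
    """Keep only the top candidates for each sentence, then merge the pairs."""
    found = set()
    for i in range(n):
        ranked = sorted(range(n), key=lambda j: score(i, j), reverse=True)
        for j in ranked[:candidates_per_sentence]:
            if i != j:  # skip the self-match (assumed to occupy one slot)
                found.add((min(i, j), max(i, j)))
    return sorted(found, key=lambda p: scores[p], reverse=True)


buggy = mine_pairs(n - 1)  # assumed buggy behaviour: n-1 candidates per sentence
fixed = mine_pairs(n)      # enough candidates to cover every other sentence
missing = set(fixed) - set(buggy)

print(len(buggy), len(fixed))  # 5 6
print(missing)                 # {(0, 1)}
```

Under that assumption, the only pair that no sentence ranks among its kept candidates is (0, 1), 'The cat sits outside' and 'A man is playing guitar', which is exactly the lowest-scoring pair absent from the question's paraphrase_mining() output.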

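It has been fixed in the repository and will be part of the next release.

PS: You can also post your question on GitHub: https://github.com/UKPLab/sentence-transformers/issues

I receive email notifications there and can respond faster.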

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/64779234
