
Fine-tuning semantic search

Stack Overflow user
Asked 2021-08-31 14:51:17
1 answer · 301 views · 0 followers · 2 votes

For example, here are the sentence cosine-similarity results from a pre-trained BERT model:

======================

Query: milk with chocolate flavor

Top 10 most similar sentences in corpus:
Milka milk chocolate 100 g (Score: 0.8672)
Alpro, Chocolate soy drink 1 ltr (Score: 0.6821)
Danone, HiPRO 25g Protein chocolate flavor 330 ml (Score: 0.6692)

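In the example above I am searching for milk, so milk-related products should come first, but the chocolate bar is returned first instead. How can I fine-tune the similarity results?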

I googled it but could not find a suitable solution. Please help.

Code:

import scipy.spatial.distance  # import the submodule explicitly so cdist is available
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distilbert-base-multilingual-cased')

corpus = [
          "Alpro, Chocolate soy drink 1 ltr",
          "Milka milk chocolate 100 g",
          "Danone, HiPRO 25g Protein chocolate flavor 330 ml"
         ]
corpus_embeddings = model.encode(corpus)

queries = [
            'milk with chocolate flavor',
          ]
query_embeddings = model.encode(queries)

# Compute the cosine distance of each query against every sentence in the corpus
closest_n = 10
for query, query_embedding in zip(queries, query_embeddings):
    distances = scipy.spatial.distance.cdist([query_embedding], corpus_embeddings, "cosine")[0]

    results = zip(range(len(distances)), distances)
    results = sorted(results, key=lambda x: x[1])

    print("\n======================\n")
    print("Query:", query)
    print("\nTop 10 most similar sentences in corpus:")

    for idx, distance in results[0:closest_n]:
        print(corpus[idx].strip(), "(Score: %.4f)" % (1-distance))
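
The score printed by the loop above is 1 minus the cosine distance returned by `cdist`, which is the cosine similarity. A minimal pure-Python sketch of that relationship follows; the helper names `cosine_similarity` and `cosine_distance` and the toy 3-dimensional vectors are assumptions introduced for illustration and stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance in the sense of scipy's "cosine" metric: 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)


# Toy vectors standing in for embeddings (hypothetical values).
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 3.0, 0.0]

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
print(cosine_distance(query, orthogonal))        # 1.0
```

Vector length does not affect the score, only direction does, which is why sorting by ascending distance is the same as sorting by descending similarity.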

1 Answer

Stack Overflow user

Answered 2021-09-13 19:53:01

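Try applying a threshold to the distance, keeping only results whose cosine similarity (1 - distance) is above 0.7: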

import scipy.spatial.distance  # import the submodule explicitly so cdist is available
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distilbert-base-multilingual-cased')

corpus = [
          "Alpro, Chocolate soy drink 1 ltr",
          "Milka milk chocolate 100 g",
          "Danone, HiPRO 25g Protein chocolate flavor 330 ml"
         ]
corpus_embeddings = model.encode(corpus)

queries = [
            'milk with chocolate flavor',
          ]
query_embeddings = model.encode(queries)

# Compute the cosine distance of each query against every sentence in the corpus
closest_n = 10
for query, query_embedding in zip(queries, query_embeddings):
    distances = scipy.spatial.distance.cdist([query_embedding], corpus_embeddings, "cosine")[0]

    results = zip(range(len(distances)), distances)
    results = sorted(results, key=lambda x: x[1])

    print("\n======================\n")
    print("Query:", query)
    print("\nTop 10 most similar sentences in corpus:")

    for idx, distance in results[0:closest_n]:
        # Print only results whose cosine similarity exceeds the 0.7 threshold
        if 1 - distance > 0.7:
            print(corpus[idx].strip(), "(Score: %.4f)" % (1 - distance))
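
To see what such a threshold does to the ranking, here is a small sketch that applies it to the three scores reported in the question, without loading the model. The `filter_by_score` helper is a hypothetical name introduced here for illustration; it is not part of the original answer.

```python
def filter_by_score(scored, min_score):
    """Keep (text, score) pairs whose score exceeds min_score, best first."""
    kept = [(text, score) for text, score in scored if score > min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Scores copied from the output shown in the question.
scored = [
    ("Milka milk chocolate 100 g", 0.8672),
    ("Alpro, Chocolate soy drink 1 ltr", 0.6821),
    ("Danone, HiPRO 25g Protein chocolate flavor 330 ml", 0.6692),
]

for text, score in filter_by_score(scored, 0.7):
    print(text, "(Score: %.4f)" % score)
# Milka milk chocolate 100 g (Score: 0.8672)
```

With a cut-off of 0.7 only the Milka entry survives, so the top hit is still the chocolate bar. A threshold on its own drops low-scoring results; it does not change the order of the ones that remain.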
Votes: -1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/69000902
