How to fix spaCy Transformers for spaCy version 3.1

Stack Overflow user
Asked 2022-04-23 08:11:41

I have the following problem. I have been trying to reproduce the example code from this source: GitHub

I am using a JupyterLab environment on Linux with spaCy 3.1.

# $ pip install spacy-transformers
# $ python -m spacy download en_trf_bertbaseuncased_lg

import spacy
nlp = spacy.load("en_trf_bertbaseuncased_lg")
apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")

# sentence similarity
print(apple1.similarity(apple2)) #0.69861203
print(apple1.similarity(apple3)) #0.5404963

# sentence embeddings
apple1.vector  # or apple1.tensor.sum(axis=0)

I am using spaCy 3.1, so I changed

python -m spacy download en_trf_bertbaseuncased_lg

to

python -m spacy download en_core_web_trf

and replaced

nlp = spacy.load("en_trf_bertbaseuncased_lg")

with

nlp = spacy.load("en_core_web_trf")

The complete code now looks like this:

import spacy
nlp = spacy.load("en_core_web_trf")
apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")

# sentence similarity
print(apple1.similarity(apple2)) #0.69861203
print(apple1.similarity(apple3)) #0.5404963

# sentence embeddings
apple1.vector  # or apple1.tensor.sum(axis=0)

However, when I run the code, instead of the expected output

#0.69861203 #0.5404963

I simply get

#0.0 #0.0

I also get the following UserWarning:

<ipython-input-30-ed0c29210d4e>:8: UserWarning: [W007] The model you're using has no word vectors loaded, so the result of the Doc.similarity method will be based on the tagger, parser and NER, which may not give useful similarity judgements. This may happen if you're using one of the small models, e.g. `en_core_web_sm`, which don't ship with word vectors and only use context-sensitive tensors. You can always add your own word vectors, or use one of the larger models instead if available.
  print(apple1.similarity(apple2)) #0.69861203
<ipython-input-30-ed0c29210d4e>:8: UserWarning: [W008] Evaluating Doc.similarity based on empty vectors.
  print(apple1.similarity(apple2)) #0.69861203
<ipython-input-30-ed0c29210d4e>:9: UserWarning: [W007] The model you're using has no word vectors loaded, so the result of the Doc.similarity method will be based on the tagger, parser and NER, which may not give useful similarity judgements. This may happen if you're using one of the small models, e.g. `en_core_web_sm`, which don't ship with word vectors and only use context-sensitive tensors. You can always add your own word vectors, or use one of the larger models instead if available.
  print(apple1.similarity(apple3)) #0.5404963
<ipython-input-30-ed0c29210d4e>:9: UserWarning: [W008] Evaluating Doc.similarity based on empty vectors.
  print(apple1.similarity(apple3)) #0.5404963

Does anyone know how to fix this code so that the similarity is computed correctly?


1 Answer

Stack Overflow user

Answered 2022-05-02 03:21:19

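Doc.similarity uses word vectors to compute similarity, and transformer pipelines such as en_core_web_trf do not include them. You should either use en_core_web_lg or another pipeline that ships with word vectors, or take a different approach, such as a custom similarity hook or sentence-transformers.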
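To see why the output is exactly 0.0 rather than some small number: the W008 warning in the question says the similarity is being evaluated on empty vectors, and a cosine similarity has nothing to work with when either vector has zero norm. The pure-Python sketch below illustrates that guard. It is not spaCy's actual implementation, and the function name `cosine_similarity` is made up for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity that falls back to 0.0 when either vector has zero norm.

    Illustrative only: this mirrors the behaviour described by warning W008,
    it is not copied from spaCy.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Empty or all-zero vectors: no direction to compare, so report 0.0.
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


# Empty vectors, as with a pipeline that has no word vectors loaded.
print(cosine_similarity([], []))
# Two parallel, non-empty vectors for comparison.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

With a pipeline that does ship word vectors, the document vectors are non-zero and the same formula yields a meaningful score.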

For more details, see the similarity docs and a recent discussion.
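The custom-hook route mentioned in the answer could look roughly like the sketch below. It relies on spaCy's `doc.user_hooks["similarity"]` mechanism, which Doc.similarity consults before falling back to word vectors. Everything else is an assumption made for illustration: the component name `similarity_hook`, the helper names, and above all `toy_embed`, a hashed character-trigram stand-in used only so the example runs without downloading a model. In practice you would replace it with a real encoder, for example a sentence-transformers model or a pooled version of the transformer output that en_core_web_trf stores on `doc._.trf_data`.

```python
import zlib

import numpy as np
import spacy
from spacy.language import Language
from spacy.tokens import Doc


def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in encoder: hashed counts of character trigrams.

    Replace this with a real sentence encoder; it exists only to keep
    the sketch self-contained.
    """
    vec = np.zeros(dim, dtype="float32")
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i : i + 3].encode("utf8")
        vec[zlib.crc32(trigram) % dim] += 1.0
    return vec


def hook_similarity(doc: Doc, other: Doc) -> float:
    """Cosine similarity between the stand-in embeddings of two docs."""
    a, b = toy_embed(doc.text), toy_embed(other.text)
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(a @ b) / denom if denom else 0.0


@Language.component("similarity_hook")  # component name is an assumption
def similarity_hook(doc: Doc) -> Doc:
    # Doc.similarity checks user_hooks first, so this replaces the
    # word-vector-based default for every doc the pipeline produces.
    doc.user_hooks["similarity"] = hook_similarity
    return doc


# A blank pipeline keeps the sketch runnable without a model download;
# with spaCy 3.1 you would add the component to your loaded pipeline instead.
nlp = spacy.blank("en")
nlp.add_pipe("similarity_hook")

apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")

print(apple1.similarity(apple2))
print(apple1.similarity(apple3))
```

Because the hook is attached inside a pipeline component, the calls to `similarity` from the question stay unchanged. The scores from the stand-in encoder are not meaningful; only the wiring is the point.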

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71977955
