Question: Paragraph embedding with ELMo

Stack Overflow user

Asked 2018-12-01 12:47:32

1 answer · 1.3K views · 0 followers · 3 votes

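I'm trying to understand how to prepare paragraphs for ELMo vectorization.

The docs only show how to embed multiple sentences/words at a time.

For example: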

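import tensorflow_hub as hub

# Assumes the TF 1.x Hub API and the ELMo v2 module URL from the TF Hub docs.
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

# Each sentence is a list of tokens; the shorter sentence is padded with "".
sentences = [["the", "cat", "is", "on", "the", "mat"],
             ["dogs", "are", "in", "the", "fog", ""]]
embeddings = elmo(
    inputs={
        "tokens": sentences,
        "sequence_len": [6, 5],  # true (unpadded) length of each sentence
    },
    signature="tokens",
    as_dict=True,
)["elmo"]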

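As I understand it, this returns two vectors, each representing one of the given sentences. How would I prepare the input data to vectorize a whole paragraph containing multiple sentences? Note that I want to use my own preprocessing.

Can it be done like this?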

sentences = [["<s>", "the", "cat", "is", "on", "the", "mat", ".", "</s>",
              "<s>", "dogs", "are", "in", "the", "fog", ".", "</s>"]]

Or like this?

sentences = [["the", "cat", "is", "on", "the", "mat", ".", 
              "dogs", "are", "in", "the", "fog", "."]]

Tags: python, tensorflow, nlp, tensorflow-hub, elmo

1 Answer

Stack Overflow user

Answered 2018-12-01 19:42:31

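ELMo produces contextual word vectors. The vector for a word is therefore a function of both the word itself and the context it appears in, for example the sentence.

As in the example from the docs, you want your paragraph to be a list of sentences, each of which is a list of tokens. So, your second example. To get this format, you can use the spaCy tokenizer: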

import spacy

# you need to install the language model first. See spacy docs.
nlp = spacy.load('en_core_web_sm')

text = "The cat is on the mat. Dogs are in the fog."
toks = nlp(text)
sentences = [[w.text for w in s] for s in toks.sents]

I don't think the extra "" padding in the second sentence is needed, since sequence_len takes care of that.
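
A minimal sketch of the padding question, assuming the batch has to be rectangular: the TF Hub documentation describes the tokens input as a string tensor of shape [batch_size, max_length], so shorter sentences would likely still need "" padding up to the longest one, while sequence_len records the real lengths. The helper name pad_batch is hypothetical and not part of any library.

```python
def pad_batch(sentences, pad_token=""):
    """Pad tokenized sentences to equal length and record the true lengths.

    pad_batch is a hypothetical helper, not part of TensorFlow Hub.
    """
    # True number of tokens in each sentence, before padding.
    sequence_len = [len(s) for s in sentences]
    max_len = max(sequence_len, default=0)
    # Append pad_token so that every row has exactly max_len entries.
    tokens = [list(s) + [pad_token] * (max_len - len(s)) for s in sentences]
    return tokens, sequence_len


sentences = [["the", "cat", "is", "on", "the", "mat"],
             ["dogs", "are", "in", "the", "fog"]]
tokens, sequence_len = pad_batch(sentences)
```

The resulting tokens and sequence_len can then be passed as the "tokens" and "sequence_len" inputs of the elmo(...) call shown in the question.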

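Update

"As I understand it, this returns two vectors, each representing one of the given sentences."

No, this returns a vector for each word in each sentence. If you want the whole paragraph to be the context (for each word), just change it to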

sentences = [["the", "cat", "is", "on", "the", "mat", "dogs", "are", "in", "the", "fog"]]

and

...
"sequence_len": [11]

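The change above can be sketched as a small helper, assuming the paragraph has already been split into tokenized sentences by your own preprocessing. flatten_paragraph is a hypothetical name; it only builds the inputs and does not call ELMo itself.

```python
def flatten_paragraph(sentences):
    """Merge tokenized sentences into one sequence so the paragraph is one context.

    flatten_paragraph is a hypothetical helper, not part of TensorFlow Hub.
    Returns a batch holding a single token list, plus its sequence_len.
    """
    tokens = [tok for sent in sentences for tok in sent]
    return [tokens], [len(tokens)]


paragraph = [["the", "cat", "is", "on", "the", "mat"],
             ["dogs", "are", "in", "the", "fog"]]
tokens, sequence_len = flatten_paragraph(paragraph)
print(sequence_len)  # [11]
```

With this input the "elmo" output still holds one vector per token; the difference is that each token's vector is computed with the full paragraph as context. If a single vector for the whole paragraph is wanted, the TF Hub documentation for the module also lists a mean-pooled "default" output, which may be worth checking.
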
Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/53570918
