Q: Doc2vec : TaggedLineDocument()

Stack Overflow user

Asked 2016-04-22 04:47:16

2 answers, 3.6K views, 0 followers, 2 votes

So, I'm trying to learn and understand Doc2Vec. I'm following this tutorial. My input is a list of documents, i.e. a list of lists of words. Here is my code:

    input = [["word1","word2",..."wordn"],["word1","word2",..."wordn"],...] 

    documents = TaggedLineDocument(input)

    model = doc2vec.Doc2Vec(documents,size = 50, window = 10, min_count = 2, workers=2)

But I get some unicode error (I tried googling this error, with no luck):

   TypeError('don\'t know how to handle uri %s' % repr(uri))

Can anyone help me figure out where I'm going wrong? Thanks!

python

nlp

gensim

2 Answers

Stack Overflow user

Answered 2016-04-22 05:02:05

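TaggedLineDocument should be instantiated with a file path, not an in-memory list. Make sure the file is formatted so that each line holds exactly one document.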

    documents = TaggedLineDocument('myfile.txt')
    documents = TaggedLineDocument('compressed_text.txt.gz')

From the source code:

The uri (the thing you instantiate TaggedLineDocument with) can be:

1. a URI for the local filesystem (compressed ``.gz`` or ``.bz2`` files handled automatically):
   `./lines.txt`, `/home/joe/lines.txt.gz`, `file:///home/joe/lines.txt.bz2`
2. a URI for HDFS: `hdfs:///some/path/lines.txt`
3. a URI for Amazon's S3 (can also supply credentials inside the URI):
   `s3://my_bucket/lines.txt`, `s3://my_aws_key_id:key_secret@my_bucket/lines.txt`
4. an instance of the boto.s3.key.Key class.
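
As a minimal sketch of the one-document-per-line format described in this answer, the token lists from the question could first be written to a text file, with the file path then handed to TaggedLineDocument. The helper names `write_line_corpus` and `load_tagged_corpus`, the toy documents, and the temporary file location are assumptions made for illustration; they are not part of gensim.

```python
import os
import tempfile


def write_line_corpus(docs, path):
    """Write each tokenized document as one whitespace-separated line."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens in docs:
            f.write(" ".join(tokens) + "\n")


def load_tagged_corpus(path):
    """Wrap the file in TaggedLineDocument (hypothetical helper)."""
    # Imported here so the file-writing helper works without gensim installed
    from gensim.models.doc2vec import TaggedLineDocument
    return TaggedLineDocument(path)


# Toy input in the same shape as the question's list of word lists
docs = [["human", "machine", "interface"], ["graph", "minors", "survey"]]
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
write_line_corpus(docs, corpus_path)
```

Calling `load_tagged_corpus(corpus_path)` should then give an object that can be passed to `Doc2Vec` in place of the raw list; TaggedLineDocument tags each document with its line number. Note that in gensim 4.0 and later the `size` argument used in the question is named `vector_size`.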

2 votes

Stack Overflow user

Answered 2017-09-22 16:06:09

For the data, my list is formatted the same way as yours:

['aw', 'wb', 'ce', 'uw', 'qqg', 'g', 'e', 'ent', 'va', 'a'...]

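For the labels, I have a list: 1, 0, 0 ... Each entry is the class of the corresponding sentence above. You can use any class (label) here, not just 1 or 0.

Since we already have lists like the ones above, we can use TaggedDocument directly instead of TaggedLineDocument: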

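    # Doc2Vec makes several passes over the corpus, and a bare generator
    # can only be consumed once, so materialize it into a list first
    model = gensim.models.Doc2Vec(list(self.myDataFlow(data, labels)))

    def myDataFlow(self, data, labels):
        # Pair each word list with its label, wrapped in a one-element tag list
        for i, j in zip(data, labels):
            yield TaggedDocument(i, [j])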
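
One caveat with the generator approach: gensim needs to iterate over the corpus more than once (vocabulary building, then training), and a plain generator is exhausted after a single pass. A possible alternative, sketched here under assumptions, is a small re-iterable class whose `__iter__` starts a fresh pass each time. The class name `LabeledCorpus` and the toy data are hypothetical, and the namedtuple fallback exists only so the sketch runs when gensim is not installed.

```python
from collections import namedtuple

try:
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Stand-in with the same two fields, only so the sketch runs without gensim
    TaggedDocument = namedtuple("TaggedDocument", "words tags")


class LabeledCorpus:
    """Re-iterable corpus: every call to iter() starts a fresh pass."""

    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __iter__(self):
        for words, label in zip(self.data, self.labels):
            yield TaggedDocument(words, [label])


# Hypothetical toy data in the same shape as the lists above
data = [["aw", "wb", "ce"], ["uw", "qqg", "g"]]
labels = [1, 0]
corpus = LabeledCorpus(data, labels)

# Two full passes yield the same documents, unlike a bare generator
first_pass = [doc.tags for doc in corpus]
second_pass = [doc.tags for doc in corpus]
```

With gensim installed, `corpus` can be passed to `Doc2Vec` as the documents argument. This avoids holding a second copy of the corpus in memory, which is the trade-off against simply wrapping the generator in `list(...)`.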

0 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/36780138
