
Hebrew named entity recognition with the polyglot package

Stack Overflow user
Asked on 2016-07-10 21:25:38
1 answer, 2.4K views, 0 followers, 3 votes

I am trying to use the polyglot package for named entity recognition in Hebrew.

Here is my code:

# -*- coding: utf8 -*-
import polyglot
from polyglot.text import Text, Word
from polyglot.downloader import downloader
downloader.download("embeddings2.iw")
text = Text(u"in france and in germany")
print(type(text))
text2 = Text(u"נסעתי מירושלים לתל אביב")
print(type(text2))
print(text.entities)
print(text2.entities)

Here is the output:

<class 'polyglot.text.Text'>
<class 'polyglot.text.Text'>
[I-LOC([u'france']), I-LOC([u'germany'])]
Traceback (most recent call last):
  File "C:/Python27/Lib/site-packages/IPython/core/pyglot.py", line 15, in <module>
    print(text2.entities)
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Python27\lib\site-packages\polyglot\text.py", line 132, in entities
    for i, (w, tag) in enumerate(self.ne_chunker.annotate(self.words)):
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Python27\lib\site-packages\polyglot\text.py", line 100, in ne_chunker
    return get_ner_tagger(lang=self.language.code)
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 191, in get_ner_tagger
    return NEChunker(lang=lang)
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 104, in __init__
    super(NEChunker, self).__init__(lang=lang)
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 40, in __init__
    self.predictor = self._load_network()
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 109, in _load_network
    self.embeddings = load_embeddings(self.lang, type='cw', normalize=True)
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "C:\Python27\lib\site-packages\polyglot\load.py", line 61, in load_embeddings
    p = locate_resource(src_dir, lang)
  File "C:\Python27\lib\site-packages\polyglot\load.py", line 43, in locate_resource
    if downloader.status(package_id) != downloader.INSTALLED:
  File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 738, in status
    info = self._info_or_id(info_or_id)
  File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 508, in _info_or_id
    return self.info(info_or_id)
  File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 934, in info
    raise ValueError('Package %r not found in index' % id)
ValueError: Package u'embeddings2.iw' not found in index

The English text works, but the Hebrew text does not.

Whether or not I try to download the package u'embeddings2.iw', I get:

ValueError: Package u'embeddings2.iw' not found in index

1 Answer

Stack Overflow user

Accepted answer

Posted on 2016-07-15 08:43:14

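OK, I figured it out.

It looks like a bug to me.

Language detection identifies the language as 'iw', which is the former ISO 639 language code for Hebrew and was later changed to 'he'. text.entities does not recognize the 'iw' code, so I changed it as follows: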

text2.hint_language_code = 'he'
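
The remapping described above can be sketched as a small lookup. This is only an illustration under stated assumptions: `DEPRECATED_ISO639_CODES` and `normalize_language_code` are hypothetical names that are not part of polyglot, and the table covers only the retired ISO 639 two-letter codes 'iw', 'in' and 'ji'.

```python
# Hypothetical helper, NOT part of polyglot: maps retired ISO 639
# two-letter codes to their current replacements.
DEPRECATED_ISO639_CODES = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def normalize_language_code(code):
    """Return the current ISO 639 code for a possibly retired one.

    Codes that are not in the table are returned unchanged.
    """
    return DEPRECATED_ISO639_CODES.get(code, code)


print(normalize_language_code("iw"))
print(normalize_language_code("en"))
```

Assuming `text2.language.code` returns the detected code (the traceback shows polyglot reading that attribute), the result could be assigned with `text2.hint_language_code = normalize_language_code(text2.language.code)` before accessing `text2.entities`. This has not been checked against any particular polyglot version.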
Votes: 6
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/38296602
