I am trying to analyze documents in Elasticsearch with the Smart Chinese (smartcn) analyzer, but instead of the analyzed Chinese tokens, Elasticsearch returns the Unicode code points of those characters. For example:
PUT /test_chinese
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"default": {
"type": "smartcn"
}
}
}
}
}
}
GET /test_chinese/_analyze?text='我说世界好!'

I expected to get each Chinese character back, but instead I got:
{
"tokens": [
{
"token": "25105",
"start_offset": 3,
"end_offset": 8,
"type": "word",
"position": 4
},
{
"token": "35828",
"start_offset": 11,
"end_offset": 16,
"type": "word",
"position": 8
},
{
"token": "19990",
"start_offset": 19,
"end_offset": 24,
"type": "word",
"position": 12
},
{
"token": "30028",
"start_offset": 27,
"end_offset": 32,
"type": "word",
"position": 16
},
{
"token": "22909",
"start_offset": 35,
"end_offset": 40,
"type": "word",
"position": 20
}
]
}

Do you know what is going on?
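Note that the token values in the response are exactly the decimal Unicode code points of the five characters in the input string, which suggests the characters were escaped somewhere along the way rather than mis-analyzed. A quick check in Python:

```python
# Map each character of the input to its decimal Unicode code point.
# These match the "token" values returned above.
for ch in "我说世界好":
    print(ch, ord(ch))
# 我 25105
# 说 35828
# 世 19990
# 界 30028
# 好 22909
```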
Thanks!
Posted on 2015-12-15 22:12:49
I found the answer to my question. It turns out there is a bug in Sense. Here you can find the discussion with Zachary Tong, an Elasticsearch developer: https://discuss.elastic.co/t/smart-chinese-analysis-returns-unicodes-instead-of-chinese-tokens/37133 And here is the ticket for the bug that was found: https://github.com/elastic/sense/issues/88
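Since the bug concerns how Sense encodes the text passed in the URL, a possible workaround (a sketch; the exact body-based _analyze syntax depends on your Elasticsearch version) is to send the text in a JSON request body instead of a query-string parameter:

```
GET /test_chinese/_analyze
{
  "text": "我说世界好!"
}
```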
https://stackoverflow.com/questions/34266236