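I have already installed it on my ES cluster, but I can't find any documentation on how to specify the right analyzer. All I need is to set up a tokenizer and filters that specify the stop words and a stemmer.
For example, for Dutch:
"dutch": {
  "type": "custom",
  "tokenizer": "uax_url_email",
  "filter": ["lowercase", "asciifolding", "dutch_stemmer_filter", "dutch_stop_filter"]
}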
with:
"dutch_stemmer_filter": {
  "type": "stemmer",
  "name": "dutch"
},
"dutch_stop_filter": {
  "type": "stop",
  "stopwords": ["_dutch_"]
}
How do I configure my Chinese analyzer?
Posted on 2014-09-29 22:12:06
Try this for a specific index (the analyzer is 'smartcn' and the tokenizer is 'smartcn_tokenizer'):
PUT /test_chinese
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "smartcn"
          }
        }
      }
    }
  }
}
GET /test_chinese/_analyze?text='叻出色'
It should output two tokens (the test comes from the plugin's test classes):
{
  "tokens": [
    {
      "token": "叻",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 2
    },
    {
      "token": "出色",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 3
    }
  ]
}
https://stackoverflow.com/questions/26087072
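For the tokenizer-plus-filters setup asked about in the question, the Dutch pattern can probably be carried over by using the plugin's 'smartcn_tokenizer' inside a "custom" analyzer. The following is a minimal sketch, not a tested configuration. The index name test_chinese_custom, the analyzer name chinese_custom, the filter name chinese_stop_filter, and the two stop words are hypothetical placeholders. No stemmer filter is included, on the assumption that stemming is not needed for Chinese because words are not inflected.

```
# Sketch only. All names below except "custom", "stop" and "smartcn_tokenizer"
# are hypothetical placeholders, and the stop word list is illustrative.
# Remove these comment lines if your client does not accept them.
PUT /test_chinese_custom
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "chinese_custom": {
            "type": "custom",
            "tokenizer": "smartcn_tokenizer",
            "filter": ["lowercase", "chinese_stop_filter"]
          }
        },
        "filter": {
          "chinese_stop_filter": {
            "type": "stop",
            "stopwords": ["的", "了"]
          }
        }
      }
    }
  }
}
```

To check the result, the _analyze request from the answer should work against this index as well, with analyzer=chinese_custom added to the query string so that the custom analyzer is used.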