I have a field indexed with a custom analyzer, configured as follows:
"COMPNAYNAME" : {
"type" : "text",
"analyzer" : "textAnalyzer"
}
"textAnalyzer" : {
"filter" : [
"lowercase"
],
"char_filter" : [ ],
"type" : "custom",
"tokenizer" : "ngram_tokenizer"
}
"tokenizer" : {
"ngram_tokenizer" : {
"type" : "ngram",
"min_gram" : "2",
"max_gram" : "3"
}
}

When I search for the text "ikea", I get the following results.
Query:
GET company_info_test_1/_search
{
"query": {
"match": {
"COMPNAYNAME": {"query": "ikea"}
}
}
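}

These are the results:
1. mikea
2. likeable
3. maaikeart
4. likeables
5. ikea b.v. <------
6. likeachef
7. ikea breda <------
8. bernikeart
9. ikea duiven
10. mikea media

Can you help me find the best way to index this field, given that I need to search for exact matches and have them bubble up to the top of the results?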
Thanks in advance.
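A rough Python approximation of textAnalyzer (not Elasticsearch's own code) helps show why these results come back. It assumes that the ngram tokenizer keeps every character, spaces and dots included, because no token_chars is set, and that the match query uses its default OR operator. Under those assumptions, every one of the ten titles contains all five 2 to 3 character grams of "ikea", so the exact matches get no advantage from the grams they share with the query.

```python
def ngram(text: str, min_gram: int = 2, max_gram: int = 3) -> list[str]:
    # Approximates an ngram tokenizer with no token_chars: every character
    # is kept, and grams are emitted by start position, then by length.
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                tokens.append(text[start:start + size])
    return tokens


def text_analyzer(text: str) -> list[str]:
    # The ngram tokenizer followed by the lowercase token filter.
    return [token.lower() for token in ngram(text)]


query_grams = set(text_analyzer("ikea"))  # {"ik", "ike", "ke", "kea", "ea"}
results = ["mikea", "likeable", "maaikeart", "likeables", "ikea b.v.",
           "likeachef", "ikea breda", "bernikeart", "ikea duiven", "mikea media"]

for title in results:
    shared = query_grams & set(text_analyzer(title))
    print(f"{title!r}: shares {len(shared)}/{len(query_grams)} query grams")
```

Every line printed reports 5/5, which is why the exact "ikea ..." titles end up mixed in with partial matches such as "likeable".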
Posted on 2020-09-16 18:28:24
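You can use the ngram tokenizer together with "search_analyzer": "standard", so that the query text itself is not split into n-grams at search time. See the Elasticsearch documentation on search_analyzer for more details.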
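To see what changes, here is a rough Python approximation of both sides (not Elasticsearch's implementation). It assumes the my_tokenizer settings from the index mapping in this answer (min_gram 2, max_gram 10, token_chars letter and digit) and treats the standard analyzer as a simple word split plus lowercasing. Since my_analyzer has no lowercase filter, the sketch, like the example data, assumes lowercase titles. The query stays the single token "ikea", which each document contains exactly once among its grams, so the scores most likely differ through field length: "ikea b.v." yields 6 grams, "mikea" 10 and "maaikeart" 36. That ordering agrees with the scores in the search results below, which is consistent with BM25 length normalization; this is an interpretation, not something the answer states.

```python
import re


def ngram_tokens(text: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    # Approximates my_tokenizer: split on characters that are not letters or
    # digits, then emit every gram of each run between min_gram and max_gram.
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


def standard_tokens(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer: word split plus lowercasing.
    return [word.lower() for word in re.findall(r"[^\W_]+", text)]


query = standard_tokens("ikea")  # a single token: ["ikea"]
docs = ["ikea b.v.", "mikea", "maaikeart"]

for doc in docs:
    grams = ngram_tokens(doc)
    print(f"{doc!r}: matches={query[0] in grams}, grams={len(grams)}")
```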
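As @EvaldasBuinauskas pointed out, if you want tokens generated only from the beginning of each word rather than from the middle, you can also use the edge_ngram tokenizer here.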
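The effect of the edge_ngram alternative can be sketched the same way. This is a rough Python approximation, not Elasticsearch's code, with hypothetical settings (min_gram 2, max_gram 10, token_chars letter and digit) because the answer does not give any, and it assumes the query is analyzed by the standard analyzer into the single token "ikea". Only prefixes of each word are indexed, so the partial matches from the question drop out.

```python
import re


def edge_ngram_tokens(text: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    # Approximates an edge_ngram tokenizer with token_chars letter/digit:
    # only prefixes of each word, from min_gram up to max_gram characters.
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens


titles = ["ikea b.v.", "mikea", "maaikeart", "likeable",
          "ikea breda", "ikea duiven", "mikea media"]
matches = [title for title in titles if "ikea" in edge_ngram_tokens(title)]
print(matches)
```

Note that the edge_ngram tokenizer defaults to min_gram 1 and max_gram 2, so a useful setup needs these values set explicitly.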
Adding a working example with index data, mapping, search query and search results:
Index data:
{ "title": "ikea b.v."}
{ "title" : "mikea" }
{ "title" : "maaikeart"}

Index mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
},
"max_ngram_diff": 50
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard"
}
}
}
}

Search query:
{
"query": {
"match" : {
"title" : "ikea"
}
}
}

Search results:
"hits": [
{
"_index": "normal",
"_type": "_doc",
"_id": "4",
"_score": 0.1499838, <-- note this
"_source": {
"title": "ikea b.v."
}
},
{
"_index": "normal",
"_type": "_doc",
"_id": "1",
"_score": 0.13562363, <-- note this
"_source": {
"title": "mikea"
}
},
{
"_index": "normal",
"_type": "_doc",
"_id": "3",
"_score": 0.083597526,
"_source": {
"title": "maaikeart"
}
}
]

https://stackoverflow.com/questions/63917546