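I want to implement autocomplete with Elasticsearch, but I can't get it to work. I want something similar to the question here; I tried the suggested answer, but it didn't work. I want something like the following:
My indexed strings are, for example:
For the input "develo", I want as output:
For the input "developpeur", I want as output:
For the input "suis", I want as output:
I tried to achieve this with the completion suggester:
Here is the Elasticsearch version I'm using: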
"number": "6.2.2",
"build_hash": "10b1edd",
"build_date": "2018-02-16T19:01:30.685723Z",
"build_snapshot": false,
"lucene_version": "7.2.1",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"
Mapping:
{
"settings": {
"number_of_shards": "1",
"analysis": {
"filter": {
"prefix_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
},
"ngram_filter": {
"type": "nGram",
"min_gram": "3",
"max_gram": "3"
},
"synonym_filter": {
"type": "synonym",
"synonyms": [
"hackwillbereplacedatindexcreation,hackwillbereplacedatindexcreation"
]
},
"french_stop": {
"type": "stop",
"stopwords": "french"
}
},
"analyzer": {
"word": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"french_stop"
],
"char_filter": []
},
"prefix": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"synonym_filter",
"prefix_filter"
],
"char_filter": []
},
"ngram_with_synonyms": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"synonym_filter",
"ngram_filter"
],
"char_filter": []
},
"ngram": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"ngram_filter"
],
"char_filter": []
}
}
}
},
"mappings": {
"training": {
"properties": {
"id": {
"type": "text",
"index": false
},
"label": {
"type": "text",
"index_options": "docs",
"copy_to": "full_label",
"analyzer": "word",
"fields": {
"prefix": {
"type": "text",
"index_options": "docs",
"analyzer": "prefix",
"search_analyzer": "word"
},
"ngram": {
"type": "text",
"index_options": "docs",
"analyzer": "ngram_with_synonyms",
"search_analyzer": "ngram"
}
}
},
"labelSuggest": {
"type": "completion",
"analyzer": "word"
}
}
}
}
}
Then, when I create the index with my data, I do the following (this is the body of the PUT call to the ES API; I use Python for this):
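body = {
    "label": r["title"],
    "labelSuggest": {
        # r["title"].ngrams() is a custom helper returning every word n-gram of the title
        "input": r["title"].ngrams(),
        # the weight belongs inside the completion field object and must be an integer
        "weight": 1
    }
}
r["title"].ngrams() returns all the n-grams of the title. For example, "development research biotech" gives: "development", "research", "biotech", "development research", "research biotech", and "development research biotech".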
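Since `ngrams()` is not a built-in Python string method, here is a minimal sketch of what it presumably does: collect every contiguous run of words in the title, shortest runs first. The name `word_ngrams` and the sample record `r` are hypothetical stand-ins, not the asker's actual code.

```python
def word_ngrams(title):
    """Return every contiguous word n-gram of `title`, shortest n-grams first.

    Hypothetical stand-in for the r["title"].ngrams() helper used above.
    """
    words = title.split()
    grams = []
    for n in range(1, len(words) + 1):          # n-gram length in words
        for start in range(len(words) - n + 1):  # sliding window start
            grams.append(" ".join(words[start:start + n]))
    return grams


# Hypothetical record shaped like the rows being indexed
r = {"title": "development research biotech"}
body = {
    "label": r["title"],
    "labelSuggest": {"input": word_ngrams(r["title"]), "weight": 1},
}
print(body["labelSuggest"]["input"])
```

For the sample title this yields the six inputs listed above, in the same order.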
Then, to call the suggester, I do:
POST http://localhost:9200/training/_search?pretty
{
"suggest": {
"labelSuggest": {
"text": "developpeur",
"completion": {
"field": "labelSuggest",
"skip_duplicates": true
}
}
}
}
The result is:
{
"text": "développement",
"_index": "activity_20180518092449",
"_type": "activity",
"_id": "2031ce8b-6589-3270-afdf-7901aa21efa1",
"_score": 1,
"_source": {
"id": "2031ce8b-6589-3270-afdf-7901aa21efa1",
"name": "development research biotech",
"labelSuggest": [
"development",
"research",
"biotech",
"development research",
"research biotech",
"development research biotech"
]
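}
}
But what I want it to give me is: "development", "development research", and "development research biotech" (assuming this document is the only input).
What is wrong with my mapping/query? Is this the right approach? I hope my question is clear; I've searched a lot, in vain.
Thanks in advance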
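To pin down the expected output, here is a plain-Python model of the matching the question asks for: keep every input whose lowercased, accent-stripped form starts with the typed prefix. It only illustrates the desired result and is not how the completion suggester works internally; `fold` and `wanted_suggestions` are hypothetical names, and the `unicodedata` folding is only a rough approximation of the `lowercase` + `asciifolding` filters.

```python
import unicodedata


def fold(text):
    """Lowercase and strip accents, roughly like lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wanted_suggestions(prefix, inputs):
    """Return every input whose folded form starts with the folded prefix."""
    p = fold(prefix)
    return [s for s in inputs if fold(s).startswith(p)]


inputs = [
    "development",
    "research",
    "biotech",
    "development research",
    "research biotech",
    "development research biotech",
]
print(wanted_suggestions("develo", inputs))
```

With the inputs of the single example document, the prefix "develo" keeps exactly the three entries listed above.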
Posted on 2018-05-23 11:08:17
First, the nGram filter won't do what you describe.
This:
"ngram_filter": {
"type": "nGram",
"min_gram": "3",
"max_gram": "3"
}
will turn "development java" into dev, eve, vel, elo, ... and so on.
See the documentation here: NGram Tokenizer
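To see why a 3/3 nGram filter can't produce whole-word prefixes, here is a small plain-Python sketch that emits the same set of grams for each word: every 3-character substring, taken from anywhere in the word. Splitting on whitespace stands in for the standard tokenizer on this simple input, and `char_ngrams` is a hypothetical name; the emission order of the real Lucene filter may differ when `min_gram` and `max_gram` differ.

```python
def char_ngrams(token, min_gram=3, max_gram=3):
    """Every substring of length min_gram..max_gram (the grams an nGram filter emits)."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


# Whitespace split stands in for the standard tokenizer on this input
for word in "development java".split():
    print(word, char_ngrams(word))
```

None of the resulting tokens is "develo", so a query for that prefix has nothing to match exactly.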
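Second... for the result you want, I would just use a custom analyzer with the "icu_folding" and "edge_ngram" filters and a whitespace tokenizer. I'd start with a min_gram of 2 and a max_gram of 20-25.
From "developpeur", this generates a token list like: de, dev, deve, devel, develo, develop, developp, developpe, developpeu, developpeur.
Then you run a simple term search on that field. If it's for an autocomplete dropdown, you return matching records as the user types. I hope I understood your question, and I hope this helps.
Update: try using the following: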
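The approach above can be modeled in plain Python: analyze each label with a whitespace split, edge n-grams (min 2, max 20), then folding (the filter order of the analyzer in the update), store the tokens in an inverted index, and answer a term query by exact token lookup. This is a sketch under assumptions, not Elasticsearch itself: `fold` only approximates `icu_folding`, and the document ids, labels, and function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fold(text):
    """Lowercase and strip accents; a rough stand-in for icu_folding."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Leading substrings of length min_gram..max_gram, like an edge_ngram filter."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text):
    """Whitespace tokenizer, then edge n-grams, then folding."""
    return [fold(g) for word in text.split() for g in edge_ngrams(word)]


def build_index(docs):
    """Inverted index: map each token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, label in docs.items():
        for token in analyze(label):
            index[token].add(doc_id)
    return index


def term_search(index, term):
    """A term query is not analyzed: the typed text must equal an indexed token."""
    return sorted(index.get(term, set()))


docs = {1: "développeur java", 2: "development research biotech"}
index = build_index(docs)
print(edge_ngrams("developpeur"))
print(term_search(index, "dev"))
print(term_search(index, "developp"))
```

Because a term query skips analysis, the client has to lowercase and fold the typed text itself; in this model "Dev" finds nothing while "dev" matches both documents.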
"suggester": {
"type": "custom",
"tokenizer": "whitespace",
"filter": ["my_ngram_filter", "icu_folding"],
"char_filter": []
}
where "my_ngram_filter" is:
"my_ngram_filter": {
"type": "edge_ngram",
"min_gram": "2",
"max_gram": "20"
}
Then the mapping on the field should look like this:
"labelSuggest": {
"type": "text",
"analyzer": "suggester"
}
Then run a simple search:
{
"query": {
"term": {
"labelSuggest": "dev"
}
}
}
https://stackoverflow.com/questions/50485068
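For reference, here is a sketch that assembles the update's fragments into one index-creation body plus the search body, as Python dicts (the question already uses Python). Assumptions: Elasticsearch 6.x, where mappings still take a type name (`training` is reused from the question's mapping), and the `analysis-icu` plugin is installed, since `icu_folding` is provided by that plugin. `index_body` and `search_body` are hypothetical names; the first would be sent as the body of the PUT that creates the index, the second as the body of a `_search` request.

```python
import json

# Assumes Elasticsearch 6.x (typed mappings) and the analysis-icu plugin (icu_folding)
index_body = {
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                # Leading substrings of 2 to 20 characters per token
                "my_ngram_filter": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "suggester": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["my_ngram_filter", "icu_folding"],
                }
            },
        },
    },
    "mappings": {
        "training": {
            "properties": {
                "labelSuggest": {"type": "text", "analyzer": "suggester"}
            }
        }
    },
}

# A term query is not analyzed, so "dev" must match an indexed token exactly
search_body = {"query": {"term": {"labelSuggest": "dev"}}}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body))
```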