I want to build autocomplete for an e-commerce website using the completion suggester.
This is my index:
PUT dummy
{
  "mappings": {
    "_doc": {
      "properties": {
        "suggest": {
          "type": "completion"
        },
        "title": {
          "type": "keyword"
        },
        "category": {
          "type": "keyword"
        },
        "description": {
          "type": "keyword"
        }
      }
    }
  }
}

Now, when uploading an ad, I want the title field to be used for autocomplete, so this is how I upload the document:
POST dummy/_doc
{
  "title": "Blue asics running shoes",
  "category": "sports",
  "description": "Nice blue running shoes, size 44 eu",
  "suggest": {
    "input": "Blue Asics running shoes" // <-- use title
  }
}

The problem is that this way Elasticsearch only matches the string from its beginning, i.e. "Blu" finds a result, but "Asic", "Run" or "Sho" return nothing.
So what I need to do is tokenize my input like this:
POST dummy/_doc
{
  "title": "Blue asics running shoes",
  "category": "sports",
  "description": "Nice blue running shoes, size 44 eu",
  "suggest": {
    "input": ["Blue", "Asics", "running", "shoes"] // <-- tokenized title
  }
}

This would work fine, but how do I tokenize my field? I know I can split the string in C#, but is there a way to do this in Elasticsearch/NEST?
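As a point of reference, Elasticsearch's `_analyze` API shows how a built-in analyzer would split the title. This is a minimal sketch, assuming the built-in `standard` analyzer is acceptable for the purpose (the question itself does not name an analyzer):

```json
POST _analyze
{
  "analyzer": "standard",
  "text": "Blue Asics running shoes"
}
```

The response should list the lowercased tokens blue, asics, running and shoes, which could then be sent as the `input` array shown above.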
Posted on 2018-11-01 23:04:36
Posted on 2018-11-02 00:39:22
Based on Russ's answer above (option 2), this Elasticsearch guide and this document, I arrived at the following solution:
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_ngram_token_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        },
        "additional_stop_words": {
          "type": "stop",
          "stopwords": ["your"]
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "char_filter": {
        "mapped_words_char_filter": {
          "type": "mapping",
          "mappings": [
            "C# => csharp",
            "c# => csharp"
          ]
        }
      },
      "analyzer": {
        "result_suggester_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [ "html_strip", "mapped_words_char_filter" ],
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "asciifolding",
            "stop",
            "additional_stop_words",
            "english_stemmer",
            "edge_ngram_token_filter",
            "unique"
          ]
        }
      }
    }
  }
}

A query to test this solution:
POST my_index/_analyze
{
  "analyzer": "result_suggester_analyzer",
  "text": "C# &amp; SQL are great languages. K2 is the mountaineer's mountain. Your house-décor is à la Mode"
}

I get these tokens (n-grams):
cs, csh, csha, cshar, csharp, sq, sql, gr, gre, grea, great, la, lan, lang,
langu, langua, languag, k2, mo, mou, moun, mount, mounta, mountai, mountain,
ho, hou, hous, hous, de, dec, deco, decor, mod, mode

Things to note here:
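- The stop filter, which is the default English-language filter and blocks are, is, the, but not your.
- additional_stop_words, which stops your.
- The english and possessive_english stemmers, which stem the tokens: that is why we have the languag token, but not language or languages. Also note that we have mountain but not mountaineer.
- mapped_words_char_filter, which converts C# to csharp; without it, c# would not be a valid token. (With this setup, F# would not be tokenized.)
- The html_strip char_filter, which converts &amp; to &; the & then produces no token (it is a single character, below min_gram = 2, and the standard tokenizer drops punctuation in any case).
- The asciifolding token filter, which is why décor is tokenized as decor.

Here is the NEST code for the above: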
var createIndexResponse = ElasticClient.CreateIndex(IndexName, c => c
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("result_suggester_analyzer", cc => cc
                    .Tokenizer("standard")
                    .CharFilters("html_strip", "mapped_words_char_filter")
                    .Filters(new string[] { "english_possessive_stemmer", "lowercase", "asciifolding", "stop", "additional_stop_words", "english_stemmer", "edge_ngram_token_filter", "unique" })
                )
            )
            .CharFilters(cf => cf
                .Mapping("mapped_words_char_filter", md => md
                    .Mappings(
                        "C# => csharp",
                        "c# => csharp"
                    )
                )
            )
            .TokenFilters(tfd => tfd
                .EdgeNGram("edge_ngram_token_filter", engd => engd
                    .MinGram(2)
                    .MaxGram(10)
                )
                // Named to match the "additional_stop_words" entry in the analyzer's filter list.
                .Stop("additional_stop_words", sfd => sfd.StopWords(new string[] { "your" }))
                .Stemmer("english_stemmer", esd => esd.Language("english"))
                .Stemmer("english_possessive_stemmer", epsd => epsd.Language("possessive_english"))
            )
        )
    )
    .Mappings(m => m.Map<AdDocument>(d => d.AutoMap())));

https://stackoverflow.com/questions/53097275
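The settings above only define the analyzer; nothing shown attaches it to a field or queries it. Here is a minimal sketch of one way to do both, under assumptions that are not part of the original answer: Elasticsearch 6.x with the `_doc` mapping type used in the question, a `title` field mapped as `text` rather than `keyword`, the same analyzer applied at search time, and a plain `match` query with `operator: and`.

```json
PUT my_index/_mapping/_doc
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "result_suggester_analyzer"
    }
  }
}

GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "asi",
        "operator": "and"
      }
    }
  }
}
```

With `operator: and`, every n-gram produced from the typed text has to be present in the document, which behaves like a per-word prefix match. Because stemming runs before the edge n-gram filter, running is indexed only as ru and run, so a partial input such as runni may not match under this sketch.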