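I think an example is the best way to explain my problem:
Suppose I have created an index with a synonym analyzer, and I have declared that "laptop", "phone", and "tablet" are similar words that can be generalized to "mobile":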
PUT synonym
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": [
"phone, tablet, laptop => mobile"
]
}
}
}
}
},
"mappings": {
"synonym" : {
"properties" : {
"field1" : {
"type" : "text",
"analyzer": "synonym",
"search_analyzer": "synonym"
}
}
}
}
}
Now I index some documents:
PUT synonym/synonym/1
{
"field1" : "phone"
}
PUT synonym/synonym/2
{
"field1" : "tablet"
}
PUT synonym/synonym/3
{
"field1" : "laptop"
}
Now, when I run a match query for laptop, tablet, or phone, the result is always:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.2876821,
"hits": [
{
"_index": "synonym",
"_type": "synonym",
"_id": "2",
"_score": 0.2876821,
"_source": {
"field1": "tablet"
}
},
{
"_index": "synonym",
"_type": "synonym",
"_id": "1",
"_score": 0.18232156,
"_source": {
"field1": "phone"
}
},
{
"_index": "synonym",
"_type": "synonym",
"_id": "3",
"_score": 0.18232156,
"_source": {
"field1": "laptop"
}
}
]
}
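}
As you can see, tablet always gets the highest score, even when I search for laptop.
I know all three documents match because I declared the words as similar.
However, I am trying to figure out how to query so that the document containing the actual search term comes first in the result list, ahead of the documents that only match through a similar word.
This could be done with boosting, but there must be a simpler way.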
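To make the effect concrete, here is a small Python sketch of what the analyzer above does to the three documents and to a query. It is a standalone simulation that does not call Elasticsearch; the names `SYNONYM_RULES` and `analyze` are hypothetical, and the sketch assumes the explicit rule `phone, tablet, laptop => mobile` simply replaces each listed token with `mobile` after whitespace tokenization.

```python
# Stand-in for the "synonym" analyzer from the index settings.
# Illustration only: this is not how Elasticsearch is implemented.
SYNONYM_RULES = {"phone": "mobile", "tablet": "mobile", "laptop": "mobile"}


def analyze(text):
    """Split on whitespace, then apply the explicit synonym mapping."""
    return [SYNONYM_RULES.get(token, token) for token in text.split()]


# The three documents from the question, analyzed at index time.
documents = {"1": "phone", "2": "tablet", "3": "laptop"}
indexed = {doc_id: analyze(value) for doc_id, value in documents.items()}

# The same analyzer runs at search time (search_analyzer is "synonym").
query_tokens = analyze("laptop")

# A document matches when it shares at least one token with the query.
matching_ids = sorted(
    doc_id for doc_id, tokens in indexed.items()
    if set(tokens) & set(query_tokens)
)

print(indexed)
print(query_tokens)
print(matching_ids)
```

Under these assumptions every document is indexed as the single token `mobile`, and every query is also rewritten to `mobile`, so the index keeps no trace of which original word a document contained. That is why the synonym field alone cannot rank the exact match first.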
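Posted on 2018-01-04 19:25:15
Multi-fields can come to your rescue. Index field1 in two ways: once with the synonym analyzer and once with the standard analyzer. You can then use a bool query with should clauses so that matches on field1 (synonym) and field1.raw (standard) both contribute to the score. Your mapping should look like this: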
PUT synonym
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": [
"phone, tablet, laptop => mobile"
]
}
}
}
}
},
"mappings": {
"synonym": {
"properties": {
"field1": {
"type": "text",
"analyzer": "synonym",
"search_analyzer": "synonym",
"fields": {
"raw": {
"type": "text",
"analyzer": "standard"
}
}
}
}
}
}
}
You can then query with:
GET synonym/_search?search_type=dfs_query_then_fetch
{
"query": {
"bool": {
"should": [
{
"match": {
"field1": "tablet"
}
},
{
"match": {
"field1.raw": "tablet"
}
}
]
}
}
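}
Note: I used search_type=dfs_query_then_fetch. Because you are testing on 3 shards with only a handful of documents, the scores you get are not what they should be. This happens because term and document frequencies are computed per shard. You can use dfs_query_then_fetch while testing, but it is discouraged in production. See: https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch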
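The arithmetic behind both points can be sketched in a few lines of Python. This is a rough model, not Elasticsearch itself. It assumes BM25's idf formula, and it assumes the term-frequency part of the score comes out to 1 for these one-word fields. It also assumes document 2 sits alone on its shard while documents 1 and 3 share one, a guess that happens to reproduce the scores shown in the question. The helper names `bm25_idf` and `should_score` are hypothetical.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Per-shard statistics (assumed layout, see the lead-in).
alone_on_shard = bm25_idf(doc_count=1, doc_freq=1)    # doc 2
sharing_a_shard = bm25_idf(doc_count=2, doc_freq=2)   # docs 1 and 3

# Global statistics, as dfs_query_then_fetch would gather them.
synonym_score = bm25_idf(doc_count=3, doc_freq=3)  # field1: all docs hold "mobile"
raw_score = bm25_idf(doc_count=3, doc_freq=1)      # field1.raw: one doc holds the term

documents = {"1": "phone", "2": "tablet", "3": "laptop"}


def should_score(value, query):
    """Sum of the two should clauses: every doc matches field1,
    only the exact word matches field1.raw."""
    return synonym_score + (raw_score if value == query else 0.0)


ranked = sorted(
    documents,
    key=lambda doc_id: should_score(documents[doc_id], "laptop"),
    reverse=True,
)

print(round(alone_on_shard, 7), round(sharing_a_shard, 8))
print(ranked)
```

With these assumptions the per-shard values agree with the 0.2876821 and 0.18232156 in the original result, which suggests the ranking there reflects shard placement rather than relevance. With global statistics the field1 clause scores all three documents equally, and the field1.raw clause adds its score only to the document containing the literal search term, so that document ranks first.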
https://stackoverflow.com/questions/48090911