Q: How can I make Elasticsearch more flexible?

Stack Overflow user

Asked 2020-02-12 13:54:29

Answers: 1, Views: 352, Followers: 0, Votes: 0

I am currently using this Elasticsearch query:

{
    "_source": [
        "title",
        "bench",
        "id_",
        "court",
        "date"
    ],
    "size": 15,
    "from": 0,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "i r coelho",
                    "fields": [
                        "title",
                        "content"
                    ]
                }
            },
            "filter": [],
            "should": {
                "multi_match": {
                    "query": "i r coelho",
                    "fields": [
                        "title.standard^16",
                        "content.standard"
                    ]
                }
            }
        }
    },
    "highlight": {
        "pre_tags": [
            "<tag1>"
        ],
        "post_tags": [
            "</tag1>"
        ],
        "fields": {
            "content": {}
        }
    }
}

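Here is what is happening. If I search for I.r coelho, it returns the correct results. But if I search for I R coelho (without the period), it returns different results. How can I prevent this? I want the search to behave the same even when there are extra periods, spaces, commas, and so on.

Mapping: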

{
    "courts_2": {
        "mappings": {
            "properties": {
                "author": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                },
                "bench": {
                    "type": "text",
                    "analyzer": "my_analyzer"
                },
                "citation": {
                    "type": "text"
                },
                "content": {
                    "type": "text",
                    "fields": {
                        "standard": {
                            "type": "text"
                        }
                    },
                    "analyzer": "my_analyzer"
                },
                "court": {
                    "type": "text"
                },
                "date": {
                    "type": "text"
                },
                "id_": {
                    "type": "text"
                },
                "title": {
                    "type": "text",
                    "fields": {
                        "standard": {
                            "type": "text"
                        }
                    },
                    "analyzer": "my_analyzer"
                },
                "verdict": {
                    "type": "text"
                }
            }
        }
    }
}

Settings:

{
    "courts_2": {
        "settings": {
            "index": {
                "highlight": {
                    "max_analyzed_offset": "19000000"
                },
                "number_of_shards": "5",
                "provided_name": "courts_2",
                "creation_date": "1581094116992",
                "analysis": {
                    "filter": {
                        "my_metaphone": {
                            "replace": "true",
                            "type": "phonetic",
                            "encoder": "metaphone"
                        }
                    },
                    "analyzer": {
                        "my_analyzer": {
                            "filter": [
                                "lowercase",
                                "my_metaphone"
                            ],
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "MZSecLIVQy6jiI6YmqOGLg",
                "version": {
                    "created": "7010199"
                }
            }
        }
    }
}

Edit: here is the output for I.R coelho from my analyzer:

{
    "tokens": [
        {
            "token": "IR",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "KLH",
            "start_offset": 4,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

Standard analyzer:

{
    "tokens": [
        {
            "token": "i.r",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "coelho",
            "start_offset": 4,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

elasticsearch

elasticsearch-plugin

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-02-13 21:06:04

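You get different behavior when searching for I.r coelho and I R coelho because the same content is analyzed with different analyzers: my_analyzer for title and content (the must block), and standard (the default) for title.standard and content.standard (the should block).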

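The two analyzers generate different tokens, and therefore different scores, depending on whether you search for I.r coelho (2 tokens with the standard analyzer: i.r and coelho) or I R coelho (3 tokens with the standard analyzer: i, r, and coelho). You can test an analyzer's behavior with the analyze API (see the Elastic documentation).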
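As a quick check of those token counts, the same text can be sent to the analyze API once per analyzer. The sketch below is a request body for `GET /courts_2/_analyze` (the index name comes from the question's settings). Swapping `standard` for `my_analyzer`, or the text for `I.r coelho`, covers the other combinations.

```json
{
    "analyzer": "standard",
    "text": "I R coelho"
}
```

With the standard analyzer this should return three tokens (i, r, coelho), compared with the two (i.r, coelho) shown in the question for the dotted form.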

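You have to decide whether this is the behavior you want.

Update (after clarification from the OP)

The results of the _analyze queries confirm the hypothesis: the two analyzers contribute differently to the score, so you get different results depending on whether the query contains symbol characters.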

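If you don't want your query results to be affected by symbols such as dots, or by upper and lower case, you need to reconsider which analyzers to apply. The ones you currently use will never meet that requirement. If I understand your requirement correctly, the simple built-in analyzer should be the right one for your use case.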
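One way to try the built-in simple analyzer before changing any mapping is to run both spellings through it. A minimal sketch, again as a request body for `GET /courts_2/_analyze`; passing `text` as an array analyzes both strings in one request:

```json
{
    "analyzer": "simple",
    "text": ["I.r coelho", "I R coelho"]
}
```

Because the simple analyzer splits on every non-letter character and lowercases the result, both strings should come back as the same terms: i, r, coelho.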

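In short, (1) you should consider replacing the standard built-in analyzer with the simple built-in analyzer, and (2) you should decide whether you want the query to score differently depending on the analyzer, i.e. the custom phonetic analyzer on the title and content fields and the simple analyzer on their respective subfields.

Votes: 1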
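Point (1) could look roughly like the mapping update below, written as a request body for `PUT /courts_2/_mapping`. This is a sketch under assumptions, not the answerer's own configuration: the subfield name `simple` is a hypothetical choice, and the existing `my_analyzer` and `standard` entries are repeated as they appear in the question's mapping.

```json
{
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "my_analyzer",
            "fields": {
                "standard": { "type": "text" },
                "simple": { "type": "text", "analyzer": "simple" }
            }
        },
        "content": {
            "type": "text",
            "analyzer": "my_analyzer",
            "fields": {
                "standard": { "type": "text" },
                "simple": { "type": "text", "analyzer": "simple" }
            }
        }
    }
}
```

Documents indexed before the change would need to be reindexed (for example with `POST /courts_2/_update_by_query`) before the new subfield returns matches. The should clause of the query would then target `title.simple^16` and `content.simple` instead of the `.standard` subfields.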

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/60189958
