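When I run a fuzzy query like the one below, Elasticsearch still returns only the _score. What I expect instead is a match percentage based on the fuzzy algorithm. I assumed this would be a simple configuration option, but I couldn't find anything about it, even though showing a match percentage is a common way to present fuzzy-match results.
How can this be done? Or is this simply not common practice in Elasticsearch? Most user interfaces I've seen do show a percentage score for fuzzy matches.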
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "Shahid"
      }
    }
  }
}
Response:
"hits" : [
  {
    "_index" : "users",
    "_type" : "user",
    "_id" : "5sadsadsaddas",
    "_score" : 0.11127616,
    "fuzzyMatchPercentage": 100%, // I expect something like this here
    "_source" : {
      "name" : "Shahid",
      "email" : "shahid@codeforgeek.com",
      "city" : "mumbai"
    }
  },
Posted on 2019-10-14 04:09:15
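As mentioned in the comments, this is not how the fuzzy query works in Elasticsearch. By default, search results are sorted by score in descending order, where the score expresses how well a document matches the given query. Fuzziness is folded into the calculation of that score: the more exact (less fuzzy) the match, the higher the score. You can verify this by requesting a detailed score explanation (in Elasticsearch v7.x, fuzziness is factored into the boost factor). See the example below:
1. Index two sample documents (one with the correct name, one with a misspelled name)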
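Before the Elasticsearch example, it helps to make "more exact / less fuzzy" concrete. The minimal Python sketch below computes the edit distance a fuzzy query is built around. Assumptions not stated in the answer: the fuzzy query allows adjacent transpositions by default, so the sketch uses the optimal string alignment variant of Damerau-Levenshtein distance; the default `fuzziness: AUTO` (AUTO:3,6) allows 0 edits for terms shorter than 3 characters, 1 edit for 3 to 5 characters, and 2 edits beyond that; and because the fuzzy query is a term-level query, its value "Shahid" is not lowercased, whereas the indexed names are lowercase terms (see the `name:shahid` and `name:shahib` entries in the explanations in step 3). The function names are hypothetical, and the sketch does not reproduce Lucene's scoring.

```python
# Minimal sketch (assumption): edit distance as used conceptually by a fuzzy query.
# Optimal string alignment = Levenshtein plus adjacent transpositions.


def osa_distance(a: str, b: str) -> int:
    """Edits (insert, delete, substitute, adjacent swap) needed to turn a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed by fuzziness AUTO:low,high (assumed default AUTO:3,6)."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


query = "Shahid"  # term-level query: the value is NOT lowercased (assumption above)
for indexed_term in ("shahid", "shahib"):  # lowercase terms as they appear in the index
    edits = osa_distance(query, indexed_term)
    print(indexed_term, edits, edits <= auto_max_edits(query))
# prints:
# shahid 1 True
# shahib 2 True
```

Under these assumptions, even the correctly spelled document is one edit away (the capital S), and the misspelled one is two edits away. Both stay within the two edits allowed for a six-character term, which is why both documents match.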
POST fuzzy/_bulk
{"index":{"_id":1}}
{"name": "Shahid"}
{"index":{"_id":2}}
{"name": "Shahib"}
2. Search for "Shahid" with a fuzzy query
GET fuzzy/_search
{
  "explain": true,
  "query": {
    "fuzzy": {
      "name": {
        "value": "Shahid"
      }
    }
  }
}
3. Scores and explanation clauses of the two matching documents
For the correctly spelled document ("Shahid"):
"_explanation" : {
  "value" : 0.57762265,
  "description" : "sum of:",
  "details" : [
    {
      "value" : 0.57762265,
      "description" : "weight(name:shahid in 0) [PerFieldSimilarity], result of:",
      "details" : [
        {
          "value" : 0.57762265,
          "description" : "score(freq=1.0), product of:",
          "details" : [
            {
              "value" : 1.8333334,
              "description" : "boost",
              "details" : [ ]
            },
            {
              "value" : 0.6931472,
              "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
              "details" : [
                {
                  "value" : 1,
                  "description" : "n, number of documents containing term",
                  "details" : [ ]
                },
                {
                  "value" : 2,
                  "description" : "N, total number of documents with field",
                  "details" : [ ]
                }
              ]
            },
            {
              "value" : 0.45454544,
              "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
              "details" : [
                {
                  "value" : 1.0,
                  "description" : "freq, occurrences of term within document",
                  "details" : [ ]
                },
                {
                  "value" : 1.2,
                  "description" : "k1, term saturation parameter",
                  "details" : [ ]
                },
                {
                  "value" : 0.75,
                  "description" : "b, length normalization parameter",
                  "details" : [ ]
                },
                {
                  "value" : 1.0,
                  "description" : "dl, length of field",
                  "details" : [ ]
                },
                {
                  "value" : 1.0,
                  "description" : "avgdl, average length of field",
                  "details" : [ ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
For the misspelled document ("Shahib"):
"_explanation" : {
  "value" : 0.46209806,
  "description" : "sum of:",
  "details" : [
    {
      "value" : 0.46209806,
      "description" : "weight(name:shahib in 1) [PerFieldSimilarity], result of:",
      "details" : [
        {
          "value" : 0.46209806,
          "description" : "score(freq=1.0), product of:",
          "details" : [
            {
              "value" : 1.4666666,
              "description" : "boost",
              "details" : [ ]
            },
            {
              "value" : 0.6931472,
              "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
              "details" : [
                {
                  "value" : 1,
                  "description" : "n, number of documents containing term",
                  "details" : [ ]
                },
                {
                  "value" : 2,
                  "description" : "N, total number of documents with field",
                  "details" : [ ]
                }
              ]
            },
            {
              "value" : 0.45454544,
              "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
              "details" : [
                {
                  "value" : 1.0,
                  "description" : "freq, occurrences of term within document",
                  "details" : [ ]
                },
                {
                  "value" : 1.2,
                  "description" : "k1, term saturation parameter",
                  "details" : [ ]
                },
                {
                  "value" : 0.75,
                  "description" : "b, length normalization parameter",
                  "details" : [ ]
                },
                {
                  "value" : 1.0,
                  "description" : "dl, length of field",
                  "details" : [ ]
                },
                {
                  "value" : 1.0,
                  "description" : "avgdl, average length of field",
                  "details" : [ ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
4. Conclusion: Unfortunately, the boost factor is not explained in detail (see the Elasticsearch issue), but as the example shows, it is the only difference between the scores of the two documents:
boost
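To check this conclusion numerically, the Python sketch below recomputes both scores from the values in the two `_explanation` outputs. It uses the idf and tf formulas exactly as printed there and confirms that the boost is the only input that differs. The `reconstructed_boost` helper is hypothetical: it is not documented Elasticsearch behavior, only the observation that both boost values fit (k1 + 1) × (1 − edits / 6). That fit assumes Lucene weights each fuzzy-expanded term by 1 − edits / term length, and that the un-lowercased query value "Shahid" is one edit (the capital S) from the indexed term "shahid" and two edits from "shahib".

```python
import math

# Inputs copied from the two _explanation outputs (identical for both documents).
K1 = 1.2      # k1, term saturation parameter
B = 0.75      # b, length normalization parameter
N_DOCS = 2    # N, total number of documents with field
N_TERM = 1    # n, number of documents containing term
FREQ = 1.0    # freq, occurrences of term within document
DL = 1.0      # dl, length of field
AVGDL = 1.0   # avgdl, average length of field


def idf(n_docs: int, n_term: int) -> float:
    # "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))"
    return math.log(1 + (n_docs - n_term + 0.5) / (n_term + 0.5))


def tf(freq: float, k1: float, b: float, dl: float, avgdl: float) -> float:
    # "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))"
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def score(boost: float) -> float:
    # "score(freq=1.0), product of:" boost * idf * tf
    return boost * idf(N_DOCS, N_TERM) * tf(FREQ, K1, B, DL, AVGDL)


def reconstructed_boost(edits: int, term_length: int) -> float:
    # HYPOTHETICAL: fitted to the two boost values above, not a documented formula.
    return (K1 + 1) * (1 - edits / term_length)


exact = score(1.8333334)       # document "Shahid", reported _score 0.57762265
misspelled = score(1.4666666)  # document "Shahib", reported _score 0.46209806

print(f"exact={exact:.6f} misspelled={misspelled:.6f} ratio={misspelled / exact:.4f}")
print(f"boost(1 edit)={reconstructed_boost(1, 6):.6f} boost(2 edits)={reconstructed_boost(2, 6):.6f}")
```

Because idf and tf are identical for both documents in this tiny index, the score ratio equals the boost ratio (0.8). With real data, idf and tf vary per term and per document, so `_score` still cannot be read as a match percentage.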
https://stackoverflow.com/questions/58366303