Question: Fuzzy match percentage

Stack Overflow user

Asked 2019-10-14 01:56:56

1 answer, 597 views, 0 followers, 0 votes

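When I run a fuzzy query like the one below, Elasticsearch still returns only a _score. What I expect is a match percentage based on the fuzzy algorithm. I assumed this would be a simple configuration option, since fuzzy-match results are commonly displayed with a match percentage, but I cannot find anything about it.

How can this be done? Or is this simply not common practice in Elasticsearch? Most user interfaces I have seen show a match percentage for fuzzy matches.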

 "query": { 
        "fuzzy": {
                "name": {
                    "value": "Shahid"
                }
            }
      }

Response:

"hits" : [{
    "_index" : "users",
    "_type" : "user",
    "_id" : "5sadsadsaddas",
    "_score" : 0.11127616,
    "fuzzyMatchPercentage": 100% // I expect something like this here
    "_source" : {
      "name" : "Shahid",
      "email" : "shahid@codeforgeek.com",
      "city" : "mumbai"
    }
  },

elasticsearch

elasticsearch-query

1 Answer

Stack Overflow user

Answered 2019-10-14 04:09:15

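As mentioned in the comments, this is not how the fuzzy query works in Elasticsearch. By default, search results are sorted by score in descending order, where the score expresses how well a document matches the given query. Fuzziness is folded into the calculation of that score: the more exact (less fuzzy) the match, the higher the score. You can verify this by requesting a detailed score explanation (in Elasticsearch v7.x, fuzziness is folded into the calculation of the boost factor). See the following example: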

1. Index two sample documents (one with the correct name, one with a misspelled name)

POST fuzzy/_bulk
{"index":{"_id":1}}
{"name": "Shahid"}
{"index":{"_id":2}}
{"name": "Shahib"}

2. Search for "Shahid" using a fuzzy query

GET fuzzy/_search
{
  "explain": true, 
  "query": {
    "fuzzy": {
      "name": {
        "value": "Shahid"
      }
    }
  }
}

3. Scores and explanation clauses of the two matching documents

For the correctly spelled document ("Shahid"):

    "_explanation" : {
      "value" : 0.57762265,
      "description" : "sum of:",
      "details" : [
        {
          "value" : 0.57762265,
          "description" : "weight(name:shahid in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.57762265,
              "description" : "score(freq=1.0), product of:",
              "details" : [
                {
                  "value" : 1.8333334,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.6931472,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 2,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
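The explanation above reports the score as a product of boost, idf, and tf. As a quick sanity check, the following Python sketch recomputes that product from the values and formulas printed in the explain output. The function name `bm25_score` is only illustrative, not part of any Elasticsearch client.

```python
import math


def bm25_score(boost, n, N, freq, k1, b, dl, avgdl):
    """Recompute score = boost * idf * tf using the formulas from the explain output."""
    # idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    # tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf


# Values copied from the explanation of the "Shahid" document.
shahid_score = bm25_score(
    boost=1.8333334, n=1, N=2, freq=1.0, k1=1.2, b=0.75, dl=1.0, avgdl=1.0
)

# The result should agree with the reported _score up to float rounding.
print(abs(shahid_score - 0.57762265) < 1e-6)
```

Because idf and tf depend only on index statistics and field length, the boost is the only factor left that can carry the fuzziness of the match.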

For the misspelled document ("Shahib"):

    "_explanation" : {
      "value" : 0.46209806,
      "description" : "sum of:",
      "details" : [
        {
          "value" : 0.46209806,
          "description" : "weight(name:shahib in 1) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.46209806,
              "description" : "score(freq=1.0), product of:",
              "details" : [
                {
                  "value" : 1.4666666,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.6931472,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 2,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
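If you want to read the boost programmatically rather than by eye, one option is to walk the `_explanation` tree and pick out the node whose description is "boost". The sketch below assumes the explanation has already been parsed into a Python dict. The helper names `find_boost` and `abbreviated_explanation` are hypothetical, and the trees are shortened versions of the two outputs above. The ratio it prints is only a relative comparison between the two hits, not an official match percentage.

```python
def find_boost(explanation):
    """Return the value of the first node described as 'boost', or None if absent."""
    if explanation.get("description") == "boost":
        return explanation["value"]
    for child in explanation.get("details", []):
        found = find_boost(child)
        if found is not None:
            return found
    return None


def abbreviated_explanation(score, boost):
    """Build a shortened explanation tree with the same nesting as the explain output."""
    return {
        "value": score,
        "description": "sum of:",
        "details": [{
            "value": score,
            "description": "weight(...) [PerFieldSimilarity], result of:",
            "details": [{
                "value": score,
                "description": "score(freq=1.0), product of:",
                "details": [
                    {"value": boost, "description": "boost", "details": []},
                ],
            }],
        }],
    }


exact = find_boost(abbreviated_explanation(0.57762265, 1.8333334))
fuzzy = find_boost(abbreviated_explanation(0.46209806, 1.4666666))

# Relative boost of the misspelled hit compared with the exact hit.
ratio = fuzzy / exact
print(round(ratio, 3))
```

The exact description strings in the explain output can change between Elasticsearch versions, so matching on "boost" should be treated as fragile.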

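4. Conclusion

Unfortunately, the explanation does not break down the boost factor in detail (Elasticsearch issue), but the example shows that it is the only difference between the scores of the two documents:

Shahid: _score: 0.57762265 / boost: 1.8333334
Shahib: _score: 0.46209806 / boost: 1.4666666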
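Since Elasticsearch does not return a percentage, one possible workaround is to compute one on the client from the query string and the returned field value. The sketch below is an assumption about what such a percentage could mean (edit distance relative to the longer string); it is not something Elasticsearch computes or returns. It uses plain Levenshtein distance, whereas the fuzzy query by default also counts a transposition of two adjacent characters as a single edit, so the two can disagree on swapped letters. The names `levenshtein` and `match_percentage` are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def match_percentage(query: str, candidate: str) -> float:
    """Similarity in percent: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(query), len(candidate))
    if longest == 0:
        return 100.0
    distance = levenshtein(query.lower(), candidate.lower())
    return 100.0 * (1 - distance / longest)


for name in ("Shahid", "Shahib"):
    print(name, round(match_percentage("Shahid", name), 1))
```

The comparison is lowercased on the assumption that the field is analyzed with a lowercasing analyzer, as the term `shahid` in the explain output suggests.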

0 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/58366303
