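I have an index containing company names, company abbreviations, and a description of what each company does (the index mapping is shown below). An example of a document in this index is: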
{
"abbreviation": "APPL",
"name": "Apple",
"description": "Computer software and hardware"
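}
Usually, users type the abbreviation when searching for a document. Sometimes they mistype it, and Elasticsearch handles that case well. Most of the time, however, users type the abbreviation exactly, and although the best matches appear at the top of the response, some low-scoring junk (score greater than 0) comes back as well. I tried using min_score in the query, but it is hard to pick a value for this parameter because the scores fluctuate a lot.
Is there a way to exclude documents that do not exactly match the abbreviation field, while keeping fuzzy matching as a fallback in case no exact match is found or the user is searching on other fields (for example name and description)?
Here are a few examples:
Searching for AAPL returns 3 results. Two of them match the query exactly and therefore score quite high; ADP is somewhat similar, but clearly not what the user searched for.
{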
"abbreviation": "APPL",
"name": "Apple, Inc.",
"description": "Computer software and hardware"
},
{
"abbreviation": "APPL",
"name": "Apple, Inc.",
"description": "Computer software and hardware"
},
{
"abbreviation": "ADP",
"name": "Automatic Data Processing, Inc",
"description": "Computer software and hardware"
}
When searching for Apple, the top few entries are again highly relevant, but then some other company names show up.
{
"abbreviation": "APPL",
"name": "Apple, Inc.",
"description": "Computer software and hardware"
},
{
"abbreviation": "APPL",
"name": "Apple, Inc.",
"description": "Computer software and hardware"
},
{
"abbreviation": "CSCO",
"name": "AppDynamics (Cisco subsidiary)",
"description": "Computer software"
}
The index mapping:
{
  "settings": {
    "index": {
      "requests.cache.enable": true
    }
  },
  "mappings": {
    "properties": {
      "abbreviation_and_name": {
        "type": "text",
        "boost": 2
      },
      "abbreviation": { "type": "text", "copy_to": "abbreviation_and_name", "boost": 20 },
      "name": { "type": "text", "copy_to": "abbreviation_and_name" },
      "description": { "type": "text" }
    }
  }
}
Answered 2020-12-18 07:22:49
First, I would ask why the following document should come back at all when searching for AAPL:
{
"abbreviation": "ADP",
"name": "Automatic Data Processing, Inc",
"description": "Computer software and hardware"
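}
Second, I suggest removing the boost settings from the index mapping; boosting at query time is recommended instead.
Overall, though, I believe you may just want an OR query: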
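As a rough illustration of boosting at query time rather than in the mapping, the sketch below builds a multi_match body using the field^boost syntax. The helper name build_boosted_query is hypothetical, and the default boost values simply mirror the 20 and 2 from the question's mapping; they are assumptions for illustration, not tuned recommendations.

```python
def build_boosted_query(text, boosts=None):
    """Build a multi_match body with per-field boosts applied at query time."""
    # Assumed defaults that mirror the boosts in the question's mapping.
    if boosts is None:
        boosts = {"abbreviation": 20, "abbreviation_and_name": 2, "description": 1}
    # "field^N" is the multi_match syntax for a query-time field boost;
    # a boost of 1 is the default, so the suffix is omitted for it.
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    import json

    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(build_boosted_query("AAPL"), indent=2))
```

Keeping the boosts in the request body means they can be changed per query without reindexing.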
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "abbreviation": {
              "query": "AAPL",
              "boost": 2
            }
          }
        },
        {
          "multi_match": {
            "query": "AAPL",
            "fields": ["name", "description"],
            "fuzziness": "AUTO"
          }
        }
      ]
    }
  }
}
This may not produce exactly the results you describe, but I think it should work well for your use case.
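If a single OR query still returns too much low-scoring noise, one alternative (not something the answer above proposes) is to search in two steps: first a non-fuzzy match on abbreviation, and only if that finds nothing, a fuzzy query across all fields. Here is a minimal sketch, assuming a hypothetical run_search callable that takes a request body and returns an Elasticsearch-style response dict, for example a thin wrapper around your client's search call. The function names are made up for illustration.

```python
def exact_abbreviation_query(text):
    """Non-fuzzy match on the abbreviation field only."""
    return {"query": {"match": {"abbreviation": {"query": text}}}}


def fuzzy_fallback_query(text):
    """Fuzzy match across all three fields, used only when step one is empty."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["abbreviation", "name", "description"],
                "fuzziness": "AUTO",
            }
        }
    }


def search_with_fallback(run_search, text):
    """Return exact abbreviation hits if any exist, otherwise fuzzy hits.

    run_search is a hypothetical callable: request body -> response dict
    shaped like {"hits": {"hits": [...]}}.
    """
    hits = run_search(exact_abbreviation_query(text))["hits"]["hits"]
    if hits:
        return hits
    # No exact abbreviation match: fall back to fuzzy matching on all fields.
    return run_search(fuzzy_fallback_query(text))["hits"]["hits"]
```

With this approach, a correctly typed abbreviation returns only the exact matches, while a misspelled abbreviation or a search for a name such as Apple falls through to the fuzzy step. The trade-off is a second round trip whenever the first query is empty. Note that because abbreviation is an analyzed text field, "exact" here means the same token after analysis, not a byte-for-byte comparison.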
https://stackoverflow.com/questions/65323561