
Q: Elasticsearch: find substring matches and exact matches, returning a result only if all query terms are present

Stack Overflow user

Asked 2020-08-03 21:30:17

2 answers | 658 views | 0 followers | 0 votes

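I want to run a query that returns a result if and only if every word in the query appears in the given string, either in full or as a substring. For example:

let text = "garbage can"

So if I query

"garb"

it should return "garbage can".

If I query

"garbage ca"

it should return "garbage can".

But if I query

"garbage b"

it should not return anything.

I have tried substring queries and match, but neither did the job for me.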
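The requirement can be written down as a plain predicate before translating it into Elasticsearch. The sketch below is one possible reading of it (an assumption, not the asker's code): lowercase both strings and check that the query occurs as a contiguous substring of the text. The function name matches is hypothetical.

```python
def matches(text: str, query: str) -> bool:
    """Hypothetical reference predicate: True if the query, ignoring case,
    occurs as a contiguous substring of the text."""
    return query.lower() in text.lower()


if __name__ == "__main__":
    for q in ("garb", "garbage ca", "garbage b"):
        print(f"{q!r} -> {matches('garbage can', q)}")
```

Under this reading, "garb" and "garbage ca" match while "garbage b" does not, which is the behaviour the index and query below need to reproduce.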

elasticsearch

2 Answers

Stack Overflow user

Answered 2020-08-04 04:12:49

I think you want a prefix query. Try the following one:

GET /test_index/_search
{
  "query": {
    "prefix": {
      "my_keyword": {
        "value": "garbage b"
      }
    }
  }
}

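However, prefix queries do not perform well, because at query time Elasticsearch has to find every indexed term that starts with the given prefix.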
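For reference, a prefix query on a keyword field compares against the whole, unanalyzed value, so it behaves roughly like a case-sensitive startswith. The Python model below is a simplification for illustration only (prefix_query and the docs dict are hypothetical), not how Lucene evaluates the query internally.

```python
from typing import Dict, List


def prefix_query(docs: Dict[str, str], prefix: str) -> List[str]:
    """Simplified model of a prefix query on a keyword field.

    Each keyword value is a single unanalyzed term, so a document matches
    when its value starts with the prefix, compared case-sensitively.
    """
    return [doc_id for doc_id, value in docs.items() if value.startswith(prefix)]


if __name__ == "__main__":
    docs = {"1": "garbage can"}
    print(prefix_query(docs, "garbage c"))  # ['1']
    print(prefix_query(docs, "garbage b"))  # []
```

The same model shows the limitation: the prefix must start at the beginning of the stored value, so a fragment such as "bage c" finds nothing.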

Alternatively, you can try the following query with a custom prefix (edge n-gram) analyzer. First, create a new index:

PUT /test_index
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "analysis": {
        "filter": {
          "autocomplete_filter": {
            "type": "edge_ngram",
            "min_gram": "1",
            "max_gram": "20"
          }
        },
        "analyzer": {
          "autocomplete": {
            "filter": [
              "lowercase",
              "autocomplete_filter"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      },
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "analyzer": "autocomplete",
        "type": "text"
      },
      "my_keyword": {
        "type": "keyword"
      }
    }
  }
}
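To see what the autocomplete analyzer stores, here is a rough Python model of the chain defined above: the keyword tokenizer keeps the whole input as one token, the lowercase filter lowercases it, and edge_ngram emits its prefixes of length 1 to 20. The function name is hypothetical; on a real cluster, GET /test_index/_analyze with "analyzer": "autocomplete" shows the actual tokens.

```python
from typing import List


def autocomplete_analyze(text: str, min_gram: int = 1, max_gram: int = 20) -> List[str]:
    """Rough model of: keyword tokenizer -> lowercase -> edge_ngram(min_gram..max_gram)."""
    token = text.lower()  # keyword tokenizer: whole input is one token; then lowercase
    upper = min(max_gram, len(token))
    # edge_ngram: prefixes anchored at the start of the token
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    print(autocomplete_analyze("garbage can"))
```

Because the whole value is a single token, the space is part of the grams, which is why "garbage c" ends up in the index as one term.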

Second, index a document:

PUT /test_index/_doc/1
{
  "my_text": "garbage can",
  "my_keyword": "garbage can"
}

Query for "garbage c":

GET /test_index/_search
{
  "query": {
    "term": {
      "my_text": "garbage c"
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.45802015,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.45802015,
        "_source" : {
          "my_text" : "garbage can",
          "my_keyword" : "garbage can"
        }
      }
    ]
  }
}

Query for "garbage b":

GET /test_index/_search
{
  "query": {
    "term": {
      "my_text": "garbage b"
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
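The two queries above use term rather than match, and that choice matters. A term query looks its value up verbatim, whereas a match query on my_text with no search_analyzer would run the query text through the same autocomplete analyzer; with the default OR operator, "garbage b" would then share the term "g" with the document and match. The sketch below models both behaviours for the single document indexed above (function names are hypothetical, and scoring is ignored).

```python
from typing import List, Set


def autocomplete(text: str, max_gram: int = 20) -> List[str]:
    """Rough model of keyword tokenizer -> lowercase -> edge_ngram(1..max_gram)."""
    token = text.lower()
    return [token[:n] for n in range(1, min(max_gram, len(token)) + 1)]


# Terms stored in my_text for document 1 ("garbage can").
INDEXED_TERMS: Set[str] = set(autocomplete("garbage can"))


def term_query(value: str) -> bool:
    # term query: no analysis, the value must equal an indexed term exactly
    return value in INDEXED_TERMS


def match_query_or(value: str) -> bool:
    # match query, default operator OR, search analyzer = index analyzer:
    # any shared term is enough for a hit
    return any(t in INDEXED_TERMS for t in autocomplete(value))


if __name__ == "__main__":
    for q in ("garbage c", "garbage b"):
        print(q, term_query(q), match_query_or(q))
```

Since term skips analysis, it is also case-sensitive: "Garbage c" finds nothing. If you would rather use match, one option is a search_analyzer made of only the keyword tokenizer and the lowercase filter, so the query stays a single lowercased term.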

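If you don't want to use a prefix query, you can try the following wildcard query instead. Keep in mind that its performance is poor; you could also try optimizing it with an n-gram analyzer.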

GET /test_index/_search
{
  "query": {
    "wildcard": {
      "my_keyword": {
        "value": "*garbage c*"
      }
    }
  }
}
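Conceptually, the wildcard query tests each stored keyword value against a pattern in which * matches any sequence of characters and ? a single character. Python's fnmatch.fnmatchcase is a close approximation (it additionally understands [...] character classes, which the Elasticsearch wildcard query does not); the wrapper name and the third sample document are hypothetical. The Elasticsearch documentation advises against patterns that begin with * or ?, as "*garbage c*" does, because they increase the iterations needed to find matching terms.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def wildcard_query(docs: Dict[str, str], pattern: str) -> List[str]:
    """Approximate a wildcard query on a keyword field (case-sensitive)."""
    return [doc_id for doc_id, value in docs.items() if fnmatchcase(value, pattern)]


if __name__ == "__main__":
    docs = {"1": "garbage can", "2": "test garbage can", "3": "garbage bin"}
    print(wildcard_query(docs, "*garbage c*"))  # ['1', '2']
```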

Edit

I'm not sure this is really what you want this time...

Anyway, try the following mapping and query:

1. Create the index

PUT /test_index
{
  "settings": {
    "index": {
      "max_ngram_diff": 50,
      "number_of_shards": "1",
      "analysis": {
        "filter": {
          "autocomplete_filter": {
            "type": "ngram",
            "min_gram": 1,
            "max_gram": 51
          }
        },
        "analyzer": {
          "autocomplete": {
            "filter": [
              "lowercase",
              "autocomplete_filter"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      },
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "analyzer": "autocomplete",
        "type": "text"
      },
      "my_keyword": {
        "type": "keyword"
      }
    }
  }
}

2. Insert some sample documents

PUT /test_index/_doc/1
{
  "my_text": "test garbage can",
  "my_keyword": "test garbage can"
}

PUT /test_index/_doc/2
{
  "my_text": "garbage",
  "my_keyword": "garbage"
}

3. Query

GET /test_index/_search
{
  "query": {
    "term": {
      "my_text": "bage c"
    }
  }
}
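Steps 1 to 3 can be modelled in a few lines of Python to show why "bage c" finds document 1: with the keyword tokenizer and an ngram filter from 1 to 51, every substring of the lowercased value (spaces included) becomes a term. The ValueError mirrors the index setting check that requires max_gram - min_gram to stay within max_ngram_diff (50 here). This is a simplified sketch with hypothetical names; it does not model token positions or scoring.

```python
from typing import Dict, List, Set


def ngram_analyze(text: str, min_gram: int = 1, max_gram: int = 51,
                  max_ngram_diff: int = 50) -> List[str]:
    """Rough model of keyword tokenizer -> lowercase -> ngram(min_gram..max_gram)."""
    if max_gram - min_gram > max_ngram_diff:
        # Elasticsearch rejects such settings unless index.max_ngram_diff is raised
        raise ValueError("max_gram - min_gram exceeds max_ngram_diff")
    token = text.lower()
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def term_query(docs: Dict[str, str], value: str) -> List[str]:
    # Per-document term sets; a term query is an exact lookup of the unanalyzed value
    doc_terms: Dict[str, Set[str]] = {
        doc_id: set(ngram_analyze(text)) for doc_id, text in docs.items()
    }
    return [doc_id for doc_id, terms in doc_terms.items() if value in terms]


if __name__ == "__main__":
    docs = {"1": "test garbage can", "2": "garbage"}
    print(term_query(docs, "bage c"))  # ['1']
    print(term_query(docs, "bage"))    # ['1', '2']
```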

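Please note:

This setup only matches search strings of up to 51 characters (the max_gram value). For longer strings, increase max_gram and max_ngram_diff accordingly.
Building the inverted index takes a lot of time and disk space, because every substring of up to 51 characters is indexed as a separate term.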
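To put a rough number on that cost: for a single keyword token of length L, an ngram filter with sizes min_gram to max_gram emits L - k + 1 grams of each size k (for k up to L), while edge_ngram emits at most one gram per size. The helper names below are hypothetical; they simply evaluate those counts.

```python
def ngram_count(length: int, min_gram: int = 1, max_gram: int = 51) -> int:
    """Number of grams an ngram filter emits for one token of `length` characters."""
    return sum(length - k + 1 for k in range(min_gram, min(max_gram, length) + 1))


def edge_ngram_count(length: int, min_gram: int = 1, max_gram: int = 20) -> int:
    """Number of grams an edge_ngram filter emits: one prefix per allowed size."""
    return max(0, min(max_gram, length) - min_gram + 1)


if __name__ == "__main__":
    print(ngram_count(len("test garbage can")))  # 136
    print(edge_ngram_count(len("garbage can")))  # 11
    print(ngram_count(100))                      # 3825
```

A 100-character value therefore produces 3,825 terms, which is where the extra indexing time and index size come from.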

Votes: 0

Stack Overflow user

Answered 2020-08-04 05:40:00

You can index your data with the edge n-gram tokenizer. In the latest release, 7.8, you can also use custom token_chars!

See the documentation for more details: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
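One way to apply this answer to the question is sketched below, under assumptions that are not part of the answer: an index-time analyzer built from the edge_ngram tokenizer (min_gram 1, max_gram 20, token_chars letter and digit) plus a lowercase filter, the standard analyzer as search_analyzer, and a match query with "operator": "and". The Python model uses hypothetical helper names and regex approximations of both tokenizers. It shows that every query word must then be a whole word or a word prefix in the document; word order is not enforced.

```python
import re
from typing import List, Set

WORD = re.compile(r"[^\W_]+")  # approximation of token_chars = [letter, digit]


def edge_ngram_tokenize(text: str, min_gram: int = 1, max_gram: int = 20) -> List[str]:
    # Model of edge_ngram tokenizer + lowercase filter: split into words, emit prefixes
    grams: List[str] = []
    for word in WORD.findall(text.lower()):
        grams.extend(word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1))
    return grams


def standard_terms(text: str) -> List[str]:
    # Rough stand-in for the standard analyzer used at search time
    return WORD.findall(text.lower())


def match_and(doc_text: str, query: str) -> bool:
    # match query with operator "and": every query term must be an indexed term
    indexed: Set[str] = set(edge_ngram_tokenize(doc_text))
    terms = standard_terms(query)
    return bool(terms) and all(t in indexed for t in terms)


if __name__ == "__main__":
    for q in ("garb", "garbage ca", "garbage b"):
        print(q, match_and("garbage can", q))
```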

Votes: 0

Page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/63237202
