Elasticsearch - River and nGrams

Stack Overflow user
Asked 2012-10-27 19:49:43
Answers: 1 | Views: 700 | Followers: 0 | Votes: 1

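I'm using ES with the river plugin because my data lives in CouchDB, and I'm trying to use nGrams at query time. I've basically got everything I need working, except that the query breaks as soon as someone types a space. This is because ES tokenizes each element of the query, splitting it on whitespace.

What I need to do is:

  • Query for a part of the text in a string: Query: "Hello" Response: "Hello, Hello" / Exclude: "Hello, World, Word"
  • Sort the results by the criteria I specify;
  • Be case insensitive.

Here is what I did, following this question: How to search for a part of a word with ElasticSearch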

curl -X PUT  'localhost:9200/_river/myDB/_meta' -d '
{
"type" : "couchdb",
"couchdb" : {
    "host" : "localhost",
    "port" : 5984,
    "db" : "myDB",
    "filter" : null
},
   "index" : {
    "index" : "myDB",
    "type" : "myDB",
    "bulk_size" : "100",
    "bulk_timeout" : "10ms",
    "analysis" : {
               "index_analyzer" : {
                          "my_index_analyzer" : {
                                        "type" : "custom",
                                        "tokenizer" : "standard",
                                        "filter" : ["lowercase", "mynGram"]
                          }
               },
               "search_analyzer" : {
                          "my_search_analyzer" : {
                                        "type" : "custom",
                                        "tokenizer" : "standard",
                                        "filter" : ["standard", "lowercase", "mynGram"]
                          }
               },
               "filter" : {
                        "mynGram" : {
                                   "type" : "nGram",
                                   "min_gram" : 2,
                                   "max_gram" : 50
                        }
               }
    }
}
}
'

Then I add a mapping for sorting:

curl -s -XGET 'localhost:9200/myDB/myDB/_mapping' 
{
"sorting": {
       "Title": {
            "fields": {
                "Title": {
                     "type": "string"
                  }, 
                "untouched": {
                    "include_in_all": false, 
                    "index": "not_analyzed", 
                    "type": "string"
                    }
               }, 
              "type": "multi_field"
         },
        "Year": {
              "fields": {
                   "Year": {
                       "type": "string"
                       }, 
                       "untouched": {
                           "include_in_all": false, 
                           "index": "not_analyzed", 
                           "type": "string"
                         }
                     }, 
                    "type": "multi_field"
        }
     }
    }
   }'
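
For the `untouched` sub-fields to have any effect, the search request has to sort on them explicitly. A minimal sketch of such a request body, assuming the multi_field sub-field can be addressed as `Title.untouched` and that a `query_string` query is used; the helper name `sorted_search_body` is made up for illustration:

```python
import json


def sorted_search_body(query_text, sort_field="Title.untouched", order="asc"):
    """Build a search body that sorts on a not_analyzed sub-field.

    Assumption: the sub-field is reachable under the dotted path
    "<field>.untouched", as declared in the mapping above.
    """
    return {
        "query": {"query_string": {"query": query_text}},
        "sort": [{sort_field: {"order": order}}],
    }


if __name__ == "__main__":
    # The printed JSON would be sent as the body of a _search request.
    print(json.dumps(sorted_search_body("Title:Hello"), indent=2))
```

Sorting on the analyzed `Title` field would order by individual tokens, which is presumably why the mapping keeps a not_analyzed copy.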

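I've included all of the configuration I'm using just for completeness. Anyway, with this setup, which I think should work, the space is still used to split my query whenever I try to get results, for example: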

  http://localhost:9200/myDB/myDB/_search?q=Title:(Hello%20Wor)&pretty=true

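This returns anything that contains "Hello" and "Wor" (I don't normally use the parentheses, but I saw them in an example, and the results seem very similar either way).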
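
To see why the space matters, it can help to imitate what the search analyzer does to the query text. The following is a rough pure-Python imitation, not Elasticsearch's actual analyzer: it assumes the standard tokenizer behaves like a whitespace split for simple words, then lowercases and applies the `min_gram`/`max_gram` values from the settings above. The function names are hypothetical.

```python
def ngrams(token, min_gram=2, max_gram=50):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


def analyze(text):
    """Simplified stand-in for: standard tokenizer -> lowercase -> nGram filter."""
    tokens = text.lower().split()  # assumption: whitespace split approximates the tokenizer
    return [gram for token in tokens for gram in ngrams(token)]


if __name__ == "__main__":
    # No gram ever spans the space, so "Hello Wor" is never matched as one unit.
    print(analyze("Hello Wor"))
```

Because the text is split into tokens before the nGram filter runs, each word is expanded separately, so a document only has to share a gram with one of the words to be a candidate match.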

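Any help is really appreciated, because this has been causing me a lot of trouble.

Update: In the end, I realized I don't need nGrams. A normal index is fine; simply replacing the spaces in the query with 'AND' does the job.

Example: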

 Query: "Hello World"  --->  Replaced as "(*Hello And World*)"
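
The replacement described in the update could be done on the client before the request is sent. A minimal sketch, assuming the query goes to the `q=` parameter as in the URL above; the helper names are hypothetical, wildcards are left out, and special characters in the user input are not escaped. Note that the query string syntax expects the operator in uppercase (`AND`).

```python
from urllib.parse import urlencode


def to_and_query(user_input):
    """Join whitespace-separated terms with the uppercase AND operator."""
    terms = user_input.split()
    if not terms:
        return ""
    return "(" + " AND ".join(terms) + ")"


def search_url(user_input, field="Title",
               base="http://localhost:9200/myDB/myDB/_search"):
    """Build a URI search request that requires every term to match."""
    params = {"q": field + ":" + to_and_query(user_input), "pretty": "true"}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(to_and_query("Hello World"))  # prints (Hello AND World)
    print(search_url("Hello World"))
```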

1 Answer

Stack Overflow user

Answered 2012-10-27 20:32:19

I don't have an Elasticsearch installation at hand right now, but maybe this from the docs helps?

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

Types of Match Queries

boolean

The default match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text. The operator flag can be set to or or and to control the boolean clauses (defaults to or).

The analyzer can be set to control which analyzer will perform the analysis process on the text. It default to the field explicit mapping definition, or the default search analyzer.

fuzziness can be set to a value (depending on the relevant type, for string types it should be a value between 0.0 and 1.0) to constructs fuzzy queries for each term analyzed. The prefix_length and max_expansions can be set in this case to control the fuzzy process. If the fuzzy option is set the query will use constant_score_rewrite as its rewrite method the rewrite parameter allows to control how the query will get rewritten.

Here is an example when providing additional parameters (note the slight change in structure, message is the field name):

{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
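
Applied to the question, the same structure might look like the sketch below. It assumes the field is `Title` (taken from the question's mapping) and wraps the match clause in a top-level `query` key for a `_search` request; the helper name is hypothetical, and whether this gives the desired partial-word matching still depends on the analyzers in use.

```python
import json


def match_all_terms(field, text):
    """Build a match query in which every analyzed term must be present."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and",  # default is "or" according to the quoted docs
                }
            }
        }
    }


if __name__ == "__main__":
    # The printed JSON would be sent as the body of a _search request.
    print(json.dumps(match_all_terms("Title", "Hello Wor"), indent=2))
```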
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/13103605
