Q: Fetching nested documents with a filter on Elasticsearch 5

Stack Overflow user

Asked 2017-02-15 17:14:06

2 answers · 882 views · 0 followers · 2 votes

I have the following document mapped in ES 5:

{
   "appName" : {
      "mappings" : {
         "market_audit" : {
            "properties" : {
               "generation_date": {
                  "type": "date"
               },
               "customers" : {
                  "type" : "nested",
                  "properties" : {
                     "customer_id" : {
                        "type" : "integer"
                     },
   [... other properties ...]
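}

Several entries in the "customers" node may share the same customer_id, and I am trying to retrieve only the entries with a specific customer_id (for example 1, as in the query below), together with the "generation_date" of the top-level document (only the latest document needs to be processed).

I came up with the following query: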

{
  "query": {},
  "sort": [
    { "generation_date": "desc" }
  ],
  "size": 1,
  "aggregations": {
    "nested": {
      "nested": {
        "path": "customers"
      },
      "aggregations": {
        "filter": {
          "filter": {
            "match": {
              "customers.customer_id": {
                "query": "1"
              }
            }
          },
          "aggregations": {
            "tophits_agg": {
              "top_hits": {}
            }
          }
        }
      }
    }
  }
}

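This query gets me the data I am interested in, located in the "aggregations" array (alongside the "hits" array, which contains the whole document). The problem is that the framework I am using (ONGR's ElasticSearch bundle together with its DSL bundle, on Symfony3) complains that no bucket is available every time I try to access the actual data.

I have read the ES documentation but could not come up with a working query that adds buckets. I am sure I am missing something, and a little help would be more than welcome. If you know how to modify the query appropriately, I think I can work out the PHP code to generate it.

Edit: since this question got some views but no answers (and I am still stuck), I would settle for any query that lets me retrieve information about a specific "customer" (using the "customer_id" field) from the latest generated document (according to generation_date). The query I gave is just what I was able to come up with, and I am sure there is a better way to do this. Any suggestions, maybe?

Edit 2: here is the data sent to ES: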

{
    "index": {
    "_type": "market_data_audit_document"
    }
}
{
    "customers": [
    {
        "customer_id": 1,
        "colocation_name": "colo1",
        "colocation_id": 26,
        "device_name": "device 1",
        "channels": [
        {
            "name": "channel1-5",
            "multicast":"1.2.1.5",
            "sugar_state":4,
            "network_state":1
        }
        ]
    },
    {
        "customer_id":2,
        "colocation_name":"colo2",
        "colocation_id":27,
        "device_name":"device 2",
        "channels": [
        {
            "name":"channel2-5",
            "multicast":"1.2.2.5",
            "sugar_state":4,
            "network_state":1
        }
        ]
    },
    {
        "customer_id":3,
        "colocation_name":"colo3",
        "colocation_id":28,
        "device_name":"device 3",
        "channels": [
        {
            "name":"channel3-5",
            "multicast":"1.2.3.5",
            "sugar_state":4,
            "network_state":1
        }
        ]
    },
    {
        "customer_id":4,
        "colocation_name":"colo4",
        "colocation_id":29,
        "device_name":"device 4",
        "channels": [
        {
            "name":"channel4-5",
            "multicast":"1.2.4.5",
            "sugar_state":4,
            "network_state":1
        }
        ]
    },
    {
        "customer_id":5,
        "colocation_name":"colo5",
        "colocation_id":30,
        "device_name":"device 5",
        "channels": [
        {
            "name":"channel5-5",
            "multicast":"1.2.5.5",
            "sugar_state":4,
            "network_state":1
        }
        ]
    }
    ],
    "generation_date":"2017-02-27T10:55:45+0100"
}

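Unfortunately, when I sent the query listed in this post, I found that the aggregation does not behave as I expected: it returns the "good" data, but from all the stored documents! Here is an output example: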

{
   "timed_out" : false,
   "took" : 60,
   "hits" : {
      "total" : 2,
      "hits" : [
         {
            "_source" : {
               "customers" : [
                  {
                     "colocation_id" : 26,
                     "channels" : [
                        {
                           "name" : "channel1-5",
                           "sugar_state" : 4,
                           "network_state" : 1,
                           "multicast" : "1.2.1.5"
                        }
                     ],
                     "customer_id" : 1,
                     "colocation_name" : "colo1",
                     "device_name" : "device 1"
                  },
                  {
                     "colocation_id" : 27,
                     "channels" : [
                        {
                           "multicast" : "1.2.2.5",
                           "network_state" : 1,
                           "name" : "channel2-5",
                           "sugar_state" : 4
                        }
                     ],
                     "customer_id" : 2,
                     "device_name" : "device 2",
                     "colocation_name" : "colo2"
                  },
                  {
                     "device_name" : "device 3",
                     "colocation_name" : "colo3",
                     "customer_id" : 3,
                     "channels" : [
                        {
                           "multicast" : "1.2.3.5",
                           "network_state" : 1,
                           "sugar_state" : 4,
                           "name" : "channel3-5"
                        }
                     ],
                     "colocation_id" : 28
                  },
                  {
                     "channels" : [
                        {
                           "sugar_state" : 4,
                           "name" : "channel4-5",
                           "multicast" : "1.2.4.5",
                           "network_state" : 1
                        }
                     ],
                     "customer_id" : 4,
                     "colocation_id" : 29,
                     "colocation_name" : "colo4",
                     "device_name" : "device 4"
                  },
                  {
                     "device_name" : "device 5",
                     "colocation_name" : "colo5",
                     "colocation_id" : 30,
                     "channels" : [
                        {
                           "sugar_state" : 4,
                           "name" : "channel5-5",
                           "multicast" : "1.2.5.5",
                           "network_state" : 1
                        }
                     ],
                     "customer_id" : 5
                  }
               ],
               "generation_date" : "2017-02-27T11:45:37+0100"
            },
            "_type" : "market_data_audit_document",
            "sort" : [
               1488192337000
            ],
            "_index" : "mars",
            "_score" : null,
            "_id" : "AVp_LPeJdrvi0cWb8CrL"
         }
      ],
      "max_score" : null
   },
   "aggregations" : {
      "nested" : {
         "doc_count" : 10,
         "filter" : {
            "doc_count" : 2,
            "tophits_agg" : {
               "hits" : {
                  "max_score" : 1,
                  "total" : 2,
                  "hits" : [
                     {
                        "_nested" : {
                           "offset" : 0,
                           "field" : "customers"
                        },
                        "_score" : 1,
                        "_source" : {
                           "channels" : [
                              {
                                 "name" : "channel1-5",
                                 "sugar_state" : 4,
                                 "multicast" : "1.2.1.5",
                                 "network_state" : 1
                              }
                           ],
                           "customer_id" : 1,
                           "colocation_id" : 26,
                           "colocation_name" : "colo1",
                           "device_name" : "device 1"
                        }
                     },
                     {
                        "_source" : {
                           "colocation_id" : 26,
                           "customer_id" : 1,
                           "channels" : [
                              {
                                 "multicast" : "1.2.1.5",
                                 "network_state" : 1,
                                 "name" : "channel1-5",
                                 "sugar_state" : 4
                              }
                           ],
                           "device_name" : "device 1",
                           "colocation_name" : "colo1"
                        },
                        "_nested" : {
                           "offset" : 0,
                           "field" : "customers"
                        },
                        "_score" : 1
                     }
                  ]
               }
            }
         }
      }
   },
   "_shards" : {
      "total" : 13,
      "successful" : 1,
      "failures" : [
         {
            "reason" : {
               "index" : ".kibana",
               "index_uuid" : "bTkwoysSQ0y8Tt9yYFRStg",
               "type" : "query_shard_exception",
               "reason" : "No mapping found for [generation_date] in order to sort on"
            },
            "shard" : 0,
            "node" : "4ZUgOm4VRry6EtUK15UH3Q",
            "index" : ".kibana"
         },
         {
            "reason" : {
               "index_uuid" : "lN2mVF9bRjuDtiBF2qACfA",
               "index" : "archiv1_log",
               "type" : "query_shard_exception",
               "reason" : "No mapping found for [generation_date] in order to sort on"
            },
            "shard" : 0,
            "node" : "4ZUgOm4VRry6EtUK15UH3Q",
            "index" : "archiv1_log"
         },
         {
            "index" : "archiv1_session",
            "shard" : 0,
            "node" : "4ZUgOm4VRry6EtUK15UH3Q",
            "reason" : {
               "type" : "query_shard_exception",
               "index" : "archiv1_session",
               "index_uuid" : "cmMAW04YTtCb0khEqHpNyA",
               "reason" : "No mapping found for [generation_date] in order to sort on"
            }
         },
         {
            "shard" : 0,
            "node" : "4ZUgOm4VRry6EtUK15UH3Q",
            "reason" : {
               "reason" : "No mapping found for [generation_date] in order to sort on",
               "index" : "archiv1_users_dev",
               "index_uuid" : "AH48gIf5T0CXSQaE7uvVRg",
               "type" : "query_shard_exception"
            },
            "index" : "archiv1_users_dev"
         }
      ],
      "failed" : 12
   }
}

Tags: php, elasticsearch, elasticsearch-5

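2 answers

Stack Overflow user

Accepted answer

Posted 2017-02-27 11:39:11

According to your description:

- You store documents with a set of properties in Elasticsearch.
- Each document contains a list of customers in an array (nested documents).
- You want to extract only the nested documents related to a given customer id.
- Your library does not handle Elasticsearch responses that have no buckets.
- You expect Elasticsearch to return the nested documents.

Problem

There are two kinds of aggregations:

- bucket
- metric

In your example, there are two aggregations under the nested aggregation, a filter and a metric:

- The filter aggregation defines a single bucket containing all matching documents, but it does not expose a "buckets" key in the result.
- Top hits is a metric aggregation and provides no buckets.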
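
To make the difference in response shape concrete, here is a minimal Python sketch. It is not part of the ONGR bundle: `as_buckets` is a hypothetical helper, and the two sample dicts are trimmed, hand-written stand-ins for real responses. It only illustrates that a `filter` aggregation returns its single bucket inline, while multi-bucket aggregations expose a `buckets` key.

```python
def as_buckets(agg_result):
    """Return the buckets of an aggregation result as a list.

    Hypothetical helper: multi-bucket aggregations carry a "buckets" key,
    while a single-bucket aggregation such as "filter" is itself the bucket.
    """
    buckets = agg_result.get("buckets")
    if buckets is None:
        return [agg_result]            # single bucket, e.g. "filter"
    if isinstance(buckets, dict):      # keyed buckets (named filters)
        return list(buckets.values())
    return list(buckets)


# Shape of the "filter" aggregation result from the question (trimmed).
single = {"doc_count": 2, "tophits_agg": {"hits": {"total": 2, "hits": []}}}
# Shape of a multi-bucket aggregation result (trimmed).
multi = {"buckets": [{"doc_count": 3}]}

print(len(as_buckets(single)), len(as_buckets(multi)))
```

A client that insists on buckets could wrap results this way, but switching the aggregation type on the Elasticsearch side avoids the workaround altogether.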

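Solution:

I am not sure whether the PHP library handles nested aggregation results correctly, but you can use the filters aggregation instead of filter to get a list of buckets: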

{
  "aggregations": {
    "nested": {
      "nested": {
        "path": "customers"
      },
      "aggregations": {
        "filters_customer": {
          "filters": {
            "filters": [
              {
                "match": {
                  "customers.customer_id": "1"
                }
              }
            ]
          },
          "aggregations": {
            "top_hits_customer": {
              "top_hits": {}
            }
          }
        }
      }
    }
  }
}

This will return something like:

{
  "aggregations": {
    "nested": {
      "doc_count": 15,
      "filters_customer": {
        "buckets": [
          {
            "doc_count": 3,
            "top_hits_customer": {
              "hits": {
                "total": 3,
                "max_score": 1,
                "hits": [
                  {
                    "_nested": {
                      "field": "customers",
                      "offset": 0
                    },
                    "_score": 1,
                    "_source": {
                      "customer_id": 1,
                      "foo": "bar"
                    }
                  },
                  {
                    "_nested": {
                      "field": "customers",
                      "offset": 0
                    },
                    "_score": 1,
                    "_source": {
                      "customer_id": 1,
                      "foo": "bar"
                    }
                  },
                  {
                    "_nested": {
                      "field": "customers",
                      "offset": 0
                    },
                    "_score": 1,
                    "_source": {
                      "customer_id": 1,
                      "foo": "bar"
                    }
                  }
                ]
              }
            }
          }
        ]
      }
    }
  }
}
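
Reading the nested customer entries out of such a response could look like the following Python sketch. It assumes the aggregation names used in this answer ("nested", "filters_customer", "top_hits_customer"); the function name `customer_sources` is made up for illustration, and the sample is a trimmed copy of the response above.

```python
def customer_sources(response):
    """Collect the _source of every nested customer hit in the response.

    Assumes the aggregation names "nested", "filters_customer" and
    "top_hits_customer" from the query in this answer.
    """
    buckets = response["aggregations"]["nested"]["filters_customer"]["buckets"]
    return [
        hit["_source"]
        for bucket in buckets
        for hit in bucket["top_hits_customer"]["hits"]["hits"]
    ]


# Trimmed copy of the example response.
sample_response = {
    "aggregations": {
        "nested": {
            "doc_count": 15,
            "filters_customer": {
                "buckets": [
                    {
                        "doc_count": 3,
                        "top_hits_customer": {
                            "hits": {
                                "total": 3,
                                "hits": [
                                    {"_source": {"customer_id": 1, "foo": "bar"}},
                                    {"_source": {"customer_id": 1, "foo": "bar"}},
                                    {"_source": {"customer_id": 1, "foo": "bar"}},
                                ],
                            }
                        },
                    }
                ]
            },
        }
    }
}

for source in customer_sources(sample_response):
    print(source["customer_id"], source["foo"])
```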

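Note on Edit 2

Elasticsearch searches across all documents, not only the "top 1" document by report date. One way to split the results per report is to use a terms bucket on the report date: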

{
  "query": {},
  "size": 0,
  "aggregations": {
    "grp_report": {
      "terms": {
        "field": "generation_date"
      },
      "aggregations": {
        "nested_customers": {
          "nested": {
            "path": "customers"
          },
          "aggregations": {
            "filters_customer": {
              "filters": {
                "filters": [
                  {
                    "match": {
                      "customers.customer_id": "1"
                    }
                  }
                ]
              },
              "aggregations": {
                "top_hits_customer": {
                  "top_hits": {}
                }
              }
            }
          }
        }
      }
    }
  }
}
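
With one bucket per generation_date, the latest report still has to be picked on the client side. The Python sketch below shows one way to do that, under a few assumptions: the aggregation names match the query above, each terms bucket on a date field carries a numeric "key" (epoch milliseconds) and usually a "key_as_string", and the latest report is among the buckets that were returned (a terms aggregation only returns a limited number of buckets by default). `latest_report_customers` is a hypothetical name and the sample response is hand-written.

```python
def latest_report_customers(response):
    """Return (report date, customer sources) for the most recent bucket."""
    buckets = response["aggregations"]["grp_report"]["buckets"]
    if not buckets:
        return None, []
    # Date terms buckets use epoch milliseconds as their key.
    latest = max(buckets, key=lambda bucket: bucket["key"])
    inner = latest["nested_customers"]["filters_customer"]["buckets"]
    sources = [
        hit["_source"]
        for bucket in inner
        for hit in bucket["top_hits_customer"]["hits"]["hits"]
    ]
    return latest.get("key_as_string", latest["key"]), sources


def _bucket(key, key_as_string, device_name):
    # Builds a trimmed, hand-written bucket for the example below.
    hit = {"_source": {"customer_id": 1, "device_name": device_name}}
    return {
        "key": key,
        "key_as_string": key_as_string,
        "nested_customers": {
            "filters_customer": {
                "buckets": [{"top_hits_customer": {"hits": {"hits": [hit]}}}]
            }
        },
    }


sample_response = {
    "aggregations": {
        "grp_report": {
            "buckets": [
                _bucket(1488189345000, "2017-02-27T09:55:45.000Z", "older"),
                _bucket(1488192337000, "2017-02-27T10:45:37.000Z", "newer"),
            ]
        }
    }
}

date, customers = latest_report_customers(sample_response)
print(date, customers)
```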

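Recommendation:

Avoid complex documents; prefer splitting each report into small documents linked by a correlation key (e.g. reportId). You will then be able to filter and aggregate easily without any nested documents. Add to the customer documents the information you will filter on across all types (redundancy is not a problem in this case).

Example use cases:

- List the reports
- Display customer information per report
- Display a customer's history across multiple reports

Current document example: /indexName/market_audit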

{
  "generation_date": "...",
  "customers": [
    {
      "id": 1,
      "foo": "bar 1"
    },
    {
      "id": 2,
      "foo": "bar 2"
    },
    {
      "id": 3,
      "foo": "bar 3"
    }
  ]
}

Revised documents:

/indexName/market_audit_report

{
  "report_id" : "123456",
  "generation_date": "...",
  "foo":"bar"
}

/indexName/market_audit_customer documents

{
  "report_id" : "123456",
  "customer_id": 1,
  "foo": "bar 1"
}


{
  "report_id" : "123456",
  "customer_id": 2,
  "foo": "bar 2"
}


{
  "report_id" : "123456",
  "customer_id": 3,
  "foo": "bar 3"
}
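
The split itself is mechanical. Here is a minimal Python sketch of it, assuming the simplified document shape used in this answer (a "customers" array whose entries carry an "id"); `split_report` is a hypothetical helper and the report id is supplied by the caller.

```python
def split_report(report, report_id):
    """Split one nested report into a report doc and flat customer docs.

    Every resulting document carries the same report_id so that the
    pieces can be correlated again at query time.
    """
    report_doc = {k: v for k, v in report.items() if k != "customers"}
    report_doc["report_id"] = report_id

    customer_docs = []
    for customer in report.get("customers", []):
        doc = dict(customer)                      # do not mutate the input
        if "id" in doc:
            doc["customer_id"] = doc.pop("id")    # rename as in the example
        doc["report_id"] = report_id
        customer_docs.append(doc)
    return report_doc, customer_docs


nested_report = {
    "generation_date": "2017-02-27T10:55:45+0100",
    "customers": [
        {"id": 1, "foo": "bar 1"},
        {"id": 2, "foo": "bar 2"},
    ],
}

report_doc, customer_docs = split_report(nested_report, "123456")
print(report_doc)
print(customer_docs)
```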

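If you know your report id, you can fetch all the data in a single request:

- a filter on the report id
- a terms aggregation on the type
  - a filter aggregation for type report
    - a top_hits aggregation to get the report
  - a filter aggregation to get only type customer and customer id 1
    - a top_hits aggregation to get customer 1 info

or

- a filter on the report id
- a terms aggregation on the type
  - a filter aggregation for type report
    - a top_hits aggregation to get the report
  - a terms aggregation on customer id
    - a top_hits aggregation to retrieve information per customer

Top hits aggregation size

Don't forget to provide a size in your top_hits aggregation, otherwise you will only get the top 3 hits.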
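
As a rough illustration, the first outline could be assembled like this in Python, with an explicit top_hits size. This is one possible reading of the outline, not a tested query: it assumes the two types are named market_audit_report and market_audit_customer, that the type is reachable through the `_type` field, and that report_id and customer_id are indexed as exact-match fields. The aggregation names and `build_report_query` are invented for the example.

```python
import json


def build_report_query(report_id, customer_id, size=10):
    """Build a single request body for one report and one customer."""

    def top_hits(n):
        # Explicit size: without it, top_hits returns only 3 hits.
        return {"top_hits": {"size": n}}

    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"report_id": report_id}}]}},
        "aggregations": {
            "by_type": {
                "terms": {"field": "_type"},
                "aggregations": {
                    "report": {
                        "filter": {"term": {"_type": "market_audit_report"}},
                        "aggregations": {"report_hit": top_hits(1)},
                    },
                    "customer": {
                        "filter": {
                            "bool": {
                                "filter": [
                                    {"term": {"_type": "market_audit_customer"}},
                                    {"term": {"customer_id": customer_id}},
                                ]
                            }
                        },
                        "aggregations": {"customer_hits": top_hits(size)},
                    },
                },
            }
        },
    }


print(json.dumps(build_report_query("123456", 1, size=5), indent=2))
```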

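Votes: 1

Stack Overflow user

Posted 2017-02-27 12:57:23

Reading the first line of the Elasticsearch definition of aggregations, I think you have not quite understood how they work:

The aggregations framework helps provide aggregated data based on a search query.

Since your query has no filter at all, getting all the stored documents back in the hits.hits object is the expected result. You then use a filter aggregation to get the documents you want, but those are located in the aggregations property of the returned dict.

If I am right, I would suggest keeping things as simple as possible. Here is my guess at the query: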

{
  "query": {
    "bool": {
      "filter": {
        "nested": {
          "path": "customers",
          "query": {
            "bool": {
              "must": [
                { "term": { "customers.customer_id": 1 } }
              ]
            }
          }
        }
      }
    }
  },
  "aggregations": {
    "tophits_agg": {
      "top_hits": {}
    }
  }
}
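
A query on a nested field still returns whole top-level documents in hits.hits, so the matching customer entries have to be picked out of _source afterwards. The Python sketch below shows that idea for the "latest document only" case from the question. It assumes the field names from the question's mapping and a response shaped like the question's output; both function names are hypothetical, and the sample response is hand-written.

```python
def build_latest_customer_query(customer_id):
    """Request body: newest document containing the given customer."""
    return {
        "size": 1,
        "sort": [{"generation_date": "desc"}],
        "query": {
            "bool": {
                "filter": {
                    "nested": {
                        "path": "customers",
                        "query": {
                            "term": {"customers.customer_id": customer_id}
                        },
                    }
                }
            }
        },
    }


def matching_customers(response, customer_id):
    """Return (generation_date, matching customer entries) of the top hit."""
    hits = response["hits"]["hits"]
    if not hits:
        return None, []
    source = hits[0]["_source"]
    entries = [
        customer
        for customer in source.get("customers", [])
        if customer.get("customer_id") == customer_id
    ]
    return source.get("generation_date"), entries


sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {
                "_source": {
                    "generation_date": "2017-02-27T11:45:37+0100",
                    "customers": [
                        {"customer_id": 1, "device_name": "device 1"},
                        {"customer_id": 2, "device_name": "device 2"},
                    ],
                }
            }
        ],
    }
}

print(matching_customers(sample_response, 1))
```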

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/42255998
