Question: Sum(distinct metric) in Apache Druid

Stack Overflow user

Asked 2020-06-30 07:12:03

1 answer, 394 views, 0 followers, 2 votes

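How can we write sum(distinct col) in Druid? When I try to write it in Druid, it says the plan cannot be built, although this should be possible in Druid. I tried converting it to a subquery approach, but my inner query returns a large amount of item-level data, so it times out.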

druid

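1 Answer

Stack Overflow user

Posted 2020-08-14 22:25:41

Distinct count and distinct sum are not supported by Druid by default.

However, there are several ways to get similar results.

Option 1. Theta Sketch extension (recommended)

If you enable the Theta Sketch extension (see https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html), you can use it to get the same result.

Example: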

{
    "queryType": "groupBy",
    "dataSource": "hits",
    "intervals": [
        "2020-08-14T11:00:00.000Z/2020-08-14T12:00:00.000Z"
    ],
    "dimensions": [],
    "granularity": "all",
    "aggregations": [
        {
            "type": "thetaSketch",
            "name": "domain",
            "fieldName": "domain",
            "isInputThetaSketch": false,
            "size": 16384
        }
    ]
}

Result:

+--------+
| domain | 
+--------+
| 22     | 
+--------+
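The planner error in the question suggests the query was written in Druid SQL, so it may help to see how the same estimate could be requested there. The sketch below is a minimal example under stated assumptions: the DataSketches extension is loaded, the SQL function `APPROX_COUNT_DISTINCT_DS_THETA` is available, and the broker listens on `http://localhost:8082`. The helper names and the broker URL are hypothetical; the table, column and interval are taken from the example above.

```python
import json
import urllib.request


def build_theta_sql_payload(table, column, start, end):
    """Build a Druid SQL request body that estimates the distinct count of `column`."""
    # Identifiers are interpolated directly, so only pass trusted names here.
    query = (
        f'SELECT APPROX_COUNT_DISTINCT_DS_THETA("{column}") AS "{column}" '
        f'FROM "{table}" '
        f"WHERE __time >= TIMESTAMP '{start}' AND __time < TIMESTAMP '{end}'"
    )
    return {"query": query}


def run_sql(payload, broker_url="http://localhost:8082"):
    """POST the payload to the SQL endpoint (hypothetical local broker URL)."""
    request = urllib.request.Request(
        broker_url + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_theta_sql_payload(
    "hits", "domain", "2020-08-14 11:00:00", "2020-08-14 12:00:00"
)
print(payload["query"])
```

Calling `run_sql(payload)` against a running cluster should return one row with the estimated number of distinct domains. Note that this is still a distinct count, not a sum over distinct values.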

Option 2. Cardinality

The cardinality aggregator computes the cardinality of a set of Apache Druid dimensions, using HyperLogLog to estimate it.

Example:

{
    "queryType": "groupBy",
    "dataSource": "hits",
    "intervals": [
        "2020-08-14T11:00:00.000Z/2020-08-14T12:00:00.000Z"
    ],
    "dimensions": [],
    "granularity": "all",
    "aggregations": [
        {
            "type": "cardinality",
            "name": "domain",
            "fields": [
                {
                    "type": "default",
                    "dimension": "domain",
                    "outputType": "string",
                    "outputName": "domain"
                }
            ],
            "byRow": false,
            "round": false
        }
    ]
}

Response:

+-----------------+
| domain          | 
+-----------------+
| 22.119017166376 | 
+-----------------+
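The fractional value in this response shows that the aggregator returns an estimate rather than an exact count. As a rough illustration, the sketch below applies the small-range (linear counting) correction commonly used with HyperLogLog. It assumes 2048 registers, of which 22 were touched by the 22 distinct domains; that register count is an assumption about the implementation, not something stated in the answer.

```python
import math


def linear_counting(num_registers, empty_registers):
    """Small-range HyperLogLog estimate: m * ln(m / V), with V empty registers."""
    if empty_registers <= 0:
        raise ValueError("linear counting needs at least one empty register")
    return num_registers * math.log(num_registers / empty_registers)


# Assumption: 2048 registers, 22 of them touched by the 22 distinct domains.
estimate = linear_counting(2048, 2048 - 22)
print(estimate)         # close to 22.119017, the value in the response above
print(round(estimate))  # 22, what "round": true would be expected to report
```

Under these assumptions the formula lands on the same value as the response, which is why setting "round" to true in the aggregator is a simple way to get a whole number back.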

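Option 3. Use hyperUnique

This option requires that the counts be tracked at indexing time, i.e. that a hyperUnique metric was defined at ingestion. If you have already done so, you can use the following in your query: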

{
    "queryType": "groupBy",
    "dataSource": "hits",
    "intervals": [
        "2020-08-14T11:00:00.000Z/2020-08-14T12:00:00.000Z"
    ],
    "dimensions": [],
    "granularity": "all",
    "aggregations": [
        {
            "type": "hyperUnique",
            "name": "domain",
            "fieldName": "domain",
            "isInputHyperUnique": false,
            "round": false
        }
    ],
    "context": {
        "groupByStrategy": "v2"
    }
}

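Since my data set has no hyperUnique metric, I do not have an exact sample response.

This page explains the approach well: https://blog.mshimul.com/getting-unique-counts-from-druid-using-hyperloglog/

Conclusion

In my opinion, the Theta Sketch extension is the best and easiest way to get the result. Please read the documentation carefully.

If you are a PHP user, you could take a look at this, maybe it will help:

Votes: 1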
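To make the indexing-time requirement more concrete, here is a minimal sketch of how the two halves could fit together: a `hyperUnique` entry in the `metricsSpec` of the ingestion spec, and a query-time aggregator that reads the resulting metric. The metric name `domain_unique` and both helper functions are hypothetical, chosen only for illustration.

```python
import json


def hyper_unique_metric(metric_name, source_column):
    """Ingestion-time entry for the metricsSpec of an ingestion spec."""
    return {"type": "hyperUnique", "name": metric_name, "fieldName": source_column}


def hyper_unique_query_aggregator(output_name, metric_name):
    """Query-time aggregator that merges the sketches stored in `metric_name`."""
    return {"type": "hyperUnique", "name": output_name, "fieldName": metric_name}


# Hypothetical metric name; the source column "domain" comes from the examples.
metrics_spec = [
    {"type": "count", "name": "count"},
    hyper_unique_metric("domain_unique", "domain"),
]
aggregations = [hyper_unique_query_aggregator("domain", "domain_unique")]

print(json.dumps(metrics_spec, indent=2))
print(json.dumps(aggregations, indent=2))
```

The point of the design is that the query-time `fieldName` refers to the metric created at ingestion, not to the raw column, so the query only works on data sources that were indexed with such a metric.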

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/62647980
