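I know that with no-merge enabled, the distributed node does not combine the intermediate results coming from the shards.
Given the SQL below: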
select sum(xxxxx), xxxxx from (
    select sum(xxxx), xxxx
    from (
        select count(xxx), xxx
        from distributed_table group by xxx )
    group by xxxx SETTINGS distributed_group_by_no_merge = 1
) group by xxxxx

I would like to know which part of this SQL is sent to the MergeTree nodes to be executed with distributed_group_by_no_merge. Is it this part?

(select count(xxx) from distributed_table group by xxx) group by xxxx SETTINGS distributed_group_by_no_merge = 1

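How does the distributed_group_by_no_merge setting change the behavior of a distributed query? Which parts are executed on the MergeTree nodes, and which parts on the distributed node?
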
Posted on 2020-05-12 14:23:22
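The distributed_group_by_no_merge setting affects how the initiator node (the node that runs the distributed query) forms the final result of the distributed query: with distributed_group_by_no_merge = 0 the initiator merges the aggregated states coming from the shards into one result set, while with distributed_group_by_no_merge = 1 it gets a set of final results (not aggregated states) from each shard and does not merge them.

I would put distributed_group_by_no_merge at the same subquery level as the distributed table, to state your intent explicitly and to avoid confusion when there are several distributed subqueries.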
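
To make the two behaviors more concrete, here is a minimal Python sketch that only simulates them; it does not talk to ClickHouse. The shard contents, the key/value layout, and all function names are hypothetical, and a per-key partial sum stands in for the aggregate state that a shard would send back. It also shows that, for a sum, re-aggregating the per-shard final rows by hand gives the same totals as the merged mode.

```python
from collections import defaultdict

# Hypothetical data: (key, value) rows stored on each shard.
SHARDS = {
    1: [("a", 1), ("b", 2)],
    2: [("a", 3), ("c", 4)],
    3: [("b", 5), ("a", 6)],
}


def shard_group_by(rows):
    """Local work of one shard for: SELECT key, sum(value) ... GROUP BY key."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def query_with_merge(shards):
    """Simulates distributed_group_by_no_merge = 0.

    The initiator merges the per-shard partial sums into ONE row per key.
    """
    merged = defaultdict(int)
    for rows in shards.values():
        for key, partial in shard_group_by(rows).items():
            merged[key] += partial
    return sorted(merged.items())


def query_no_merge(shards):
    """Simulates distributed_group_by_no_merge = 1.

    The initiator only concatenates each shard's final rows, so the same
    key can appear once per shard. The shard number is kept for clarity.
    """
    result = []
    for shard_num in sorted(shards):
        for key, total in sorted(shard_group_by(shards[shard_num]).items()):
            result.append((shard_num, key, total))
    return result


def reaggregate(no_merge_rows):
    """Manual outer GROUP BY over the concatenated per-shard rows."""
    totals = defaultdict(int)
    for _shard_num, key, total in no_merge_rows:
        totals[key] += total
    return sorted(totals.items())


if __name__ == "__main__":
    print(query_with_merge(SHARDS))
    print(query_no_merge(SHARDS))
    print(reaggregate(query_no_merge(SHARDS)))
```

The manual re-aggregation step works here because a sum of per-shard sums equals the overall sum. An aggregate such as a distinct count cannot be recombined this way from final per-shard values, which is worth keeping in mind when an outer query is stacked on top of a no-merge subquery.
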

Let's see how to check the difference between these two modes (using the virtual column _shard_num):

SELECT
    groupUniqArray(_shard_num) AS shards,
    ..
FROM table
WHERE ..
GROUP BY ..
SETTINGS distributed_group_by_no_merge = 0
/* Aggregated states were merged into ONE result set on initiator-node.
┌─shards────┬─ ..
│ [2, 1, 3] │ ..
└───────────┴─ ..
*/

SELECT
    groupUniqArray(_shard_num) AS shards,
    ..
FROM table
WHERE ..
GROUP BY ..
SETTINGS distributed_group_by_no_merge = 1
/* Get a set of final results (not aggregated states) from each shard. They should be unioned manually.
┌─shards─┬─ ..
│ [2]    │ ..
│ [1]    │ ..
│ [3]    │ ..
└────────┴─ ..
*/

https://stackoverflow.com/questions/61743180