在执行Impala和PrestoDB之间的基准测试时,我们注意到在Imapala中构建枢轴表非常困难,因为它不像Presto那样有Cube操作符。以下是Presto中的两个示例:
多维数据集运算符为给定的列集生成所有可能的分组集(即幂集)。例如,查询:
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);相当于:
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
(origin_state, destination_state),
(origin_state),
(destination_state),
());另一个例子是ROLLUP操作符。完整的文档在这里:https://prestodb.io/docs/current/sql/select.html。
它不是语法糖,因为PRESTO对整个查询执行一次表扫描,因此使用这个操作符,您可以在一个请求中构建枢轴表,Impala需要运行2-3个查询。
在Impala instaead为3的环境中,我们是否可以通过一个查询/表扫描来完成这个任务?否则,在创建任何类型的枢轴表时,性能都会变得糟糕。
发布于 2017-02-17 13:07:53
我们可以使用黑斑羚windo函数,但不是单列输出,您将得到3列。
SELECT origin_state,
destination_state,
SUM(package_weight) OVER (PARTITION BY origin_state, destination_state) AS pkgwgrbyorganddest,
SUM(package_weight) OVER (PARTITION BY origin_state) AS pkgwgrbyorg,
SUM(package_weight) OVER (PARTITION BY destination_state) AS pkgwgrbydest
FROM shipping;https://stackoverflow.com/questions/42170082
复制相似问题