How can I calculate a truncated mean in Amazon Redshift? I need it to run on very large datasets.
Posted on 2015-09-05 04:32:04
Redshift includes the common SQL statistical functions, including NTILE, which is the one you need here.
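-- Mean of the middle two quartiles, alongside the plain mean for comparison.
SELECT AVG(CASE WHEN quartile IN (2,3) THEN my_metric ELSE NULL END) central_mean
      ,AVG(my_metric) mean
FROM (SELECT my_metric
            -- Rank by the metric being averaged (the original ordered by cpu_usage).
            ,NTILE(4) OVER (ORDER BY my_metric) quartile
      -- LIMIT 1000 restricts this to a sample; remove it to process the whole table.
      FROM (SELECT * FROM my_table LIMIT 1000 ) t ) t
;
Posted on 2017-04-06 15:49:52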
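You can compute the threshold values at the percentiles you want to trim from the set, filter out the metric values that fall outside those thresholds, and then take the average of what remains.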
SELECT avg(your_metric)
FROM (
    SELECT
        your_metric,
        PERCENTILE_DISC(0.1) -- 10% lower boundary
            WITHIN GROUP (ORDER BY your_metric) OVER () AS lower_threshold,
        PERCENTILE_DISC(0.9) -- 90% higher boundary
            WITHIN GROUP (ORDER BY your_metric) OVER () AS higher_threshold
    FROM your_table
) t1
WHERE your_metric > lower_threshold AND your_metric < higher_threshold
https://stackoverflow.com/questions/32377181
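To sanity-check the percentile-threshold query on a small sample pulled out of the warehouse, a plain Python reference can help. The sketch below is not Redshift code and is not part of either answer: the function names `percentile_disc` and `truncated_mean` are hypothetical, and the percentile helper assumes the usual discrete-percentile definition (the smallest value whose cumulative share of rows is at least p). It mirrors the strict `>` and `<` comparisons used in the query.

```python
from typing import Optional, Sequence


def percentile_disc(sorted_values: Sequence[float], p: float) -> float:
    """Return the smallest value whose cumulative share of rows is >= p.

    Assumption: this follows the usual discrete-percentile definition;
    it is a reference sketch, not Redshift's implementation.
    """
    n = len(sorted_values)
    for i, value in enumerate(sorted_values):
        if (i + 1) / n >= p:
            return value
    return sorted_values[-1]


def truncated_mean(
    values: Sequence[float], lower_p: float = 0.1, upper_p: float = 0.9
) -> Optional[float]:
    """Average the values strictly between the two percentile thresholds.

    Returns None when no value survives the filter, much as AVG over
    zero rows yields NULL in SQL.
    """
    if not values:
        return None
    ordered = sorted(values)
    lower = percentile_disc(ordered, lower_p)
    upper = percentile_disc(ordered, upper_p)
    # Strict comparisons, as in the WHERE clause of the query.
    kept = [v for v in ordered if lower < v < upper]
    return sum(kept) / len(kept) if kept else None


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
    print(truncated_mean(sample))        # trimmed: the outlier 1000 is excluded
    print(sum(sample) / len(sample))     # plain mean, for comparison
```

One thing this makes visible: with strict comparisons, every row equal to a threshold is dropped, so a column with many ties at the boundary (or a constant column) can lose more than the intended 10% at each end, or lose everything. Switching to `>=` and `<=` keeps the boundary values instead; which is right depends on how you want ties handled.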