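I want to compute the mean and median of column3 for each combination of column1 and column2. In other words, I want the mean and median of column3 grouped by column1 and column2.
The data looks like this: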
Table `xx.yy.zz`
column1 column2 column3
A A1 1
A A1 2
A A1 3
B B2 10
B B2 15
B B2 20
...

The expected output would be:
column1 column2 median3 mean3
A A1 2 2
A A2 median mean
A A3 median mean
B B1 median mean
B B2 15 15
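C C1 median mean

I tried the code below. The first part (table1) works, but neither of my two attempts at the second part (table2) does. What am I doing wrong? What is the correct way to compute the median of column3 grouped by column1 and column2?
My code so far: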
WITH
table1 AS (
  SELECT column1, column2,
    AVG(column3) AS mean3
  FROM xx.yy.zz
  GROUP BY 1,2
),
table2 AS (
  SELECT column1, column2,
    PERCENTILE_CONT(column3, 0.5) OVER(PARTITION BY column1, column2,) AS median3
  FROM xx.yy.zz
  group by 1,2
),
**OR**
table2 AS (
  SELECT
    PERCENTILE_CONT(column3, 0.5) OVER(PARTITION BY column1, column2,) AS median3
  FROM xx.yy.zz
),
table3 AS (
  SELECT * FROM table1
  INNER JOIN
  (SELECT * FROM table2)
  USING(column1, column2)
)
SELECT * FROM table3

Posted on 2020-11-05 22:35:51
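Below is for BigQuery Standard SQL
#standardsql
-- Median of an array that is already sorted:
-- even length averages the two middle elements, odd length takes the middle one.
create temp function median (arr any type) as (
  if(mod(array_length(arr), 2) = 0,
    ( arr[offset(div(array_length(arr), 2) - 1)] +
      arr[offset(div(array_length(arr), 2))]) / 2,
    arr[offset(div(array_length(arr), 2))] )
);
-- array_agg(... order by column3) supplies the sorted array for each group.
select column1, column2,
  median(array_agg(column3 order by column3)) as median3,
  avg(column3) as mean3
from `xx.yy.zz`
group by column1, column2

Applied to the sample data in the question, this returns a median of 2 and a mean of 2 for A/A1, and a median of 15 and a mean of 15 for B/B2.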
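To try the approach without access to the real table, here is a minimal sketch that inlines the six sample rows from the question. The CTE name `sample_data` is an assumption made for illustration; everything else follows the query above, and it is meant as a quick check rather than a definitive implementation.

```sql
#standardSQL
-- Same median helper as above: expects an array that is already sorted.
create temp function median(arr any type) as (
  if(mod(array_length(arr), 2) = 0,
    (arr[offset(div(array_length(arr), 2) - 1)] +
     arr[offset(div(array_length(arr), 2))]) / 2,
    arr[offset(div(array_length(arr), 2))])
);

-- sample_data is a hypothetical stand-in for `xx.yy.zz`,
-- holding only the rows shown in the question.
with sample_data as (
  select 'A' as column1, 'A1' as column2, 1 as column3 union all
  select 'A', 'A1', 2 union all
  select 'A', 'A1', 3 union all
  select 'B', 'B2', 10 union all
  select 'B', 'B2', 15 union all
  select 'B', 'B2', 20
)
select column1, column2,
  median(array_agg(column3 order by column3)) as median3,
  avg(column3) as mean3
from sample_data
group by column1, column2
order by column1, column2;
```

The result has one row per (column1, column2) group and can be compared against the expected output listed in the question.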

Posted on 2020-11-05 18:01:28
There are two options; the second is approximate, but faster (more scalable):
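-- Option 1: exact median. PERCENTILE_CONT is used with OVER, so it yields a value
-- on every row; the outer GROUP BY then collapses each group to a single row.
table2 AS (
  SELECT column1, column2, MAX(median_temp) AS median3
  FROM (
    SELECT column1, column2,
      PERCENTILE_CONT(column3, 0.5) OVER (PARTITION BY column1, column2) AS median_temp
    FROM xx.yy.zz
  )
  GROUP BY 1,2
),

-- Option 2 (use instead of option 1): approximate median via APPROX_QUANTILES,
-- which is an aggregate function and works directly with GROUP BY.
table2 AS (
  SELECT column1, column2, APPROX_QUANTILES(column3, 100)[OFFSET(50)] AS median3
  FROM xx.yy.zz
  GROUP BY 1,2
),

https://stackoverflow.com/questions/64694577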