So, I have a table with the columns date_key, shop_id, and transaction_amount.
date_key shop_id transaction_amount
2022-01-01 S_001 2000
2022-01-01 S_002 2500
2022-01-02 S_001 2600
2022-01-02 S_002 2200
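...

My goal is to calculate the percentage of month-to-date (MTD) transaction amount that comes from the top 1000 sellers. On average, I have about 50,000 sellers, and each of them has transactions every day. Because the calculation is month-to-date, the set of top 1000 sellers may be different on each day.
What "top 1000 sellers MTD" means: performance is based on the cumulative MTD transaction amount. For example, for 2022-01-05 we need to calculate the cumulative transaction amount from 2022-01-01 to 2022-01-04 for every shop_id, and then sort by the highest transaction amount.
The goal is to create the table below:
date_key amount_from_top_1000_sellers (a) amount_from_all_sellers (b) ratio (a/b)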
2022-01-01 4000 200000 2%
...
...
...

Posted on 2022-09-04 17:40:02
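Window functions will do this task. For each day, you need a row_number ordered by the sales amount column.
First, some random data is generated in the table tbl. Then we add the row number for each day. Next, we need to filter the top 100 sellers for each day (the IF statement). Extending this to the top 1000 sellers per day is straightforward. The remaining sales amount and the ratio can be calculated by including the last SELECT in the WITH clause as a further table and querying that table with another SELECT.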
WITH
  tbl AS (
    # Generate random sample data: 1000 shops x 101 days
    SELECT
      DATE_SUB(CURRENT_DATE(), INTERVAL d DAY) AS date_key,
      a AS shop_id,
      100 * RAND() AS transaction_amount
    FROM
      UNNEST(GENERATE_ARRAY(1, 1000)) a,
      UNNEST(GENERATE_ARRAY(0, 100)) d
  ),
  trunc_to_month AS (
    SELECT
      DATE_TRUNC(date_key, MONTH) AS date_key_month,  # aggregate to month and year
      shop_id,
      SUM(transaction_amount) AS transaction_amount
    FROM
      tbl
    GROUP BY 1, 2
  ),
  tmp AS (
    # Rank shops within each month by total amount, highest first
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY date_key_month ORDER BY transaction_amount DESC) AS rownum
    FROM
      trunc_to_month
  )
SELECT
  date_key_month,  # keep only the month and year; remove the day
  SUM(transaction_amount) AS transaction_amount_total,
  SUM(IF(rownum <= 100, transaction_amount, 0)) AS amount_from_top_100_sellers
FROM
  tmp
GROUP BY 1

https://stackoverflow.com/questions/73600156
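The last step described in the answer (wrapping the final SELECT in the WITH clause as a further table and querying it again) is not spelled out in the query. A minimal sketch of how it might look in BigQuery follows. The CTE name monthly and the output columns amount_from_remaining_sellers and ratio_pct are hypothetical names chosen for illustration; everything else reuses the answer's sample data and column names.

```sql
WITH
  tbl AS (
    -- Same random sample data as in the answer: 1000 shops x 101 days
    SELECT
      DATE_SUB(CURRENT_DATE(), INTERVAL d DAY) AS date_key,
      a AS shop_id,
      100 * RAND() AS transaction_amount
    FROM
      UNNEST(GENERATE_ARRAY(1, 1000)) AS a,
      UNNEST(GENERATE_ARRAY(0, 100)) AS d
  ),
  trunc_to_month AS (
    -- Total amount per shop and calendar month
    SELECT
      DATE_TRUNC(date_key, MONTH) AS date_key_month,
      shop_id,
      SUM(transaction_amount) AS transaction_amount
    FROM tbl
    GROUP BY 1, 2
  ),
  tmp AS (
    -- Rank shops within each month, highest amount first
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY date_key_month ORDER BY transaction_amount DESC) AS rownum
    FROM trunc_to_month
  ),
  monthly AS (
    -- The answer's final SELECT, moved into the WITH clause (hypothetical name)
    SELECT
      date_key_month,
      SUM(transaction_amount) AS transaction_amount_total,
      SUM(IF(rownum <= 100, transaction_amount, 0)) AS amount_from_top_100_sellers
    FROM tmp
    GROUP BY 1
  )
SELECT
  date_key_month,
  amount_from_top_100_sellers,
  transaction_amount_total,
  -- Amount contributed by every seller outside the top 100
  transaction_amount_total - amount_from_top_100_sellers AS amount_from_remaining_sellers,
  -- Share of the top sellers in percent; SAFE_DIVIDE returns NULL on a zero total
  ROUND(100 * SAFE_DIVIDE(amount_from_top_100_sellers, transaction_amount_total), 2) AS ratio_pct
FROM monthly
ORDER BY date_key_month
```

Note that the sample data contains exactly 1000 shops, so raising the threshold from 100 to 1000 on this generated data would give a ratio of 100%. On the real table with about 50,000 sellers the threshold can be changed to 1000 directly. This sketch keeps the answer's per-month grouping and does not compute a separate running MTD total for each day.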