I wrote a repeat-rate query that gives me cohort repeat data in the following format:
cohort_join_day | repeat_day | repeat_users
11/15/16 | 0 | 10000
11/15/16 | 1 | 6000
11/15/16 | 2 | 3000

repeat_day 0 is the total cohort size for that day.
I am trying to skip the Excel step and add a fourth column with the daily repeat percentage, like this:
cohort_join_day | repeat_day | repeat_users | repeat_percentage
11/15/16 | 0 | 10000 | 100%
11/15/16 | 1 | 6000 | 60%
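11/15/16 | 2 | 3000 | 30%

The calculation for each row should be simple. For example: repeat percentage of the day-1 cohort on day 6 = (repeat users of the day-1 cohort on day 6) / (repeat users of the day-1 cohort on day 0)
(The repeat users of the day-1 cohort on day 0) is the total size of that cohort.
What is the best way to do this?
Here is the daily repeat-rate query I wrote: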
SELECT
to_char(cohort_join_day, 'YYYY-MM-DD') AS cohort_join_day,
EXTRACT(DAY FROM (current_day - cohort_join_day)) AS repeat_day,
COUNT(DISTINCT unique_id) AS repeat_users
FROM
(
SELECT
auu.unique_id,
date_trunc('day', auu.ds) AS current_day,
date_trunc('day', fsb.ds) AS cohort_join_day
FROM rust.a_unique_users AS auu
JOIN mobile.first_seen_byos AS fsb
ON fsb.unique_id = auu.unique_id
WHERE
auu.os_type = 'iphone_native_app'
AND fsb.ds >= '2016-11-01'
) AS uniques_by_day
WHERE
cohort_join_day <= current_day
GROUP BY
cohort_join_day,
repeat_day;

Posted on 2016-11-30 23:58:16
SELECT
*
,(repeat_users * 100.0) /
MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER () as repeat_percentage
FROM
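Table

Conditional aggregation and window functions make this easier.
If you are doing this calculation for every cohort day, you can partition the window function by cohort_join_day: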
SELECT
*
,(repeat_users * 100.0) /
MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER (PARTITION BY cohort_join_day) as repeat_percentage
FROM
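Table

MAX(column) OVER () simply gives the MAX value of column across the entire dataset.
MAX(column) OVER (PARTITION BY column2) gives the MAX value of column among the rows that share the same column2 value. You can think of PARTITION BY as similar to GROUP BY.
Replacing column with a CASE expression lets you do conditional aggregation. For example, when you only want repeat_users where repeat_day = 0, the CASE expression returns that value for the day-0 row and NULL for every other row. MAX ignores NULLs, so each partition yields exactly one value: the cohort size.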
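As a minimal sketch of the partitioned version, the snippet below runs the same expression against an in-memory SQLite database through Python's sqlite3 module. The table name cohort_repeats and the second cohort's numbers are made up for illustration, and SQLite is only a stand-in for the asker's database; it assumes the SQLite library bundled with Python is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical sample data: two cohorts, so the PARTITION BY actually matters.
rows = [
    ("2016-11-15", 0, 10000),
    ("2016-11-15", 1, 6000),
    ("2016-11-15", 2, 3000),
    ("2016-11-16", 0, 8000),
    ("2016-11-16", 1, 2000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_repeats ("
    "cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
conn.executemany("INSERT INTO cohort_repeats VALUES (?, ?, ?)", rows)

# The CASE expression is NULL on every row except repeat_day = 0, so MAX over
# each cohort's partition picks out that cohort's day-0 size.
query = """
SELECT
    cohort_join_day,
    repeat_day,
    repeat_users,
    (repeat_users * 100.0) /
        MAX(CASE WHEN repeat_day = 0 THEN repeat_users END)
            OVER (PARTITION BY cohort_join_day) AS repeat_percentage
FROM cohort_repeats
ORDER BY cohort_join_day, repeat_day
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each cohort is divided by its own day-0 count, so the second cohort's day-1 row comes out relative to 8000 rather than 10000.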
If you wanted to do the same thing in a plain query without window functions, you would use a correlated subquery like this:
SELECT
t.*
,(t.repeat_users * 100.0) / (SELECT t2.repeat_users
FROM
Table t2
WHERE
t.cohort_join_day = t2.cohort_join_day
AND t2.repeat_day = 0) as repeat_percentage
FROM
Table t

To show you how to use Juan Carlo's approach when multiple days are involved, you could do this:
WITH cte AS (
SELECT
cohort_join_day
,repeat_users
FROM
Table
WHERE
repeat_day = 0
)
SELECT
t.*
,(t.repeat_users * 100.0) / c.repeat_users as repeat_percentage
FROM
Table t
CROSS JOIN cte c
WHERE
t.cohort_join_day = c.cohort_join_day

If you want a running total, try something like this:
SUM(column) OVER (PARTITION BY column2 ORDER BY column3)

Definitely get familiar with window functions; they are lifesavers these days.
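The running-total form can be sketched the same way. This example again uses Python's sqlite3 module with an in-memory database (assuming SQLite 3.25 or newer); the table name daily_counts and the data are hypothetical, and summing repeat_users cumulatively is only meant to show the mechanics of ORDER BY inside a window, not a meaningful retention metric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_counts ("
    "cohort_join_day TEXT, repeat_day INTEGER, repeat_users INTEGER)"
)
# Hypothetical sample data for two cohorts.
conn.executemany(
    "INSERT INTO daily_counts VALUES (?, ?, ?)",
    [
        ("2016-11-15", 0, 10000),
        ("2016-11-15", 1, 6000),
        ("2016-11-15", 2, 3000),
        ("2016-11-16", 0, 8000),
        ("2016-11-16", 1, 2000),
    ],
)

# With ORDER BY inside OVER, SUM accumulates row by row within each partition
# and restarts when cohort_join_day changes.
running = conn.execute(
    """
    SELECT
        cohort_join_day,
        repeat_day,
        repeat_users,
        SUM(repeat_users) OVER (
            PARTITION BY cohort_join_day
            ORDER BY repeat_day
        ) AS running_total
    FROM daily_counts
    ORDER BY cohort_join_day, repeat_day
    """
).fetchall()

for row in running:
    print(row)
```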
Posted on 2016-11-30 23:29:11
SQL Demo
WITH boo AS (
SELECT *
FROM foo -- here go your query
), base as (
SELECT "repeat_users"
FROM boo
WHERE "repeat_day" = 0
)
SELECT boo.cohort_join_day,
boo.repeat_day,
boo.repeat_users,
100* ((boo.repeat_users * 1.0) / base.repeat_users) as repeat_percentage
FROM boo
CROSS JOIN base

https://stackoverflow.com/questions/40899995