I have a table that looks like this:
Date | User_ID
2017-1-1 | 1
2017-1-1 | 2
2017-1-1 | 4
2017-1-2 | 3
2017-1-2 | 2
... | ..
... | ..
... | ..
... | ..
2017-2-1 | 1
2017-2-2 | 2
... | ..
... | ..
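... | ..
I want to calculate monthly active users (MAU) over a rolling 30-day period. I know that Redshift does not support COUNT(DISTINCT) as a window function. How can I get the following output?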
Date | MAU
2017-1-1 | 3
2017-1-2 | 4 <- We don't want to count user_id 2 twice.
... | ..
... | ..
... | ..
2017-2-1 | ..
2017-2-2 | ..
... | ..
... | ..
I tried the following (which obviously failed). Here is my code:
SELECT event_date
,sum(user_count) mau_count
,CASE
WHEN event_date = date_trunc('week', event_date)
THEN 1
ELSE 0
END week_starting FROM (
SELECT event_date
,count(*) OVER (PARTITION BY event_date ORDER BY event_date ROWS BETWEEN 30 PRECEDING
AND CURRENT ROW
) AS user_count <-- I know this is wrong. Just my attempt :)
FROM (
SELECT DISTINCT (user_id)
,event_date
FROM event_table
) daily_distinct_users
GROUP BY event_date
) cumulative_daily_distinct_users GROUP BY event_date;
Please tell me how to calculate the MAU count accurately. Thanks!
Answered 2017-02-16 02:14:03
This approach seems to work (the column names in the log table are dt and userid):
SELECT
end_date,
-- The number of distinct users during the 30 days prior
COUNT(DISTINCT userid) distinct_users
FROM log
JOIN
( -- A list of dates to appear in the output first column
SELECT DISTINCT dt AS end_date
FROM log
WHERE dt BETWEEN date '2017-01-01' AND date '2017-01-31'
) ON dt BETWEEN end_date - interval '30 days' AND end_date
GROUP BY end_date
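ORDER BY end_date
Basically, the sub-select generates a list of end_dates, which appears as the first output column. Each end_date is then joined to the log rows dated within the 30 days up to and including it, and the distinct userid values in those rows are counted.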
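To make the range join concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It is not Redshift: the sample rows are made up, the aliases `l` and `d` are added for clarity, the date filter on the sub-select is dropped, and SQLite's `date(x, '-30 days')` stands in for `end_date - interval '30 days'`.

```python
import sqlite3

# Hypothetical sample rows (dt, userid). Dates are ISO strings so that
# SQLite's string comparison matches chronological order.
rows = [
    ("2017-01-01", 1),
    ("2017-01-01", 2),
    ("2017-01-01", 4),
    ("2017-01-02", 3),
    ("2017-01-02", 2),
    ("2017-02-01", 1),
    ("2017-02-02", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (dt TEXT, userid INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, ?)", rows)

# Join every end_date to all log rows in the window that ends on it,
# then count distinct users per end_date.
query = """
SELECT d.end_date,
       COUNT(DISTINCT l.userid) AS distinct_users
FROM log AS l
JOIN (SELECT DISTINCT dt AS end_date FROM log) AS d
  ON l.dt BETWEEN date(d.end_date, '-30 days') AND d.end_date
GROUP BY d.end_date
ORDER BY d.end_date
"""
result = conn.execute(query).fetchall()

for end_date, distinct_users in result:
    print(end_date, distinct_users)
```

With this sample data, user 2 is counted once for 2017-01-02 even though they appear on both days, and the users seen only on 2017-01-01 drop out of the February windows. Because the join pairs every end date with every row in its window, the intermediate result can get large on big tables.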
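Answered 2017-02-15 23:24:11
Assuming there are no missing dates, you can first get the first date on which each user appeared, using the MIN function. Then get the count of users for each date, and use the SUM window function to get the rolling sum.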
SELECT DISTINCT EVENT_DATE,
SUM(CNT) OVER(ORDER BY EVENT_DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS MAU
FROM
(SELECT E.EVENT_DATE,
COUNT(DISTINCT T.USER_ID) AS CNT
FROM EVENT_TABLE E
LEFT JOIN
(SELECT DISTINCT USER_ID,
MIN(EVENT_DATE) OVER(PARTITION BY USER_ID
ORDER BY EVENT_DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS FIRST_APPEARED_ON
FROM EVENT_TABLE
) T ON T.FIRST_APPEARED_ON=E.EVENT_DATE AND T.USER_ID=E.USER_ID
GROUP BY E.EVENT_DATE
) T1
Sample demo using SQL Server
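The three steps described in this answer can be sketched in plain Python to see what the rolling sum produces. This is a simplified illustration, not a translation of the query: the sample events are made up, and the first appearance is taken over each user's whole history rather than through a windowed MIN. The frame counts rows, not calendar days, which is why the approach assumes there are no missing dates.

```python
from collections import Counter

# Hypothetical (event_date, user_id) rows with no gaps between dates.
events = [
    ("2017-01-01", 1), ("2017-01-01", 2), ("2017-01-01", 4),
    ("2017-01-02", 3), ("2017-01-02", 2),
    ("2017-01-03", 1), ("2017-01-03", 5),
]

# Step 1: the first date on which each user appeared.
first_seen = {}
for event_date, user_id in events:
    if user_id not in first_seen or event_date < first_seen[user_id]:
        first_seen[user_id] = event_date

# Step 2: the number of users first seen on each date (0 if none).
new_users = Counter(first_seen.values())
dates = sorted({event_date for event_date, _ in events})
daily_new = [new_users.get(d, 0) for d in dates]

# Step 3: rolling sum over the current row and the 30 preceding rows,
# mirroring ROWS BETWEEN 30 PRECEDING AND CURRENT ROW.
WINDOW_ROWS = 30
rolling = []
for i, d in enumerate(dates):
    start = max(0, i - WINDOW_ROWS)
    rolling.append((d, sum(daily_new[start:i + 1])))

for d, mau in rolling:
    print(d, mau)
```

Because each user is counted only on the date they are first seen, it is worth comparing the output with a direct distinct count on your own data when the history is longer than one window.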
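Answered 2017-12-03 21:47:37
@John Rotenstein's answer works well.
For those who stumble upon this question and are looking for more, the following blog post describes an alternative precomputation strategy for calculating rolling MAU quickly. It is overkill for the question here, but may come in handy in case you: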
https://stackoverflow.com/questions/42261489