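I have to compute the daily rolling total of active users over the past two years (2019 and 2020). I have date, email, and last-visit columns. A user is active if last_visit > current day - 90. My first problem is that I don't know how to tell SQL what the current day is. I tried using the date column, but it raised an error, and the row count was the same as before: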
WITH users_list as
(SELECT SUBSTRING([agent_email], CHARINDEX('@', [agent_email])+1, LEN([agent_email])) AS DOMAIN,
SUBSTRING(last_visit, 1, +10) as _date,
VS.agent_email,
VS.last_visit,
vs.agent_license_type,
vs.custom_templic
FROM test.visitor AS VS
WHERE VS.id NOT LIKE '%@rule.com'
AND VS.agent_company NOT LIKE '%Rule%'
AND last_visit> _date - 90
GROUP BY VS.agent_email,
VS.last_visit,
vs.agent_license_type,
vs.custom_templic
My second problem is that I don't know how to compute the total number of active users for 2019 and 2020. I tried this script:
SELECT _date,
count (agent_email)
over(order by _date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as number_of_active_users
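FROM users_list
But it also raised an error.
Posted on 2020-10-03 19:59:59
Unfortunately, Redshift does not support the range window frame specification for dates.
Your sample code implements logic that is not in the problem description. This answer addresses the problem description.
So an alternative is to track when each user enters and leaves the active range. This starts with determining the active ranges for each user, which is a gaps-and-islands problem: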
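-- Assumes last_visit is a DATE; cast it first if it is stored as text.
select agent_email, min(last_visit) as active_start,
       max(last_visit) + interval '90 day' as active_end
from (select v.*,
             -- a new active period starts when the previous visit is 90 or more days earlier
             sum(case when prev_last_visit > last_visit - interval '90 day' then 0 else 1 end) over
                 (partition by agent_email order by last_visit
                  rows between unbounded preceding and current row) as active_period
      from (select v.*,
                   lag(last_visit) over (partition by agent_email order by last_visit) as prev_last_visit
            from test.visitor v
           ) v
     ) v
group by agent_email, active_period;
Then, with this information, we can unpivot to track when someone becomes active and when they stop being active. So: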
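-- Assumes last_visit is a DATE; cast it first if it is stored as text.
with actives as (
      select agent_email, min(last_visit) as active_start,
             max(last_visit) + interval '90 day' as active_end
      from (select v.*,
                   -- a new active period starts when the previous visit is 90 or more days earlier
                   sum(case when prev_last_visit > last_visit - interval '90 day' then 0 else 1 end) over
                       (partition by agent_email order by last_visit
                        rows between unbounded preceding and current row) as active_period
            from (select v.*,
                         lag(last_visit) over (partition by agent_email order by last_visit) as prev_last_visit
                  from test.visitor v
                 ) v
           ) v
      group by agent_email, active_period
     )
select dte,
       sum(inc) as change_on_day,
       -- running total of the +1/-1 events gives the number of active users
       sum(sum(inc)) over (order by dte rows between unbounded preceding and current row) as actives_on_day
from ((select agent_email, active_start as dte, 1 as inc
       from actives
      ) union all
      (select agent_email, active_end as dte, -1 as inc
       from actives
      )
     ) a
group by dte;
If you want to filter on a date range, wrap this in an additional subquery or CTE and apply the filter in the outermost query.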
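A minimal sketch of such a date-range filter, assuming the email column is named agent_email as in the question, that last_visit is a DATE, and that the range of interest is 2019-01-01 through 2020-12-31. The CTE name daily_counts is made up for illustration. The running total is computed over all events first and filtered only afterwards, so that users who became active before 2019 are still counted:

```sql
-- Sketch only: agent_email, the DATE type of last_visit, and the name
-- daily_counts are assumptions, not confirmed by the original answer.
with actives as (
      select agent_email, min(last_visit) as active_start,
             max(last_visit) + interval '90 day' as active_end
      from (select v.*,
                   sum(case when prev_last_visit > last_visit - interval '90 day' then 0 else 1 end) over
                       (partition by agent_email order by last_visit
                        rows between unbounded preceding and current row) as active_period
            from (select v.*,
                         lag(last_visit) over (partition by agent_email order by last_visit) as prev_last_visit
                  from test.visitor v
                 ) v
           ) v
      group by agent_email, active_period
     ),
     daily_counts as (
      -- running total over every event, including those before 2019
      select dte,
             sum(inc) as change_on_day,
             sum(sum(inc)) over (order by dte rows between unbounded preceding and current row) as actives_on_day
      from ((select active_start as dte, 1 as inc from actives)
            union all
            (select active_end as dte, -1 as inc from actives)
           ) a
      group by dte
     )
-- filter last, so the running total above is not truncated
select dte, change_on_day, actives_on_day
from daily_counts
where dte >= '2019-01-01' and dte < '2021-01-01'
order by dte;
```

Rows appear only for days on which the count changes. On any other day the count equals that of the most recent earlier row, which for the first days of 2019 may be a row that falls before the filtered range.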
https://stackoverflow.com/questions/64183067