首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >SQL -首次访问的唯一用户

SQL -首次访问的唯一用户
EN

Stack Overflow用户
提问于 2020-05-15 18:59:13
回答 2查看 183关注 0票数 1

给定下表visitorLog,编写一个SQL以按日期查找以下内容。

  1. Total_Visitors
  2. VisitorGain --与前一天相比
  3. VisitorLoss --与前一天相比
  4. Total_New_Visitors -首次访问的唯一用户
代码语言:javascript
复制
visitorLog : 
*----------------------*
|  Date        Visitor |
*----------------------*
| 01-Jan-2011     V1   |   
| 01-Jan-2011     V2   |           
| 01-Jan-2011     V3   |           
| 02-Jan-2011     V2   |              
| 03-Jan-2011     V2   |          
| 03-Jan-2011     V4   |         
| 03-Jan-2011     V5   |
*----------------------*           

Expected output:
*---------------------------------------------------------------------*
|  Date     Total_Visitors VisitorGain VisitorLoss Total_New_Visitors |
*---------------------------------------------------------------------*
| 01-Jan-2011      3              3            0            3         |      
| 02-Jan-2011      1              0            2            0         |       
| 03-Jan-2011      3              2            0            2         |
*---------------------------------------------------------------------*

这是我的SQL和SLQ小提琴

代码语言:javascript
复制
with cte as
(
    select
        date,
        total_visitors,
        lag(total_visitors) over (order by date) as prev_visitors,
        row_number() over (order by date ) as rnk
    from
    (
        select
            *,
            count(visitor) over (partition by date) as total_visitors
        from visitorLog
    ) val
    group by
        date,
        total_visitors
),
cte2 as
(
  select
    date,
    sum(case when rnk = 1 then 1 else 0 end) as total_new_visitors
  from
  (
    select 
        date,
        visitor,
        row_number() over (partition BY visitor order by date) as rnk
    from visitorLog
   ) t
  group by
    date

)

select
    c.date,
    sum(total_visitors) as total_visitors,
    sum(
        case
            when rnk = 1 then total_visitors
            when (rnk > 1 and prev_visitors < total_visitors) then (total_visitors - prev_visitors)
        else 
            0
        end
    )visitorGain,

    sum(
        case
            when rnk = 1 then 0
            when prev_visitors > total_visitors then (prev_visitors - total_visitors)
        else
            0
        end
    ) as visitorLoss,
    sum(total_new_visitors) as total_new_visitors  
from cte c
join cte2 c2
on c.date = c2.date
group by
    c.date
order by
    c.date

我的解决方案正像预期的那样起作用,但我想知道这里是否遗漏了任何可能破坏我的逻辑的edge cases。任何帮助都会很好。

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-05-15 21:00:39

这个逻辑实现了您想要的结果:

代码语言:javascript
复制
select date, count(*) as num_visitor,
       greatest(count(*) - lag(count(*)::int, 1, 0) over (order by date), 0) as visitor_gain,
       greatest(lag(count(*)::int, 1, 0) over (order by date) - count(*), 0) as visitor_loss,
       count(*) filter (where seqnum = 1) as num_new_visitors
from (select vl.*,
             row_number() over (partition by visitor order by date) as seqnum
      from visitorLog vl
     ) vl
group by date
order by date

这里是db<>fiddle。

票数 1
EN

Stack Overflow用户

发布于 2020-05-15 19:03:26

我将使用窗口函数和聚合:

代码语言:javascript
复制
select 
    date,
    count(*) no_visitor,
    count(*) - lag(count(*), 1, 0) over(partition by date) no_visitor_diff,
    count(*) filter(where rn = 1) no_new_visitors
from (  
    select t.*, row_number() over(partition by visitor order by date) rn
    from visitorLog
) t
group by date
order by date

子查询使用row_number()对每个客户的访问进行排序(每个客户的第一次访问获得行号1)。然后,外部查询由date聚合,并使用lag()获取“上一天”的访问者计数。

与前一天相比,我并不认为有两个不同的列来表示访问者的不同,因此这给出了一个单独的列,它的值要么是正数,要么是负值,这取决于客户是获得还是失去。

如果您真的想要两列,那么:

代码语言:javascript
复制
    greatest(count(*) - lag(count(*), 1, 0) over(partition by date), 0) visitor_gain,
    - least(count(*) - lag(count(*), 1, 0) over(partition by date), 0) visitor_loss
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/61826495

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档