BigQuery: missing rows in SUM over a partition

Stack Overflow user
Asked 2020-11-06 16:47:30
3 answers, 619 views, 0 followers, 2 votes

TL;DR:

Given this table:

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)

How do I get a table in which the missing date/product combination (2020-11-02 - premium) is included with a fallback diff value of 0?

Ideally this should work for multiple products. The list of all products can be obtained like this:

SELECT ARRAY_AGG(DISTINCT product) FROM subscriptions

I want to be able to get the subscription count per day, either for all products or for just some of them.

I think the way to achieve this is to prepare a table that looks like this:

|---------------------|------------------|------------------|
|         date        |      product     |       total      |
|---------------------|------------------|------------------|
|      2020-11-01     |      premium     |        100       |
|---------------------|------------------|------------------|
|      2020-11-01     |       basic      |        50        |
|---------------------|------------------|------------------|

With this table I can easily group by date and product, or by date alone, and sum the totals.

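Before getting to the result table, I generated a table in which I calculate the difference in subscriptions per day and product, i.e. how many new subscribers each product gained and how many cancelled.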

That table looks like this:

|---------------------|------------------|------------------|
|         date        |      product     |       diff       |
|---------------------|------------------|------------------|
|      2020-11-01     |      premium     |        50        |
|---------------------|------------------|------------------|
|      2020-11-01     |       basic      |       -20        |
|---------------------|------------------|------------------|

That is, on November 1st the total number of premium users increased by 50 and the total number of basic users decreased by 20.

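The problem now is that this temporary table is missing data points whenever there was no change for a product on a given day; see the example below.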

When I started, there was no product column; I only had the date and diff columns.

To get from the second table to the first, I used this query, which worked fine:

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, 150 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-02"), -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), 60
)
SELECT 
  *,
  SUM(diff) OVER (ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date

But when I add the product column and try to calculate the running sum per day and product, some data points are missing.

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
SELECT 
  *,
  SUM(diff) OVER (PARTITION BY product ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date

--

|---------------------|------------------|------------------|
|         date        |      product     |      total       |
|---------------------|------------------|------------------|
|      2020-11-01     |       basic      |       100        |
|---------------------|------------------|------------------|
|      2020-11-01     |      premium     |        50        |
|---------------------|------------------|------------------|
|      2020-11-02     |       basic      |        90        |
|---------------------|------------------|------------------|
|      2020-11-03     |       basic      |       130        |
|---------------------|------------------|------------------|
|      2020-11-03     |      premium     |        70        |
|---------------------|------------------|------------------|

If I now show the total number of subscriptions per day, I get:

150 -> 90 -> 200

But I expect:

150 -> 140 -> 200

The same goes for the total number of premium subscriptions per day:

50 -> 0 -> 70

But I expect:

50 -> 50 -> 70
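
For reference, the per-day figures can be reproduced by summing the per-product running totals by date. This is only a sketch over the sample data from this question; the CTE name `running` and the column name `total_per_day` are chosen here for illustration.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
running AS (
  -- Running total per product, computed only over the rows that exist
  SELECT
    date,
    product,
    SUM(diff) OVER (PARTITION BY product ORDER BY date) AS total_subscriptions
  FROM subscriptions
)
-- Summing per day silently skips products that have no row on that day
SELECT date, SUM(total_subscriptions) AS total_per_day
FROM running
GROUP BY date
ORDER BY date
```

On 2020-11-02 only the basic row exists, so the premium running total of 50 is not part of that day's sum, which yields 90 instead of 140.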

I think the best option to solve this is to add the missing date/product combinations.

How would I do that?

3 Answers

Stack Overflow user

Answered 2020-11-06 17:46:16

      -- Try this: build a list of products plus an extra "Total" entry, then join it with your table to get the data you need.
      WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
        UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
        UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
        UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
        UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
      ),

      product_name as (
      Select product from subscriptions group by 1
      union all
      Select "Total" as product
      )

      Select date
            ,product
            ,total_subscriptions
      from (      
      Select a.date
            ,a.product
            ,diff
            ,SUM(diff) OVER (PARTITION BY a.product ORDER BY a.date) as total_subscriptions
      from 
      (
      Select date,a.product
      from product_name A
       join subscriptions B
       on 1=1
       where a.product !='Total'
      group by 1,2
      ) A
      left join subscriptions B 
      on A.product = B.product
      and A.date = B.date
      group by 1,2,3
      ) group by 1,2,3
      union all
      Select date
            ,product
            ,total_subscriptions
      from 
      (
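      -- Sum per date first, so equal diffs from different products on the same day are not collapsed by the GROUP BY
      Select date,a.product
            ,SUM(SUM(diff)) OVER (PARTITION BY a.product ORDER BY date) as total_subscriptions
      from product_name A
       join subscriptions B
       on 1=1
       where a.product ='Total'
      group by 1,2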
      ) group by 1,2,3
      order by 1,2
1 vote

Stack Overflow user

Answered 2020-11-06 17:08:48

Use a generated array of dates:

WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
dates AS (
  SELECT * 
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2020-11-01 00:00:00', '2020-11-03 00:00:00', INTERVAL 1 DAY)) as date
),
products AS (
  SELECT DISTINCT product FROM subscriptions
)
SELECT dates.date, products.product, subscriptions.diff
FROM dates 
CROSS JOIN products
LEFT JOIN subscriptions 
ON subscriptions.date = dates.date AND subscriptions.product = products.product
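
The query above returns NULL in diff for the combinations that had no change. One possible next step, sketched here and not part of the original answer, is to replace those NULLs with 0 via COALESCE and accumulate them per product with a window sum. The aliases `d`, `p`, `s` and the output column `total_subscriptions` are assumptions made for this illustration.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
dates AS (
  -- One row per day in the fixed range used by the answer
  SELECT *
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(TIMESTAMP("2020-11-01"), TIMESTAMP("2020-11-03"), INTERVAL 1 DAY)) AS date
),
products AS (
  SELECT DISTINCT product FROM subscriptions
)
SELECT
  d.date,
  p.product,
  -- Missing date/product combinations fall back to a diff of 0
  COALESCE(s.diff, 0) AS diff,
  -- Running total per product; a 0 diff carries the previous total forward
  SUM(COALESCE(s.diff, 0)) OVER (PARTITION BY p.product ORDER BY d.date) AS total_subscriptions
FROM dates AS d
CROSS JOIN products AS p
LEFT JOIN subscriptions AS s
  ON s.date = d.date AND s.product = p.product
ORDER BY d.date, p.product
```

With the sample data, the premium totals come out as 50, 50, 70 and the basic totals as 100, 90, 130, which matches the sequence the question asks for.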
0 votes

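Stack Overflow user

Answered 2020-11-06 17:15:16

If I understand you correctly, one approach is to generate a fixed list of dates and cross join it with the list of products. This gives you all possible combinations. You can then bring in the subscriptions table with a left join, and finally compute the window sum: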

-- Assumes the subscriptions table (or CTE) from the question
select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from unnest(generate_timestamp_array(
    timestamp('2020-11-01'), 
    timestamp('2020-11-03'), 
    interval 1 day)
) dt
cross join (
    select 'basic' product 
    union all select 'premium'
) p
left join subscriptions s on s.product = p.product and s.date = dt

By generating the date range and the product list dynamically, we can make the query more generic:

select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from (select min(date) min_dt, max(date) max_dt from subscriptions) d0
cross join unnest(generate_timestamp_array(d0.min_dt, d0.max_dt, interval 1 day)) dt
cross join (select distinct product from subscriptions) p
left join subscriptions s on s.product = p.product and s.date = dt
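
To get the per-day count across all products, or only some of them, the densified running totals can be aggregated by date. The following is a sketch under a few assumptions: it uses the sample data from the question, it assumes all timestamps fall on midnight so the equality join on dates matches, and the names `running` and `total_subscriptions` are made up for this illustration.

```sql
WITH subscriptions AS (
  SELECT TIMESTAMP("2020-11-01") AS date, "premium" AS product, 50 AS diff
  UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
  UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
  UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
running AS (
  -- Every date/product combination with its running total (0 where nothing changed)
  SELECT
    dt,
    p.product,
    SUM(COALESCE(s.diff, 0)) OVER (PARTITION BY p.product ORDER BY dt) AS total
  FROM (SELECT MIN(date) AS min_dt, MAX(date) AS max_dt FROM subscriptions) AS d0
  CROSS JOIN UNNEST(GENERATE_TIMESTAMP_ARRAY(d0.min_dt, d0.max_dt, INTERVAL 1 DAY)) AS dt
  CROSS JOIN (SELECT DISTINCT product FROM subscriptions) AS p
  LEFT JOIN subscriptions AS s
    ON s.product = p.product AND s.date = dt
)
SELECT dt, SUM(total) AS total_subscriptions
FROM running
-- WHERE product IN ("premium")  -- uncomment to restrict the sum to some products
GROUP BY dt
ORDER BY dt
```

With the sample data this gives 150, 140, 200 for the three days. Restricting to premium gives 50, 50, 70.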
0 votes
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/64718590