文章/答案/技术大牛

发布

社区首页 >问答首页 >当使用CTE时，最好使用join或in()进行过滤？

问当使用CTE时，最好使用join或in()进行过滤？
EN

Stack Overflow用户

提问于 2020-01-19 23:11:48

回答 1查看 223关注 0票数 1

我正在处理一个需要很长时间才能运行的查询。由于这个查询无论如何都会通过cron作业自动执行，所以这并不重要，而且与仅嵌套查询相比，我喜欢CTE的可读性，在过去，我发现由于使用谓词，CTE的工作速度更快。

因此，我更喜欢在可能的情况下使用CTE而不使用子查询的可读性。

以下是运行相同查询及其explain语句的三种方法。我想更好地理解explain语句，以及了解哪种方法更好。

with 

/*
cte for filtering all events that happened on this date only
*/
filter_sessions as (
select dimension1
from ga_flagship_ecom.sessions
where date = '2020-01-16'
),


ee as (
select 
    dimension1,
    dimension3,
    case when sum(case when metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product -- roll up to event level
from ga_flagship_ecom.ecom
where dimension1 in (select dimension1 from filter_sessions)
group by 1,2
)

select * from ee;

在这里，我使用in()来过滤ee。解释：

CTE Scan on ee  (cost=61757.45..61758.23 rows=39 width=44)
  CTE filter_sessions
    ->  Index Only Scan using sessions_date_idx on sessions  (cost=0.56..2.76 rows=1 width=22)
          Index Cond: (date = '2020-01-16'::date)
  CTE ee
    ->  GroupAggregate  (cost=61753.72..61754.69 rows=39 width=34)
          Group Key: ecom.dimension1, ecom.dimension3
          ->  Sort  (cost=61753.72..61753.81 rows=39 width=34)
                Sort Key: ecom.dimension1, ecom.dimension3
                ->  Nested Loop  (cost=0.58..61752.69 rows=39 width=34)
                      ->  HashAggregate  (cost=0.02..0.03 rows=1 width=32)
                            Group Key: (filter_sessions.dimension1)::text
                            ->  CTE Scan on filter_sessions  (cost=0.00..0.02 rows=1 width=32)
                      ->  Index Scan using ecom_pk on ecom  (cost=0.56..61752.26 rows=39 width=34)

另一种方法是使用内部联接进行筛选：

with 

/*
cte for filtering all events that happened on this date only
*/
filter_sessions as (
select dimension1
from ga_flagship_ecom.sessions
where date = '2020-01-16'
),


ee as (
select 
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product -- roll up to event level
from ga_flagship_ecom.ecom e
join filter_sessions f on f.dimension1 = e.dimension1
group by 1,2
)

select * from ee;

此版本的解释如下：

CTE Scan on ee  (cost=61757.43..61758.21 rows=39 width=44)
  CTE filter_sessions
    ->  Index Only Scan using sessions_date_idx on sessions  (cost=0.56..2.76 rows=1 width=22)
          Index Cond: (date = '2020-01-16'::date)
  CTE ee
    ->  GroupAggregate  (cost=61753.70..61754.68 rows=39 width=34)
          Group Key: e.dimension1, e.dimension3
          ->  Sort  (cost=61753.70..61753.80 rows=39 width=34)
                Sort Key: e.dimension1, e.dimension3
                ->  Nested Loop  (cost=0.56..61752.67 rows=39 width=34)
                      ->  CTE Scan on filter_sessions f  (cost=0.00..0.02 rows=1 width=32)
                      ->  Index Scan using ecom_pk on ecom e  (cost=0.56..61752.26 rows=39 width=34)
                            Index Cond: ((dimension1)::text = (f.dimension1)::text)

然后还有第三个选项，可以同时使用where和join：

with 

/*
cte for filtering all events that happened on this date only
*/
filter_sessions as (
select dimension1
from ga_flagship_ecom.sessions
where date = '2020-01-16'
),


ee as (
select 
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product -- roll up to event level
from ga_flagship_ecom.ecom e
join filter_sessions f on f.dimension1 = e.dimension1
where e.dimension1 in (select dimension1 from filter_sessions)
group by 1,2
)

select * from ee;

这个版本的解释是：

CTE Scan on ee  (cost=61758.32..61759.10 rows=39 width=44)
  CTE filter_sessions
    ->  Index Only Scan using sessions_date_idx on sessions  (cost=0.56..2.76 rows=1 width=22)
          Index Cond: (date = '2020-01-16'::date)
  CTE ee
    ->  GroupAggregate  (cost=61754.59..61755.57 rows=39 width=34)
          Group Key: e.dimension1, e.dimension3
          ->  Sort  (cost=61754.59..61754.69 rows=39 width=34)
                Sort Key: e.dimension1, e.dimension3
                ->  Nested Loop Semi Join  (cost=0.56..61753.56 rows=39 width=34)
                      Join Filter: ((f.dimension1)::text = (filter_sessions.dimension1)::text)
                      ->  Nested Loop  (cost=0.56..61752.67 rows=39 width=66)
                            ->  CTE Scan on filter_sessions f  (cost=0.00..0.02 rows=1 width=32)
                            ->  Index Scan using ecom_pk on ecom e  (cost=0.56..61752.26 rows=39 width=34)

我不太熟悉阅读explain语句，但3个语句顶部的“成本”非常相似: 61757.45、61757.43和61758.32。

哪种方法是最好的？无论是此时此地，还是更一般的地方？使用内部联接进行筛选或使用in()进行筛选

select version();
PostgreSQL 11.6, compiled by Visual C++ build 1800, 64-bit

postgresql

回答 1

Stack Overflow用户

回答已采纳

发布于 2020-01-19 23:20:37

EXISTS子句的执行速度始终快于IN子句。你一定要试一下-

select dimension1,
       dimension3,
       case when sum(case when metric1 = 0
                               then 1
                          else 0
                     end) > 0
                 then 1
            else 0
       end as zero_val_product
from ga_flagship_ecom.ecom E
where EXISTS (select 1
              from ga_flagship_ecom.sessions S
              WHERE S.dimension1 = E.dimension1
              AND date = '2020-01-16')
group by 1,2

除此之外，为了更快地执行，你可以尝试在会话上建立索引(date，dimension1)

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/59811396

复制

相似问题

问当使用CTE时，最好使用join或in()进行过滤？
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问当使用CTE时，最好使用join或in()进行过滤？EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问当使用CTE时，最好使用join或in()进行过滤？
EN