CBO选择消除“慢速”查询中的distinct --它可以判断出由于外部group by而不需要该操作。
我的问题是:
( 1)为什么在这种情况下它选择这样做??我看不出计划中的成本计算和预测基数有什么原因。
2)如果它选择消除distinct,为什么不应用相同的逻辑并消除group by?
试验床:
create table t1 as
select rownum product_id, mod(rownum,3)+1 company_id
from dual
connect by rownum<=500;
create table t2 as
select t1.product_id from t1 t1 cross join t1 t12;
create table t3 as
select distinct company_id from t1;
analyze table t1 compute statistics;
analyze table t2 compute statistics;
analyze table t3 compute statistics;快速(55毫秒):
select company_id
from t1
join t2 using(product_id)
join ( select company_id
from (select company_id from t1 group by company_id)
join t3 using(company_id) ) using(company_id)
group by company_id;慢(5240毫秒):
select company_id
from t1
join t2 using(product_id)
join ( select company_id
from (select distinct company_id from t1)
join t3 using(company_id) ) using(company_id)
group by company_id;SQLFiddle在这里。
发布于 2015-05-19 16:02:58
输出来自OracleLinux7.1x86-64平台上的11.2.0.4.6企业版数据库。
让我们从问题2和一个简单的例子开始。
DISTINCT和GROUP BY的处理方式不同:优化器能够在某些情况下完全消除不同的情况,但是它不能对GROUP进行相同的处理。下面是一个示例:
create table t4 as
select rownum product_id
from dual
connect by rownum<=5;
exec dbms_stats.gather_table_stats(user, 'T4');
alter session set statistics_level=all;注意,将statistics_level设置为ALL会显著增加问题中原始查询的执行时间。
这两个查询将是:
select distinct product_id from t4
Plan hash value: 641655586
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH UNIQUE | | 1 | 5 | 4 (25)| 5 |00:00:00.01 | 2441K| 2441K| 1503K (0)|
| 2 | TABLE ACCESS FULL| T4 | 1 | 5 | 3 (0)| 5 |00:00:00.01 | | | |
-------------------------------------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 581042373
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH GROUP BY | | 1 | 5 | 4 (25)| 5 |00:00:00.01 | 2441K| 2441K| 863K (0)|
| 2 | TABLE ACCESS FULL| T4 | 1 | 5 | 3 (0)| 5 |00:00:00.01 | | | |
-------------------------------------------------------------------------------------------------------------------这正是我们所期望的。全表扫描和散列唯一的不同,散列组一组。现在添加一个NULL约束和一个索引。
create index t4_i1 on t4(product_id);
alter table t4 modify (product_id not null);
select distinct product_id from t4
Plan hash value: 4231414870
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 5 |00:00:00.01 |
| 1 | SORT UNIQUE NOSORT| | 1 | 5 | 2 (50)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-----------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 1989519822
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | SORT GROUP BY NOSORT| | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-------------------------------------------------------------------------------------------优化器注意到相关列上的索引,而且由于NULL约束,它能够使用它来避免为产生唯一值而对数据进行排序,因为数据已经在索引中排序。现在,向该列添加一个唯一约束:
alter table t4 add constraint t4_u1 unique (product_id) using index t4_i1;
select distinct product_id from t4
Plan hash value: 3974767428
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
---------------------------------------------------------------------------------------
select product_id from t4 group by product_id
Plan hash value: 1989519822
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | SORT GROUP BY NOSORT| | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
| 2 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
-------------------------------------------------------------------------------------------注意Oracle在使用DISTINCT时是如何完全跳过排序/散列唯一的,但是在按查询分组时没有发生任何变化。
在运行不同版本时启用跟踪优化器:
alter system flush shared_pool;
alter session set events '10053 trace name context forever, level 1';
select distinct product_id from t4;在跟踪文件中,我们可以看到以下内容:
**************************
Query transformations (QT)
**************************
...
Eliminated SELECT DISTINCT from query block SEL$1 (#0)
...
********************************
COST-BASED QUERY TRANSFORMATIONS
********************************
...
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T4"."PRODUCT_ID" "PRODUCT_ID" FROM "BP"."T4" "T4"
...请注意,这是查询转换,但不是基于成本的查询转换。从“最终查询”部分可以看到,优化器从查询中删除了不同的内容。但是没有这样的优化。just用于检索不同的值,但GROUP BY用于生成聚合,而不仅仅是不同的值。即使优化器可以跳过数据的排序或散列,它也不能跳过计数、添加、计算平均值等,这是重要的区别,因此DISTINCT和GROUP BY的处理方式不同(即使未指定聚合)。
另一种情况是,在明显没有必要的情况下,删除不同的部分,例如:
alter session set "_complex_view_merging"=false;这是为了防止复杂视图合并转换,如不同的位置,组按位置,子查询取消嵌套出现,并使事情变得复杂。
select distinct product_id from (select distinct product_id from (select distinct product_id from (select distinct product_id from t4)));
Plan hash value: 3974767428
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 |
| 1 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 |
---------------------------------------------------------------------------------------
select product_id from (select distinct product_id from (select product_id from (select distinct product_id from t4 ) group by product_id)) group by product_id;
Plan hash value: 4029011489
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH GROUP BY | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | 2441K| 2441K| 862K (0)|
| 2 | VIEW | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | | | |
| 3 | SORT GROUP BY NOSORT| | 1 | 5 | 1 (0)| 5 |00:00:00.01 | | | |
| 4 | INDEX FULL SCAN | T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 | | | |
------------------------------------------------------------------------------------------------------------------------
select product_id from (select product_id from (select product_id from (select product_id from t4 group by product_id) group by product_id) group by product_id) group by product_id;
Plan hash value: 1970696362
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 5 |00:00:00.01 | | | |
| 1 | HASH GROUP BY | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | 2441K| 2441K| 883K (0)|
| 2 | VIEW | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | | | |
| 3 | HASH GROUP BY | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | 2441K| 2441K| 848K (0)|
| 4 | VIEW | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | | | |
| 5 | HASH GROUP BY | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | 2441K| 2441K| 840K (0)|
| 6 | VIEW | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | | | |
| 7 | HASH GROUP BY | | 1 | 5 | 1 (0)| 5 |00:00:00.01 | 2441K| 2441K| 848K (0)|
| 8 | INDEX FULL SCAN| T4_I1 | 1 | 5 | 1 (0)| 5 |00:00:00.01 | | | |
------------------------------------------------------------------------------------------------------------------------注意优化器是如何完全消除第一次和第二次查询中的区别的,而不是组BYs。
不幸的是,在这些情况下,由于子查询(“视图”)中发生了不同的消除,因此优化器跟踪中不存在此信息,就像问题中的原始查询一样。
因此,现在我们知道,不同的和分组的处理确实不同,让我们回到问题1。
在下一篇文章中继续.
(这两个答案加在一起超过了30000个字符的限制。)
发布于 2015-05-13 10:20:56
这并不是因为甲骨文对DISTINCT和GROUP BY的处理方式有任何不同。如果我们将第二个查询中的GROUP BY替换为SELECT中的DISTINCT,我们将获得与第一个查询相似的性能/计划成本。
我认为,如果我们在具有自然联接的内部和外部查询中使用相同的子句,那么Oracle解析器就能够认识到,它需要执行两次相同的操作,并且能够将操作合并为单个操作。
因此,如果我们查看第一个查询的计划,GROUP BY操作将在表扫描完成后进行一次,从而降低了成本和执行时间。
https://dba.stackexchange.com/questions/101364
复制相似问题