点击标题下「蓝色微信名」可快速关注
关系型数据库中的分区特性,是个双刃剑,用好了,对于检索提速、数据归档都会有很好的作用,但如果设置不当,就可能出现一些性能问题,毕竟用了分区表,大多都是海量数据的场景。
技术社群的这篇文章《SQL 优化 | Insert...Select 全局非分区索引慢的分析与解决》就给我们带来了关于OceanBase中分区索引相关的案例,虽然是OB,但其中的原理,各种数据库都可以借鉴。
某客户在 Oracle -> OceanBase(Oracle 模式)业务改造的 POC 测试中发现,某些大数据量批处理业务场景 INSERT ... SELECT 执行慢(OceanBase 侧表现为查询不出结果、报错超时)。
INSERT
/*+ enable_parallel_dml parallel(186) */
INTO ACA6 (
AAZ219,
...
AAA508
)
SELECT
/*+ USE_MERGE(A B) PARALLEL(186) monitor renzy888 */
'313' || LPAD(SEQ_KEY_ID.NEXTVAL, , ),
A.AAZ661,
...
'0'
FROM
ACA5 A,
AC61 B
WHERE
A.AAZ257 = B.AAZ257
ANDEXISTS (
SELECT
FROM
IMP_TREATMENT.AAA6 D
WHERE
D.AAA345 = B.AAA345
AND D.AAE140 = A.AAE140
AND D.BAA528 = '0'
AND D.AAD128 = A.AAE013.ZY
AND (
A.AAE013.WH ISNULL
OR (
A.AAE013.WH ISNOTNULL
AND A.AAE013.WH = SUBSTR(AAA345, , )
)
)
)
AND NVL(B.AAE019, ) <>
AND B.AAE041 <=
AND (
B.AAE042 ISNULL
OR B.AAE042 >=
)
AND A.AAZ654 = ('31300000006706838590')
ODC 上获取 SQL 的逻辑执行计划如下图:

通过获取 GV$OB_SQL_AUDIT 中 SQL 语句的 plan_id,查询 慢 SQL 的物理执行计划。
obclient [SYS]> select * from gv$ob_plan_cache_plan_explain where plan_id= ; +-----------+--------------+----------+---------+------------+--------------+----------------------------------------+------+-----------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TENANT_ID | SVR_IP | SVR_PORT | PLAN_ID | PLAN_DEPTH | PLAN_LINE_ID | OPERATOR | NAME | ROWS | COST | PROPERTY |
+-----------+--------------+----------+---------+------------+--------------+----------------------------------------+------+-----------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1002 | 20.9.177.177 | 2882 | 2917135 | 0 | 0 | PHY_OPTIMIZER_STATS_GATHERING | NULL | 423 | 696598 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 1 | 1 | PHY_PX_FIFO_COORD | NULL | 423 | 696597 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 2 | 2 | PHY_PX_REDUCE_TRANSMIT | NULL | 423 | 694177 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 3 | 3 | PHY_PX_MULTI_PART_INSERT | NULL | 423 | 694148 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 4 | 4 | PHY_PX_FIFO_RECEIVE | NULL | 423 | 691468 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 5 | 5 | PHY_PX_DIST_TRANSMIT | NULL | 423 | 691455 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 6 | 6 | PHY_MATERIAL | NULL | 423 | 691426 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 7 | 7 | PHY_PX_MULTI_PART_INSERT | NULL | 423 | 691426 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 8 | 8 | PHY_OPTIMIZER_STATS_GATHERING | NULL | 423 | 688746 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 9 | 9 | PHY_PX_FIFO_RECEIVE | NULL | 423 | 688746 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 10 | 10 | PHY_PX_REPART_TRANSMIT | NULL | 423 | 688733 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 11 | 11 | PHY_SUBPLAN_SCAN | NULL | 423 | 688666 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 12 | 12 | PHY_SEQUENCE | NULL | 423 | 688666 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 13 | 13 | PHY_HASH_JOIN | NULL | 423 | 688665 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 14 | 14 | PHY_JOIN_FILTER | NULL | 2511 | 443 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 15 | 15 | PHY_PX_FIFO_RECEIVE | NULL | 2511 | 443 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 16 | 16 | PHY_PX_DIST_TRANSMIT | NULL | 2511 | 166 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 17 | 17 | PHY_SUBPLAN_SCAN | NULL | 2511 | 1 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 18 | 18 | PHY_GRANULE_ITERATOR | NULL | 2511 | 1 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 19 | 19 | PHY_TABLE_SCAN | D | 2511 | 1 | table_rows:5022, physical_range_rows:5022, logical_range_rows:5022, index_back_rows:0, output_rows:2511, avaiable_index_name[IDX_AAA6_AAA036,IDX_AAA6_AAA345,IDX_AAA6_AAE712,IDX_AAA6_AAE924,AAA6], pruned_index_name[IDX_AAA6_AAA036,IDX_AAA6_AAA345,IDX_AAA6_AAE712,IDX_AAA6_AAE924], estimation info[table_id:822261, (table_type:10, version:-1--1--1, logical_rc:5022, physical_rc:5022)] |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 14 | 20 | PHY_GRANULE_ITERATOR | NULL | 70861418 | 479979 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 15 | 21 | PHY_MERGE_JOIN | NULL | 70861418 | 479979 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 16 | 22 | PHY_JOIN_FILTER | NULL | 1979583 | 6107 | NULL |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 17 | 23 | PHY_TABLE_SCAN | A | 1979583 | 6107 | table_rows:5987520, physical_range_rows:6069176, logical_range_rows:6069176, index_back_rows:0, output_rows:5655876, avaiable_index_name[PK_ACA5_1,IDX_ACA5_AAZ654,IDX_ACA5_AAE140,ACA5], pruned_index_name[IDX_ACA5_AAE140], estimation info[table_id:1053846, (table_type:10, version:-1--1--1, logical_rc:47190, physical_rc:47190), (table_type:10, version:-1--1--1, logical_rc:93283, physical_rc:93283), (table_type:10, version:-1--1--1, logical_rc:94123, physical_rc:94123), (table_type:10, version:-1--1--1, logical_rc:92977, physical_rc:92977), (table_type:10, version:-1--1--1, logical_rc:47076, physical_rc:47076), (table_type:10, version:-1--1--1, logical_rc:46959, physical_rc:46959), (table_type:10, version:-1--1--1, logical_rc:46852, physical_rc:46852), (table_type:10, version:-1--1--1, logical_rc:93889, physical_rc:93889), (table_type:10, version:-1--1--1, logical_rc:93527, physical_rc:93527), (table_type:10, version:-1--1--1, logical_rc:93405, physical_rc:93405)] |
| 1002 | 20.9.177.177 | 2882 | 2917135 | 16 | 24 | PHY_TABLE_SCAN | B | 212095107 | 446868 | table_rows:229607267, physical_range_rows:232433145, logical_range_rows:232433145, index_back_rows:0, output_rows:212095106, avaiable_index_name[U_AC61_AAZ179_1,IDX_AC61_AAA350_2,IDX_AC61_AAC001_2,IDX_AC61_AAZ172_2,IDX_AC61_AAZ649_2,IDX_AC61_AAZ713_2,AC61], pruned_index_name[IDX_AC61_AAA350_2,IDX_AC61_AAC001_2,IDX_AC61_AAZ172_2,IDX_AC61_AAZ649_2,IDX_AC61_AAZ713_2], estimation info[table_id:1054126, (table_type:10, version:-1--1--1, logical_rc:1805188, physical_rc:1805188), (table_type:10, version:-1--1--1, logical_rc:3572809, physical_rc:3572809), (table_type:10, version:-1--1--1, logical_rc:3603544, physical_rc:3603544), (table_type:10, version:-1--1--1, logical_rc:3567340, physical_rc:3567340), (table_type:10, version:-1--1--1, logical_rc:1806504, physical_rc:1806504), (table_type:10, version:-1--1--1, logical_rc:1799898, physical_rc:1799898), (table_type:10, version:-1--1--1, logical_rc:1788445, physical_rc:1788445), (table_type:10, version:-1--1--1, logical_rc:3591925, physical_rc:3591925), (table_type:10, version:-1--1--1, logical_rc:3579490, physical_rc:3579490), (table_type:10, version:-1--1--1, logical_rc:3580307, physical_rc:3580307)] |
+-----------+--------------+----------+---------+------------+--------------+----------------------------------------+------+-----------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
25 rows in set (0.162 sec)
物理执行计划与逻辑执行计划一致,先排除计划跑偏的问题(本实例中 AB 两表 MergeJoin 的性能理论是最佳的,因为两表间的谓词过滤性极差)。
实时监控 SQL 的执行状态,注意下文 sql_plan_monitor 的输出只是瞬时输出,实际诊断过程需要重复执行。
obclient [SYS]> SELECT op_id, op, output_rows, rescan, threads,
-> close_time - open_time AS open_dt,
-> last_row_eof_time - first_row_time AS row_dt,
-> open_time, close_time, first_row_time, last_row_eof_time
-> FROM (
-> SELECT plan_line_id AS op_id,
-> concat(lpad(' ', max(plan_depth), ' '), plan_operation) AS op,
-> sum(output_rows) AS output_rows,
-> sum(STARTS) AS rescan,
-> min(first_refresh_time) AS open_time,
-> max(last_refresh_time) AS close_time,
-> min(first_change_time) AS first_row_time,
-> max(last_change_time) AS last_row_eof_time,
-> count() AS threads
-> FROM gv$sql_plan_monitor
-> WHERE trace_id = 'YB420A08E3AB-000625B15CFFA43F-0-0'
-> GROUPBY plan_line_id, plan_operation, plan_depth
-> ORDERBY plan_line_id
-> ) a;
+-------+---------------------------------------+-------------+--------+---------+----------------------------+----------------------------+------------------------------+------------------------------+------------------------------+------------------------------+
| OP_ID | OP | OUTPUT_ROWS | RESCAN | THREADS | OPEN_DT | ROW_DT | OPEN_TIME | CLOSE_TIME | FIRST_ROW_TIME | LAST_ROW_EOF_TIME |
+-------+---------------------------------------+-------------+--------+---------+----------------------------+----------------------------+------------------------------+------------------------------+------------------------------+------------------------------+
| 0 | PHY_OPTIMIZER_STATS_GATHERING | 0 | 0 | 1 | NULL | NULL | 05-NOV-24 04.22.54.712937 PM | NULL | NULL | NULL |
| 1 | PHY_PX_FIFO_COORD | 0 | 0 | 1 | NULL | NULL | 05-NOV-24 04.22.54.712937 PM | NULL | NULL | NULL |
| 2 | PHY_PX_REDUCE_TRANSMIT | 0 | 0 | 186 | NULL | NULL | 05-NOV-24 04.31.51.616971 PM | NULL | NULL | NULL |
| 3 | PHY_PX_MULTI_PART_INSERT | 0 | 0 | 186 | NULL | NULL | 05-NOV-24 04.31.51.616971 PM | NULL | NULL | NULL |
| 4 | PHY_PX_FIFO_RECEIVE | 34061312 | 0 | 186 | NULL | NULL | 05-NOV-24 04.31.51.616971 PM | NULL | 05-NOV-24 04.31.52.015599 PM | NULL |
| 5 | PHY_PX_DIST_TRANSMIT | 37776128 | 0 | 186 | NULL | NULL | 05-NOV-24 04.22.54.763501 PM | NULL | 05-NOV-24 04.31.51.670017 PM | NULL |
| 6 | PHY_MATERIAL | 37776128 | 0 | 186 | NULL | NULL | 05-NOV-24 04.22.54.763501 PM | NULL | 05-NOV-24 04.31.51.670017 PM | NULL |
| 7 | PHY_PX_MULTI_PART_INSERT | 203346003 | 0 | 186 | NULL | +000000000 00:08:57.065607 | 05-NOV-24 04.22.54.763501 PM | NULL | 05-NOV-24 04.22.55.584554 PM | 05-NOV-24 04.31.52.650161 PM |
| 8 | PHY_OPTIMIZER_STATS_GATHERING | 203346003 | 0 | 186 | NULL | +000000000 00:08:57.726162 | 05-NOV-24 04.22.54.763501 PM | NULL | 05-NOV-24 04.22.54.918743 PM | 05-NOV-24 04.31.52.644905 PM |
| 9 | PHY_PX_FIFO_RECEIVE | 203346003 | 0 | 186 | NULL | +000000000 00:08:57.729324 | 05-NOV-24 04.22.54.763501 PM | NULL | 05-NOV-24 04.22.54.915581 PM | 05-NOV-24 04.31.52.644905 PM |
| 10 | PHY_PX_REPART_TRANSMIT | 22278160 | 0 | 9 | +000000000 00:08:26.689008 | +000000000 00:08:26.660537 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.409877 PM | 05-NOV-24 04.22.54.747228 PM | 05-NOV-24 04.31.21.407765 PM |
| 11 | PHY_SUBPLAN_SCAN | 22278160 | 0 | 9 | +000000000 00:08:26.689008 | +000000000 00:08:26.645777 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.409877 PM | 05-NOV-24 04.22.54.747228 PM | 05-NOV-24 04.31.21.393005 PM |
| 12 | PHY_SEQUENCE | 22278160 | 0 | 9 | +000000000 00:08:26.689008 | +000000000 00:08:26.645777 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.409877 PM | 05-NOV-24 04.22.54.747228 PM | 05-NOV-24 04.31.21.393005 PM |
| 13 | PHY_HASH_JOIN | 22278160 | 0 | 9 | +000000000 00:08:26.689008 | +000000000 00:08:26.644722 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.409877 PM | 05-NOV-24 04.22.54.747228 PM | 05-NOV-24 04.31.21.391950 PM |
| 14 | PHY_JOIN_FILTER | 5005 | 0 | 9 | +000000000 00:08:26.687952 | +000000000 00:00:00.001055 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.408821 PM | 05-NOV-24 04.22.54.732465 PM | 05-NOV-24 04.22.54.733520 PM |
| 15 | PHY_PX_FIFO_RECEIVE | 5005 | 0 | 9 | +000000000 00:08:26.687952 | +000000000 00:00:00.001055 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.408821 PM | 05-NOV-24 04.22.54.732465 PM | 05-NOV-24 04.22.54.733520 PM |
| 20 | PHY_GRANULE_ITERATOR | 22784040 | 0 | 9 | +000000000 00:08:26.687952 | +000000000 00:08:26.647884 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.408821 PM | 05-NOV-24 04.22.54.743011 PM | 05-NOV-24 04.31.21.390895 PM |
| 21 | PHY_MERGE_JOIN | 22784040 | 0 | 9 | +000000000 00:08:26.687952 | +000000000 00:08:26.647884 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.408821 PM | 05-NOV-24 04.22.54.743011 PM | 05-NOV-24 04.31.21.390895 PM |
| 22 | PHY_JOIN_FILTER | 611561 | 0 | 9 | +000000000 00:08:26.687952 | +000000000 00:08:26.640481 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.408821 PM | 05-NOV-24 04.22.54.739847 PM | 05-NOV-24 04.31.21.380328 PM |
| 23 | PHY_TABLE_SCAN | 611561 | 0 | 9 | +000000000 00:08:26.687952 | +000000000 00:08:26.640481 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.408821 PM | 05-NOV-24 04.22.54.739847 PM | 05-NOV-24 04.31.21.380328 PM |
| 24 | PHY_TABLE_SCAN | 22388172 | 0 | 9 | +000000000 00:08:26.687952 | +000000000 00:08:26.640481 | 05-NOV-24 04.22.54.720869 PM | 05-NOV-24 04.31.21.408821 PM | 05-NOV-24 04.22.54.739847 PM | 05-NOV-24 04.31.21.380328 PM |
+-------+---------------------------------------+-------------+--------+---------+----------------------------+----------------------------+------------------------------+------------------------------+------------------------------+------------------------------+
21 rows in set (0.327 sec)
发现耗时最差的算子在:【4-6】号算子,这 3 个算子耗时 20min+ 无法跑出。
简单解释下:
7 号算子 PHY_PX_MULTI_PART_INSERT 是第一次往 ACA6 表写入数据(即写主键),大概写入 2 亿左右数据(OUTPUT_ROWS 输出)。
6 号算子 -> 3 号算子需要物化下层算子(7 号写主键算子)输出的数据,且需要等待下层算子输出所有数据后才能够开始执行;可以通过 FIRST_ROW_TIME / LAST_ROW_EOF_TIME 字段判断:
04.22.55.584554 吐出第一行数据,04.31.52.650161 吐出最后一行数据。04.31.51.670017 开始接收下层(6 号)算子数据,并输出数据到上层算子,即再次往 ACA6 表写入数据,这里写的是全局唯一索引(结合逻辑执行计划来看,这里还涉及到 EXCHANGE IN/OUT 分布式算子的跨节点写入,存在大量 RPC 网络开销)。CREATE TABLE"IMP_xxxxxxxx"."ACA6" (
.....
"AAA508"VARCHAR2() NOTNULLENABLE,
"AAA350"NUMBER(),
CONSTRAINT"UDX_ACA6_AAZ219"UNIQUE ("AAZ219")
......
7号算子是写主键(即数据),而对于分区表来说,就相当于多个分区同时写入。
反之,这里 6->3 号算子需要物化数据、向上层算子写全局唯一(非分区)索引,相当于写单分区,效率相对慢(这里虽然开了 PARALLEL_DML 并发,但由于写单分区只能用到基于数据块的并发)。
为啥确定是写全局唯一索引?
EXPLAIN EXTENDED 逻辑执行计划中有输出 INDEX INSERT 算子。

INSERT ... SELECT ... 场景来说,当 OceanBase 目标表中存在全局索引(如下列表)时需要 INSERT 多次,执行计划中会有 INDEX INSERT 算子。INDEX INSERT 全局非分区索引的情况,大数据量跑批场景性能可能不是很客观,对此最佳优化措施就是改为全局分区索引。INSERT 目标表 ACA6 进行改造,将表中全局唯一非分区索引(UDX_ACA6_AAZ219)改造为全局唯一分区索引,充分利用到分区并发的优势,改造后该 INSERT ... SELECT ... 语句的效率有明显提升:CREATE UNIQUE INDEX "IMP_xxxxxxxx"."UDX_ACA6_AAZ219" ON "IMP_xxxxxxxx"."ACA6"("AAZ219") partition by hash(AAZ219) partitions ;