在将应用程序及其数据库从经典的PostgreSQL数据库迁移到Aurora数据库(两者都使用9.6版本)之后,我们发现,在Aurora上运行的特定查询比在PostgreSQL上运行慢得多--大约慢了10倍。
这两个数据库具有相同的配置,无论是用于硬件还是pg_conf。
查询本身相当简单。它是从我们用Java编写的后端生成的,并使用jOOQ编写查询:
with "all_acp_ids"("acp_id") as (
select acp_id from temp_table_de3398bacb6c4e8ca8b37be227eac089
)
select distinct "public"."f1_folio_milestones"."acp_id",
coalesce("public"."sa_milestone_overrides"."team",
"public"."f1_folio_milestones"."team_responsible")
from "public"."f1_folio_milestones"
left outer join
"public"."sa_milestone_overrides" on (
"public"."f1_folio_milestones"."milestone" = "public"."sa_milestone_overrides"."milestone"
and "public"."f1_folio_milestones"."view" = "public"."sa_milestone_overrides"."view"
and "public"."f1_folio_milestones"."acp_id" = "public"."sa_milestone_overrides"."acp_id"
)
where "public"."f1_folio_milestones"."acp_id" in (
select "all_acp_ids"."acp_id" from "all_acp_ids"
)temp_table_de3398bacb6c4e8ca8b37be227eac089是一个单列表,f1_folio_milestones (1700万个条目)和sa_milestone_overrides (大约100万个条目)是类似设计的表,在用于LEFT OUTER JOIN的所有列上都有索引。
temp_table_de3398bacb6c4e8ca8b37be227eac089可以包含多达5000个条目,它们都是不同的。
当我们在普通的PostgreSQL数据库上运行它时,它会生成以下查询计划:
Unique (cost=4802622.20..4868822.51 rows=8826708 width=43) (actual time=483.928..483.930 rows=1 loops=1)
CTE all_acp_ids
-> Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089 (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.005 rows=1 loops=1)
-> Sort (cost=4802598.60..4824665.37 rows=8826708 width=43) (actual time=483.927..483.927 rows=4 loops=1)
Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
Sort Method: quicksort Memory: 25kB
-> Hash Left Join (cost=46051.06..3590338.34 rows=8826708 width=43) (actual time=483.905..483.917 rows=4 loops=1)
Hash Cond: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
-> Nested Loop (cost=31.16..2572.60 rows=8826708 width=37) (actual time=0.029..0.038 rows=4 loops=1)
-> HashAggregate (cost=30.60..32.60 rows=200 width=32) (actual time=0.009..0.010 rows=1 loops=1)
Group Key: all_acp_ids.acp_id
-> CTE Scan on all_acp_ids (cost=0.00..27.20 rows=1360 width=32) (actual time=0.006..0.007 rows=1 loops=1)
-> Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones (cost=0.56..12.65 rows=5 width=37) (actual time=0.018..0.025 rows=4 loops=1)
Index Cond: (acp_id = all_acp_ids.acp_id)
-> Hash (cost=28726.78..28726.78 rows=988178 width=34) (actual time=480.423..480.423 rows=987355 loops=1)
Buckets: 1048576 Batches: 1 Memory Usage: 72580kB
-> Seq Scan on sa_milestone_overrides (cost=0.00..28726.78 rows=988178 width=34) (actual time=0.004..189.641 rows=987355 loops=1)
Planning time: 3.561 ms
Execution time: 489.223 ms正如人们所看到的,它进行得相当顺利--查询不到一秒钟。但在Aurora的例子中,这种情况会发生:
Unique (cost=2632927.29..2699194.83 rows=8835672 width=43) (actual time=4577.348..4577.350 rows=1 loops=1)
CTE all_acp_ids
-> Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089 (cost=0.00..23.60 rows=1360 width=32) (actual time=0.001..0.001 rows=1 loops=1)
-> Sort (cost=2632903.69..2654992.87 rows=8835672 width=43) (actual time=4577.348..4577.348 rows=4 loops=1)
Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
Sort Method: quicksort Memory: 25kB
-> Merge Left Join (cost=1321097.58..1419347.08 rows=8835672 width=43) (actual time=4488.369..4577.330 rows=4 loops=1)
Merge Cond: ((f1_folio_milestones.view = (sa_milestone_overrides.view)::text) AND (f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text))
-> Sort (cost=1194151.06..1216240.24 rows=8835672 width=37) (actual time=0.039..0.040 rows=4 loops=1)
Sort Key: f1_folio_milestones.view, f1_folio_milestones.milestone, f1_folio_milestones.acp_id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=31.16..2166.95 rows=8835672 width=37) (actual time=0.022..0.028 rows=4 loops=1)
-> HashAggregate (cost=30.60..32.60 rows=200 width=32) (actual time=0.006..0.006 rows=1 loops=1)
Group Key: all_acp_ids.acp_id
-> CTE Scan on all_acp_ids (cost=0.00..27.20 rows=1360 width=32) (actual time=0.003..0.004 rows=1 loops=1)
-> Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones (cost=0.56..10.63 rows=4 width=37) (actual time=0.011..0.015 rows=4 loops=1)
Index Cond: (acp_id = all_acp_ids.acp_id)
-> Sort (cost=126946.52..129413.75 rows=986892 width=34) (actual time=4462.727..4526.822 rows=448136 loops=1)
Sort Key: sa_milestone_overrides.view, sa_milestone_overrides.milestone, sa_milestone_overrides.acp_id
Sort Method: quicksort Memory: 106092kB
-> Seq Scan on sa_milestone_overrides (cost=0.00..28688.92 rows=986892 width=34) (actual time=0.003..164.348 rows=986867 loops=1)
Planning time: 1.394 ms
Execution time: 4583.295 ms它实际上有一个较低的全球成本,但花费的时间几乎是以前的10倍!
禁用合并联接使Aurora恢复为散列连接,这会给出预期的执行时间--但永久禁用它不是一个选项。然而,奇怪的是,禁用嵌套循环在仍然使用合并联接时提供了更好的结果..。
Unique (cost=3610230.74..3676431.05 rows=8826708 width=43) (actual time=2.465..2.466 rows=1 loops=1)
CTE all_acp_ids
-> Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089 (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..0.004 rows=1 loops=1)
-> Sort (cost=3610207.14..3632273.91 rows=8826708 width=43) (actual time=2.464..2.464 rows=4 loops=1)
Sort Key: f1_folio_milestones.acp_id, (COALESCE(sa_milestone_overrides.team, f1_folio_milestones.team_responsible))
Sort Method: quicksort Memory: 25kB
-> Merge Left Join (cost=59.48..2397946.87 rows=8826708 width=43) (actual time=2.450..2.455 rows=4 loops=1)
Merge Cond: (f1_folio_milestones.acp_id = (sa_milestone_overrides.acp_id)::text)
Join Filter: ((f1_folio_milestones.milestone = sa_milestone_overrides.milestone) AND (f1_folio_milestones.view = (sa_milestone_overrides.view)::text))
-> Merge Join (cost=40.81..2267461.88 rows=8826708 width=37) (actual time=2.312..2.317 rows=4 loops=1)
Merge Cond: (f1_folio_milestones.acp_id = all_acp_ids.acp_id)
-> Index Scan using f1_folio_milestones_acp_id_idx on f1_folio_milestones (cost=0.56..2223273.29 rows=17653416 width=37) (actual time=0.020..2.020 rows=1952 loops=1)
-> Sort (cost=40.24..40.74 rows=200 width=32) (actual time=0.011..0.012 rows=1 loops=1)
Sort Key: all_acp_ids.acp_id
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=30.60..32.60 rows=200 width=32) (actual time=0.008..0.008 rows=1 loops=1)
Group Key: all_acp_ids.acp_id
-> CTE Scan on all_acp_ids (cost=0.00..27.20 rows=1360 width=32) (actual time=0.005..0.005 rows=1 loops=1)
-> Materialize (cost=0.42..62167.38 rows=987968 width=34) (actual time=0.021..0.101 rows=199 loops=1)
-> Index Scan using sa_milestone_overrides_acp_id_index on sa_milestone_overrides (cost=0.42..59697.46 rows=987968 width=34) (actual time=0.019..0.078 rows=199 loops=1)
Planning time: 5.500 ms
Execution time: 2.516 ms我们已经询问了AWS支持小组,他们仍然在关注这个问题,但是我们想知道是什么导致了这个问题的发生。怎样才能解释这种行为差异呢?
在查看数据库的一些文档时,我看到Aurora随着时间的推移倾向于成本--因此它使用了成本最低的查询计划。
但正如我们所看到的,考虑到它的响应时间,它远不是最优的.是否有一个阈值或设置可以使数据库使用更昂贵但更快的查询计划?
发布于 2020-02-24 13:50:10
一个突出的问题(在所有查询计划中)都很容易解决:
Seq扫描对temp_table_de3398bacb6c4e8ca8b37be227eac089 (cost=0.00..23.60 rows=1360 width=32) (实际time=0.004..0.005 rows=1 loops=1)
大胆强调我的。也就是说,Postgres期望该表中有1360行,但只找到1行。
你评论说:
这是一张正常的桌子,在做完所有事情后,它就会被丢弃。...查询计划是用表中的单个值完成的,但是它总共可以有多达5000个条目,所有条目都是不同的。
。。这可以解释完全误导的预期行数。自真空不断更新统计数据。但它需要一些时间来发挥作用。如果在填充(或在很大程度上修改)这样的表之后立即运行复杂的查询,那么最好在中间手动运行至少ANALYZE:
ANALYZE temp_table_de3398bacb6c4e8ca8b37be227eac089;甚至可能是VACUUM ANALYZE,但这不能在事务内部运行。
对于您的用例来说,实际的temporary table(CREATE TEMP TABLE ...)似乎是更好的选择。(而其他会话不需要查看表的相同状态。)整体表现更佳。但值得一提的是,它们完全不受自动真空分析的影响。请参见:
如果做不到这一点,Postgres将基于核心表中一个完全误导的行估计,尝试各种不合适的查询计划。Postgres选择不同的查询计划有很多可能的原因--任何改变成本估算的方法。但要解决这个问题,眼前的问题很可能就会消失。
Postgres查询计划中的“成本”是估计时间(在任意单位中)。
“实际时间”是指事后测量的时间。
Postgres总是倾向于降低成本。这就是它决定选择哪一个计划的方式。与极光没有任何关系。你的主要问题是误导性统计手册中的细节。启动这里和这里
也就是说,我会简化查询。CTE没有为查询添加任何有用的东西--物化与否。把它移开。
SELECT DISTINCT m.acp_id,
COALESCE(o.team, m.team_responsible)
FROM temp_table_de3398bacb6c4e8ca8b37be227eac089 t
JOIN public.f1_folio_milestones m USING (acp_id)
LEFT JOIN public.sa_milestone_overrides o USING (milestone, view, acp_id);USING只是节省打字的方便。不影响性能。只在适用的情况下使用。
考虑到基数(在t中最多为5000行,而在m中为1700万行),我将使用一个横向子查询来尝试这个替代查询:
SELECT t.acp_id, om.team
FROM temp_table_de3398bacb6c4e8ca8b37be227eac089 t
CROSS JOIN LATERAL (
SELECT COALESCE(o.team, m.team_responsible) AS team
FROM public.f1_folio_milestones m
LEFT JOIN public.sa_milestone_overrides o USING (milestone, view, acp_id)
WHERE m.acp_id = t.acp_id
) om;有了良好的数据分布和拟合指标,它可能会更快。进一步阅读(推荐的话,如果你想优化!):
https://dba.stackexchange.com/questions/260429
复制相似问题