表raw_data有一个索引ix_raw_data_timestamp
CREATE TABLE IF NOT EXISTS public.raw_data
(
ts timestamp without time zone NOT NULL,
log_msg character varying COLLATE pg_catalog."default",
log_image bytea
)
CREATE INDEX IF NOT EXISTS ix_raw_data_timestamp
ON public.raw_data USING btree
(ts ASC NULLS LAST)
TABLESPACE pg_default;由于某些原因,索引不用于以下查询(因此非常慢):
SELECT ts,
log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 5e6;以上查询的EXPLAIN (analyze, buffers, format text)结果:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9752787.07..10336161.14 rows=5000000 width=50) (actual time=789124.600..859046.614 rows=5000000 loops=1)
Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
-> Gather Merge (cost=9752787.07..18421031.89 rows=74294054 width=50) (actual time=789085.442..822547.099 rows=5000000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
-> Sort (cost=9751787.05..9844654.62 rows=37147027 width=50) (actual time=788203.880..795491.054 rows=1667070 loops=3)
Sort Key: "ts"
Sort Method: external merge Disk: 1758904kB
Worker 0: Sort Method: external merge Disk: 1762872kB
Worker 1: Sort Method: external merge Disk: 1756216kB
Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
-> Parallel Seq Scan on raw_data (cost=0.00..1272131.27 rows=37147027 width=50) (actual time=25.436..119352.861 rows=29717641 loops=3)
Buffers: shared hit=12141 read=888520
Planning Time: 5.240 ms
JIT:
Functions: 7
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.578 ms, Inlining 76.678 ms, Optimization 24.578 ms, Emission 13.060 ms, Total 114.894 ms
Execution Time: 877489.531 ms
(20 rows)但它被用于这一种:
SELECT ts,
log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 4e6;以上查询的EXPLAIN (analyze, buffers, format text)是:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..9408157.15 rows=4000000 width=50) (actual time=15.081..44747.127 rows=4000000 loops=1)
Buffers: shared hit=24775 read=61155
-> Index Scan using ix_raw_data_timestamp on raw_data (cost=0.57..209691026.73 rows=89152864 width=50) (actual time=2.218..16077.755 rows=4000000 loops=1)
Buffers: shared hit=24775 read=61155
Planning Time: 1.306 ms
JIT:
Functions: 3
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.406 ms, Inlining 1.121 ms, Optimization 7.917 ms, Emission 3.721 ms, Total 13.165 ms
Execution Time: 59028.951 ms
(10 rows)不用说,目标是让所有查询使用索引,不管大小如何,但我似乎无法找到解决方案。
PS:
89152922行.编辑:
在将内存增加到2G (SET work_mem = '2GB';)之后,查询速度稍微快一些(不再使用磁盘),但速度仍然没有那么快:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5592250.54..6175624.61 rows=5000000 width=50) (actual time=215887.445..282393.743 rows=5000000 loops=1)
Buffers: shared hit=12224 read=888531
-> Gather Merge (cost=5592250.54..14260731.75 rows=74296080 width=50) (actual time=215874.072..247030.062 rows=5000000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=12224 read=888531
-> Sort (cost=5591250.52..5684120.62 rows=37148040 width=50) (actual time=215854.323..221828.921 rows=1667147 loops=3)
Sort Key: "ts"
Sort Method: top-N heapsort Memory: 924472kB
Worker 0: Sort Method: top-N heapsort Memory: 924379kB
Worker 1: Sort Method: top-N heapsort Memory: 924281kB
Buffers: shared hit=12224 read=888531
-> Parallel Seq Scan on raw_data (cost=0.00..1272141.40 rows=37148040 width=50) (actual time=25.899..107034.903 rows=29717641 loops=3)
Buffers: shared hit=12130 read=888531
Planning Time: 0.058 ms
JIT:
Functions: 7
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.642 ms, Inlining 53.860 ms, Optimization 23.848 ms, Emission 11.768 ms, Total 90.119 ms
Execution Time: 300281.654 ms
(20 rows)发布于 2021-11-24 23:49:30
这里的问题是,您将并行SEQ扫描和GATHER_MERGE。道集合并包含74,294,054行,输出5,000,000行。这是有意义的,因为您说DB中有89,152,922行,您没有条件限制它们。
为什么它会选择这个计划,可能是因为它正在强制物化,因为您已经超过了work_mem。所以增加你的work_mem。如果PostgreSQL认为它可以在内存中容纳所有这些,并且不需要在磁盘上这样做,那么它的移动速度将大大加快。
https://stackoverflow.com/questions/70101760
复制相似问题