
PostgreSQL not using an index on a very large table

Asked by a Stack Overflow user on 2021-11-24 19:16:13
1 answer · 157 views · 0 followers · score 1

The table raw_data has an index ix_raw_data_timestamp:

CREATE TABLE IF NOT EXISTS public.raw_data
(
    ts timestamp without time zone NOT NULL,
    log_msg character varying COLLATE pg_catalog."default",
    log_image bytea
);
CREATE INDEX IF NOT EXISTS ix_raw_data_timestamp
    ON public.raw_data USING btree
    (ts ASC NULLS LAST)
    TABLESPACE pg_default;

For some reason the index is not used for the following query (which is therefore very slow):

SELECT ts,
    log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 5e6;

EXPLAIN (analyze, buffers, format text) output for the query above:

                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9752787.07..10336161.14 rows=5000000 width=50) (actual time=789124.600..859046.614 rows=5000000 loops=1)
   Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
   ->  Gather Merge  (cost=9752787.07..18421031.89 rows=74294054 width=50) (actual time=789085.442..822547.099 rows=5000000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
         ->  Sort  (cost=9751787.05..9844654.62 rows=37147027 width=50) (actual time=788203.880..795491.054 rows=1667070 loops=3)
               Sort Key: "ts"
               Sort Method: external merge  Disk: 1758904kB
               Worker 0:  Sort Method: external merge  Disk: 1762872kB
               Worker 1:  Sort Method: external merge  Disk: 1756216kB
               Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
               ->  Parallel Seq Scan on raw_data  (cost=0.00..1272131.27 rows=37147027 width=50) (actual time=25.436..119352.861 rows=29717641 loops=3)
                     Buffers: shared hit=12141 read=888520
 Planning Time: 5.240 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.578 ms, Inlining 76.678 ms, Optimization 24.578 ms, Emission 13.060 ms, Total 114.894 ms
 Execution Time: 877489.531 ms
(20 rows)

But the index is used for this one:

SELECT ts,
    log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 4e6;

EXPLAIN (analyze, buffers, format text) output for the query above:

                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..9408157.15 rows=4000000 width=50) (actual time=15.081..44747.127 rows=4000000 loops=1)
   Buffers: shared hit=24775 read=61155
   ->  Index Scan using ix_raw_data_timestamp on raw_data  (cost=0.57..209691026.73 rows=89152864 width=50) (actual time=2.218..16077.755 rows=4000000 loops=1)
         Buffers: shared hit=24775 read=61155
 Planning Time: 1.306 ms
 JIT:
   Functions: 3
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.406 ms, Inlining 1.121 ms, Optimization 7.917 ms, Emission 3.721 ms, Total 13.165 ms
 Execution Time: 59028.951 ms
(10 rows)

Needless to say, the goal is for all such queries to use the index regardless of the LIMIT size, but I can't seem to find a solution.

PS:

  • There are about 89,152,922 rows in the database.
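
As a sanity check (not part of the original post), the planner's current row estimate for the table can be compared against that figure via the system catalog; note that reltuples is only an estimate maintained by ANALYZE/autovacuum:

```sql
-- Hypothetical sanity check: the planner's row estimate for raw_data.
-- reltuples is approximate, refreshed by ANALYZE / autovacuum.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'raw_data';
```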

Edit:

After increasing work_mem to 2 GB (SET work_mem = '2GB';), the query is somewhat faster (it no longer spills to disk), but still not fast:

                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5592250.54..6175624.61 rows=5000000 width=50) (actual time=215887.445..282393.743 rows=5000000 loops=1)
   Buffers: shared hit=12224 read=888531
   ->  Gather Merge  (cost=5592250.54..14260731.75 rows=74296080 width=50) (actual time=215874.072..247030.062 rows=5000000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=12224 read=888531
         ->  Sort  (cost=5591250.52..5684120.62 rows=37148040 width=50) (actual time=215854.323..221828.921 rows=1667147 loops=3)
               Sort Key: "ts"
               Sort Method: top-N heapsort  Memory: 924472kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 924379kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 924281kB
               Buffers: shared hit=12224 read=888531
               ->  Parallel Seq Scan on raw_data  (cost=0.00..1272141.40 rows=37148040 width=50) (actual time=25.899..107034.903 rows=29717641 loops=3)
                     Buffers: shared hit=12130 read=888531
 Planning Time: 0.058 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.642 ms, Inlining 53.860 ms, Optimization 23.848 ms, Emission 11.768 ms, Total 90.119 ms
 Execution Time: 300281.654 ms
(20 rows)

Answer 1

Stack Overflow user

Posted on 2021-11-24 23:49:30

The problem here is that you have a parallel seq scan feeding a Gather Merge. The Gather Merge processes 74,294,054 rows to output 5,000,000. That is plausible, since you say there are 89,152,922 rows in the database and you have no condition to limit them.

As for why it chooses this plan: probably because the sort is being forced to spill, since you are exceeding work_mem. So increase your work_mem. If PostgreSQL thinks it can fit the whole sort in memory and does not need to do it on disk, it will be much faster.
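
The advice above could be tried as a session-level experiment; the random_page_cost tweak is an additional suggestion not in the answer, and the values are illustrative:

```sql
-- Give the top-N heapsort room to stay in memory, for this session only.
SET work_mem = '2GB';

-- Optionally make random I/O look cheaper so the planner favors the
-- index scan (1.1 is a common SSD setting; tune for your storage).
SET random_page_cost = 1.1;

-- Re-check the plan before relying on it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ts, log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 5e6;
```

SET only affects the current session; to persist a setting, ALTER SYSTEM or postgresql.conf would be the usual route.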

Score 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/70101760
