
PostgreSQL not using an index on a very large table

Asked by a Stack Overflow user on 2021-11-24 19:16:13
1 answer · 157 views · 0 followers · score 1

The table raw_data has an index ix_raw_data_timestamp:

CREATE TABLE IF NOT EXISTS public.raw_data
(
    ts timestamp without time zone NOT NULL,
    log_msg character varying COLLATE pg_catalog."default",
    log_image bytea
);
CREATE INDEX IF NOT EXISTS ix_raw_data_timestamp
    ON public.raw_data USING btree
    (ts ASC NULLS LAST)
    TABLESPACE pg_default;

For some reason the index is not used for the following query (which is therefore very slow):

SELECT ts,
    log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 5e6;

EXPLAIN (analyze, buffers, format text) output for the query above:

                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9752787.07..10336161.14 rows=5000000 width=50) (actual time=789124.600..859046.614 rows=5000000 loops=1)
   Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
   ->  Gather Merge  (cost=9752787.07..18421031.89 rows=74294054 width=50) (actual time=789085.442..822547.099 rows=5000000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
         ->  Sort  (cost=9751787.05..9844654.62 rows=37147027 width=50) (actual time=788203.880..795491.054 rows=1667070 loops=3)
               Sort Key: "ts"
               Sort Method: external merge  Disk: 1758904kB
               Worker 0:  Sort Method: external merge  Disk: 1762872kB
               Worker 1:  Sort Method: external merge  Disk: 1756216kB
               Buffers: shared hit=12234 read=888521, temp read=2039471 written=2664654
               ->  Parallel Seq Scan on raw_data  (cost=0.00..1272131.27 rows=37147027 width=50) (actual time=25.436..119352.861 rows=29717641 loops=3)
                     Buffers: shared hit=12141 read=888520
 Planning Time: 5.240 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.578 ms, Inlining 76.678 ms, Optimization 24.578 ms, Emission 13.060 ms, Total 114.894 ms
 Execution Time: 877489.531 ms
(20 rows)

But the index is used for this one:

SELECT ts,
    log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 4e6;

EXPLAIN (analyze, buffers, format text) output for the query above:

                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..9408157.15 rows=4000000 width=50) (actual time=15.081..44747.127 rows=4000000 loops=1)
   Buffers: shared hit=24775 read=61155
   ->  Index Scan using ix_raw_data_timestamp on raw_data  (cost=0.57..209691026.73 rows=89152864 width=50) (actual time=2.218..16077.755 rows=4000000 loops=1)
         Buffers: shared hit=24775 read=61155
 Planning Time: 1.306 ms
 JIT:
   Functions: 3
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.406 ms, Inlining 1.121 ms, Optimization 7.917 ms, Emission 3.721 ms, Total 13.165 ms
 Execution Time: 59028.951 ms
(10 rows)

Needless to say, the goal is for all such queries to use the index regardless of the LIMIT size, but I can't seem to find a solution.

PS:

  • There are about 89,152,922 rows in the database.
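
As a sanity check (not part of the original post), the planner's current row estimate for the table can be compared against that figure via the system catalog; note that reltuples is only an estimate maintained by ANALYZE/autovacuum:

```sql
-- Hypothetical sanity check: the planner's row estimate for raw_data.
-- reltuples is approximate, refreshed by ANALYZE / autovacuum.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'raw_data';
```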

Edit:

After increasing work_mem to 2 GB (SET work_mem = '2GB';), the query is somewhat faster (it no longer spills to disk), but still not fast:

                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5592250.54..6175624.61 rows=5000000 width=50) (actual time=215887.445..282393.743 rows=5000000 loops=1)
   Buffers: shared hit=12224 read=888531
   ->  Gather Merge  (cost=5592250.54..14260731.75 rows=74296080 width=50) (actual time=215874.072..247030.062 rows=5000000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=12224 read=888531
         ->  Sort  (cost=5591250.52..5684120.62 rows=37148040 width=50) (actual time=215854.323..221828.921 rows=1667147 loops=3)
               Sort Key: "ts"
               Sort Method: top-N heapsort  Memory: 924472kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 924379kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 924281kB
               Buffers: shared hit=12224 read=888531
               ->  Parallel Seq Scan on raw_data  (cost=0.00..1272141.40 rows=37148040 width=50) (actual time=25.899..107034.903 rows=29717641 loops=3)
                     Buffers: shared hit=12130 read=888531
 Planning Time: 0.058 ms
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.642 ms, Inlining 53.860 ms, Optimization 23.848 ms, Emission 11.768 ms, Total 90.119 ms
 Execution Time: 300281.654 ms
(20 rows)

Answer 1

Stack Overflow user

Posted on 2021-11-24 23:49:30

The problem here is that you have a parallel seq scan feeding a Gather Merge. The Gather Merge processes 74,294,054 rows to output 5,000,000. That is plausible, since you say there are 89,152,922 rows in the database and you have no condition to limit them.

As for why it chooses this plan: probably because the sort is being forced to spill, since you are exceeding work_mem. So increase your work_mem. If PostgreSQL thinks it can fit the whole sort in memory and does not need to do it on disk, it will be much faster.
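
The advice above could be tried as a session-level experiment; the random_page_cost tweak is an additional suggestion not in the answer, and the values are illustrative:

```sql
-- Give the top-N heapsort room to stay in memory, for this session only.
SET work_mem = '2GB';

-- Optionally make random I/O look cheaper so the planner favors the
-- index scan (1.1 is a common SSD setting; tune for your storage).
SET random_page_cost = 1.1;

-- Re-check the plan before relying on it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ts, log_msg
FROM raw_data
ORDER BY ts ASC
LIMIT 5e6;
```

SET only affects the current session; to persist a setting, ALTER SYSTEM or postgresql.conf would be the usual route.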

Score 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/70101760
