首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >添加索引会减慢PostgreSQL中的查询速度,但在SQLite中则相反

添加索引会减慢PostgreSQL中的查询速度,但在SQLite中则相反
EN

Stack Overflow用户
提问于 2022-09-29 14:01:13
回答 1查看 73关注 0票数 1

我的PostgreSQL数据库中有以下表(v13.8,运行在Debian11(Bullseye)上),其中包含大约15,000,000行:

代码语言:javascript
复制
=> \d ncbitaxon
               Table "public.ncbitaxon"
   Column   |  Type   | Collation | Nullable | Default 
------------+---------+-----------+----------+---------
 assertion  | integer |           |          | 
 retraction | integer |           |          | 0
 graph      | text    |           |          | 
 subject    | text    |           |          | 
 predicate  | text    |           |          | 
 object     | text    |           |          | 
 datatype   | text    |           |          | 
 annotation | text    |           |          | 

该表没有主键,因此每个subject字段都可以与多个objectpredicate字段关联。我想检索与predicate 'rdf:type‘和object 'owl:Class’相关联的每个主题,但这与predicate 'rdfs:subClassOf‘没有关联。

该表没有索引,当我运行以下查询时,在2s (我认为是可以接受的)内一致地返回一个答案:

代码语言:javascript
复制
select n1.subject
  from ncbitaxon n1
 where n1.predicate = 'rdf:type'
   and n1.object = 'owl:Class'
   and not exists (
     select 1
       from ncbitaxon n2
      where n2.subject = n1.subject
        and n2.predicate = 'rdfs:subClassOf'
   )

但是,当我在每个subjectobjectpredicatedatatype列上添加(非唯一) btree索引时,性能会大大降低,因此我的结果在大约9s左右(这太慢)就会一致返回。

我意识到索引不是灵丹妙药,有时甚至可以(很明显)减慢选择查询的速度。

但令我困惑的是,当我在同一个表上运行相同的查询时,但这次在SQLite (运行在同一台膝上型计算机上的3.34.1版)中,我的效果正好相反。也就是说,使用这些索引,我在大约5s内得到了一个结果,如果没有索引,我最终不得不按Ctrl,因为我厌倦了等待结果返回。

我想知道这是否是某种缓存效果所致,所以我尝试通过运行以下命令来清除PostgreSQL的缓存:

代码语言:javascript
复制
echo 1 > /proc/sys/vm/drop_caches
echo 2 > /proc/sys/vm/drop_caches
echo 3 > /proc/sys/vm/drop_caches
systemctl restart postgresql

而且确实存在一些缓存效应,因为我第一次在PostgreSQL中运行查询时,大约需要3s (而不是2s)。但是,如果有这些索引,这仍然比在SQLite (~5s)或PostgreSQL (~9s)中运行查询更快。

我很困惑。有人知道这是怎么回事吗?

create语句如下:

代码语言:javascript
复制
create index idx_77907_idx_ncbitaxon_predicate on ncbitaxon (predicate);
create index idx_77907_idx_ncbitaxon_subject on ncbitaxon (subject);
create index idx_77907_idx_ncbitaxon_object on ncbitaxon (object);
create index idx_77907_idx_ncbitaxon_datatype on ncbitaxon (datatype);

下面是with索引情况下的explain (analyze, buffers, format text)输出:

代码语言:javascript
复制
                                                                           QUERY PLAN                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.56..549629.43 rows=1 width=17) (actual time=5354.149..7223.752 rows=2 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=12556526 read=226206
   ->  Nested Loop Anti Join  (cost=0.56..548629.33 rows=1 width=17) (actual time=5942.986..7184.393 rows=1 loops=3)
         Buffers: shared hit=12556526 read=226206
         ->  Parallel Seq Scan on ncbitaxon n1  (cost=0.00..295443.22 rows=168032 width=17) (actual time=137.371..630.607 rows=812952 loops=3)
               Filter: ((predicate = 'rdf:type'::text) AND (object = 'owl:Class'::text))
               Rows Removed by Filter: 4250687
               Buffers: shared hit=6214 read=194286
         ->  Index Scan using idx_77907_idx_ncbitaxon_subject on ncbitaxon n2  (cost=0.56..3.46 rows=5 width=17) (actual time=0.008..0.008 rows=1 loops=2438855)
               Index Cond: (subject = n1.subject)
               Filter: (predicate = 'rdfs:subClassOf'::text)
               Rows Removed by Filter: 4
               Buffers: shared hit=12550312 read=31920
 Planning:
   Buffers: shared hit=14 read=6
 Planning Time: 1.054 ms
 JIT:
   Functions: 27
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 6.551 ms, Inlining 111.762 ms, Optimization 182.297 ms, Emission 117.506 ms, Total 418.115 ms
 Execution Time: 7228.950 ms
(23 rows)

这里是关于无索引的情况:

代码语言:javascript
复制
------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=299030.49..603580.05 rows=1 width=17) (actual time=1461.791..1500.212 rows=2 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=25757 read=375369, temp read=22444 written=22996
   ->  Parallel Hash Anti Join  (cost=298030.49..602579.95 rows=1 width=17) (actual time=1425.751..1432.604 rows=1 loops=3)
         Hash Cond: (n1.subject = n2.subject)
         Buffers: shared hit=25757 read=375369, temp read=22444 written=22996
         ->  Parallel Seq Scan on ncbitaxon n1  (cost=0.00..295443.22 rows=168032 width=17) (actual time=0.021..447.123 rows=812952 loops=3)
               Filter: ((predicate = 'rdf:type'::text) AND (object = 'owl:Class'::text))
               Rows Removed by Filter: 4250687
               Buffers: shared hit=12868 read=187632
         ->  Parallel Hash  (cost=279619.35..279619.35 rows=1002811 width=17) (actual time=694.169..694.170 rows=812951 loops=3)
               Buckets: 65536  Batches: 64  Memory Usage: 2624kB
               Buffers: shared hit=12772 read=187728, temp written=11456
               ->  Parallel Seq Scan on ncbitaxon n2  (cost=0.00..279619.35 rows=1002811 width=17) (actual time=139.958..552.807 rows=812951 loops=3)
                     Filter: (predicate = 'rdfs:subClassOf'::text)
                     Rows Removed by Filter: 4250687
                     Buffers: shared hit=12772 read=187728
 Planning:
   Buffers: shared hit=5 dirtied=1
 Planning Time: 0.582 ms
 JIT:
   Functions: 39
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 6.349 ms, Inlining 93.709 ms, Optimization 198.947 ms, Emission 126.610 ms, Total 425.615 ms
 Execution Time: 1504.890 ms
(26 rows)

explain在SQLite (有索引)中的输出是:

代码语言:javascript
复制
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     27    0                    0   Start at 27  
1     OpenRead       0     34019  0     6              0   root=34019 iDb=0; ncbitaxon
2     OpenRead       2     519613  0     k(2,,)         2   root=519613 iDb=0; idx_ncbitaxon_object
3     String8        0     1     0     owl:Class      0   r[1]='owl:Class'
4     SeekGE         2     26    1     1              0   key=r[1]     
5       IdxGT          2     26    1     1              0   key=r[1]     
6       DeferredSeek   2     0     0                    0   Move 0 to 2.rowid if needed
7       Column         0     4     2                    0   r[2]=ncbitaxon.predicate
8       Ne             3     25    2     BINARY-8       82  if r[2]!=r[3] goto 25
9       Integer        0     4     0                    0   r[4]=0; Init EXISTS result
10      Integer        1     5     0                    0   r[5]=1; LIMIT counter
11      OpenRead       1     34019  0     5              0   root=34019 iDb=0; ncbitaxon
12      OpenRead       3     332263  0     k(2,,)         2   root=332263 iDb=0; idx_ncbitaxon_subject
13      Column         0     3     6                    0   r[6]=ncbitaxon.subject
14      SeekGE         3     22    6     1              0   key=r[6]     
15        IdxGT          3     22    6     1              0   key=r[6]     
16        DeferredSeek   3     0     1                    0   Move 1 to 3.rowid if needed
17        Column         1     4     7                    0   r[7]=ncbitaxon.predicate
18        Ne             8     21    7     BINARY-8       82  if r[7]!=r[8] goto 21
19        Integer        1     4     0                    0   r[4]=1       
20        DecrJumpZero   5     22    0                    0   if (--r[5])==0 goto 22
21      Next           3     15    1                    0                
22      If             4     25    1                    0                
23      Column         0     3     10                   0   r[10]=ncbitaxon.subject
24      ResultRow      10    1     0                    0   output=r[10] 
25    Next           2     5     1                    0                
26    Halt           0     0     0                    0                
27    Transaction    0     0     77    0              1   usesStmtJournal=0
28    String8        0     3     0     rdf:type       0   r[3]='rdf:type'
29    String8        0     8     0     rdfs:subClassOf  0   r[8]='rdfs:subClassOf'
30    Goto           0     1     0                    0                

最后,下面是explain query plan在SQLite中的输出:

代码语言:javascript
复制
QUERY PLAN
|--SEARCH TABLE ncbitaxon AS n1 USING INDEX idx_ncbitaxon_object (object=?)
`--CORRELATED SCALAR SUBQUERY 1
   `--SEARCH TABLE ncbitaxon AS n2 USING INDEX idx_ncbitaxon_subject (subject=?)
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-09-29 16:07:37

我不能给出一个完整的答案,但以下是一些想法:

在“慢速计划”(

  • )中,PostgreSQL低估了ncbitaxon n1中发现的行数近5倍。尝试改进该估计值,方法是运行ANALYZE来收集新的统计数据,或者,如果不够,则通过为这些列的相关性创建扩展统计信息。

如果这个估计更准确,那么缓慢的计划将以更高的成本进行估计,而不会被选择。

  • 的缓慢计划中,ncbitaxon n2上的重复索引扫描比PostgreSQL估计的要昂贵得多。这在一定程度上是由于上述错误估计所致,但也许random_page_cost设置的值比您的硬件要低,因此PostgreSQL低估了索引扫描的成本。
票数 3
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/73896837

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档