首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >PosgtreSQL GIN + BTree序

PosgtreSQL GIN + BTree序
EN

Database Administration用户
提问于 2017-11-22 16:10:39
回答 1查看 1.3K关注 0票数 3

我有一个结构简单的表(id,metadata_json,stamp),stamp是一个时间戳,上面有Btree索引。MetadataJson是一个带有GIN索引的jsonb。

这张桌子有2500万行。我正在使用PostgreSQL 10。

代码语言:javascript
复制
                 Table "public.metadata"
    Column     |            Type             | Modifiers
---------------+-----------------------------+-----------
 id            | uuid                        | not null
 metadata_json | jsonb                       |
 stamp         | timestamp without time zone |
Indexes:
    "metadata_pkey" PRIMARY KEY, btree (id)
    "metadata_idx" gin (metadata_json)
    "stamp_idx" btree (stamp)

我正在执行的查询非常简单:

代码语言:javascript
复制
select * from metadata where metadata_json @> '{"someBool": true}'
         AND stamp >= '01-01-2016' ORDER BY stamp DESC LIMIT 100;

我认为它应该如何工作:我在邮票上有一个btree,因此它应该以反向顺序按索引进行,然后它应该测试json限制上的行(它有40%的选择性)。我希望它能在几毫秒内返回。

代码语言:javascript
复制
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=80598.46..80598.71 rows=100 width=381) (actual time=445064.728..445064.791 rows=100 loops=1)
   ->  Sort  (cost=80598.46..80607.46 rows=3600 width=381) (actual time=445064.724..445064.754 rows=100 loops=1)
         Sort Key: stamp DESC
         Sort Method: top-N heapsort  Memory: 109kB
         ->  Bitmap Heap Scan on metadata  (cost=66591.00..80460.87 rows=3600 width=381) (actual time=2881.164..444283.520 rows=1437024 loops=1)
               Recheck Cond: ((metadata_json @> '{"someBool": true}'::jsonb) AND (stamp >= '2016-01-01 00:00:00'::timestamp without time zone))
               Heap Blocks: exact=882439
               ->  BitmapAnd  (cost=66591.00..66591.00 rows=3600 width=0) (actual time=2599.415..2599.415 rows=0 loops=1)
                     ->  Bitmap Index Scan on metadata_idx  (cost=0.00..260.25 rows=25100 width=0) (actual time=1762.166..1762.166 rows=10041746 loops=1)
                           Index Cond: (metadata_json @> '{"someBool": true}'::jsonb)
                     ->  Bitmap Index Scan on stamp_idx  (cost=0.00..66328.69 rows=3600034 width=0) (actual time=760.136..760.136 rows=3591329 loops=1)
                           Index Cond: (stamp >= '2016-01-01 00:00:00'::timestamp without time zone)
 Planning time: 5.008 ms
 Execution time: 445072.043 ms
(14 rows)

从计划上看,规划者的统计数据确实有偏差,但对表进行了分析,抽样设置为1000。

编辑:经过一些搜索,我发现postgres没有jsonb数据类型的统计数据.您能告诉我这种查询是如何优化的吗?

编辑2:如果我禁用位图扫描,查询非常快(2 millis)。但我不认为这是个好办法..。

编辑3:击剑(使用CTE语句)

代码语言:javascript
复制
Limit  (cost=1489968.48..1489968.73 rows=100 width=56) (actual time=447543.199..447543.262 rows=100 loops=1)
  CTE t
    ->  Bitmap Heap Scan on metadata  (cost=67228.70..1408830.13 rows=3600034 width=381) (actual time=1045.566..441897.315 rows=3591329 loops=1)
          Recheck Cond: (stamp >= '2016-01-01 00:00:00'::timestamp without time zone)
          Heap Blocks: exact=1229457
          ->  Bitmap Index Scan on stamp_idx  (cost=0.00..66328.69 rows=3600034 width=0) (actual time=663.960..663.960 rows=3591329 loops=1)
                Index Cond: (stamp >= '2016-01-01 00:00:00'::timestamp without time zone)
  ->  Sort  (cost=81138.35..81147.35 rows=3600 width=56) (actual time=447543.197..447543.227 rows=100 loops=1)
        Sort Key: t.stamp DESC
        Sort Method: top-N heapsort  Memory: 109kB
        ->  CTE Scan on t  (cost=0.00..81000.76 rows=3600 width=56) (actual time=1045.577..446935.261 rows=1437024 loops=1)
              Filter: (metadata_json @> '{"someBool": true}'::jsonb)
              Rows Removed by Filter: 2154305
Planning time: 0.169 ms
Execution time: 447692.843 ms

编辑4:击剑(从中选择)

代码语言:javascript
复制
Limit  (cost=1798933.42..1811851.02 rows=100 width=381) (actual time=198282.400..198282.400 rows=0 loops=1)
  ->  Subquery Scan on foo  (cost=1798933.42..2186461.48 rows=3000 width=381) (actual time=198282.397..198282.397 rows=0 loops=1)
        Filter: (foo.metadata_json @> '{"someBool": true}'::jsonb)
        Rows Removed by Filter: 3591329
        ->  Gather Merge  (cost=1798933.42..2148961.13 rows=3000028 width=381) (actual time=184803.964..195869.763 rows=3591329 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Sort  (cost=1797933.40..1801683.43 rows=1500014 width=381) (actual time=184599.426..188532.836 rows=1197110 loops=3)
                    Sort Key: metadata.stamp DESC
                    Sort Method: external merge  Disk: 461368kB
                    ->  Parallel Bitmap Heap Scan on metadata  (cost=67228.70..1382579.88 rows=1500014 width=381) (actual time=1171.006..178501.269 rows=1197110 loops=3)
                          Recheck Cond: (stamp >= '2016-01-01 00:00:00'::timestamp without time zone)
                          Heap Blocks: exact=408005
                          ->  Bitmap Index Scan on stamp_idx  (cost=0.00..66328.69 rows=3600034 width=0) (actual time=728.401..728.401 rows=3591329 loops=1)
                                Index Cond: (stamp >= '2016-01-01 00:00:00'::timestamp without time zone)
Planning time: 6.704 ms
Execution time: 198509.456 ms

强制禁用位图扫描

代码语言:javascript
复制
set enable_bitmapscan = off;
explain analyze select * from metadata where metadata_json @> '{"someBool": true}' AND stamp >= '01-01-2015' ORDER BY stamp DESC LIMIT 100;

Limit  (cost=0.44..256064.27 rows=100 width=381) (actual time=0.065..1.814 rows=100 loops=1)
  ->  Index Scan Backward using stamp_idx on metadata  (cost=0.44..18423793.42 rows=7195 width=381) (actual time=0.064..1.777 rows=100 loops=1)
        Index Cond: (stamp >= '2015-01-01 00:00:00'::timestamp without time zone)
        Filter: (metadata_json @> '{"someBool": true}'::jsonb)
        Rows Removed by Filter: 126
Planning time: 0.180 ms
Execution time: 1.856 ms

编辑5只复合杜松子酒(邮票,metadata_json)索引显示:

代码语言:javascript
复制
explain analyze select * from metadata where metadata_json @> '{"someBool": true}
         AND stamp >= '01-01-2016' ORDER BY stamp DESC LIMIT 100;
Limit  (cost=14132.36..14132.61 rows=100 width=381) (actual time=308836.991..308837.052 rows=100 loops=1)
  ->  Sort  (cost=14132.36..14141.36 rows=3600 width=381) (actual time=308836.988..308837.018 rows=100 loops=1)
        Sort Key: stamp DESC
        Sort Method: top-N heapsort  Memory: 109kB
        ->  Bitmap Heap Scan on metadata  (cost=124.90..13994.77 rows=3600 width=381) (actual time=3160.418..308183.328 rows=1437024 loops=1)
              Recheck Cond: ((stamp >= '2016-01-01 00:00:00'::timestamp without time zone) AND (metadata_json @> '{"someBool": true}'::jsonb))
              Heap Blocks: exact=882439
              ->  Bitmap Index Scan on metadata_stamp_metadata_json_idx  (cost=0.00..124.00 rows=3600 width=0) (actual time=2883.484..2883.484 rows=1437024 loops=1)
                    Index Cond: ((stamp >= '2016-01-01 00:00:00'::timestamp without time zone) AND (metadata_json @> '{"someBool": true}'::jsonb))
Planning time: 0.233 ms
Execution time: 308857.051 ms

最终解决方案:

我已经将json分解为键值,并将其存储为表"recordId,key,value,stamp“。我为这些创建了一个btree索引,结果在几个millis中得到了普遍的返回。我认为,没有统计数据,json没有任何好的通用解决方案。

正确的答案是Evan,因为这可能是在jsonb结构中所能做的最好的事情。

EN

回答 1

Database Administration用户

回答已采纳

发布于 2017-11-22 18:57:21

你现在所面临的真正问题是jsonb的数据太烂了。这是一个众所周知的问题。它也永远不会修好。

代码语言:javascript
复制
->  Bitmap Index Scan on metadata_idx  (cost=0.00..260.25 rows=25100 width=0) (actual time=1762.166..1762.166 rows=10041746 loops=1)
    Index Cond: (metadata_json @> '{"someBool": true}'::jsonb)

在这里,PostgreSQL预计25100,但低选择性将返回10041746。时间戳估计相当准确,返回3.6M,PostgreSQL预计需要挖掘3.6M才能找到仅有的25,100。这是很大的挖掘。因此,它会对jsonb进行索引扫描。

你有几个选择。

  • 按照ypercube的建议创建复合杜松子酒索引。创建扩展btree_gin;使用gin创建元数据索引(stamp,metadata_json);
  • 添加元数据_json->someBool的索引
  • 使用优化栅栏。选择* FROM (从邮票为>= '01-01-2016‘的元数据中选择metadata_json @> '{"someBool":true}’按邮戳限制为100的顺序;

您可能还想了解一下jsonb_path_ops。它也可以用来创建复合杜松子酒索引,

代码语言:javascript
复制
CREATE INDEX ON metadata USING gin(stamp, metadata_json jsonb_path_ops);
票数 1
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/191513

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档