首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >PostgreSQL对全文搜索查询的输出进行排序需要很长时间

PostgreSQL对全文搜索查询的输出进行排序需要很长时间
EN

Database Administration用户
提问于 2020-09-11 11:06:54
回答 1查看 134关注 0票数 1

我有一个有数百万行的表,其中我在WHERE子句中执行一个带有复杂全文搜索查询的D1(在一个全文搜索索引列上),并在最后使用ORDER BY id DESC。查询中有更多的部分,但我认为这是相关的部分。

最后的输出是10k行。

查询非常慢(~1.5分钟),但是如果我删除全文搜索或ORDER BY,它将在几秒钟内解析。

为什么会这样呢?有什么方法可以提高性能吗?

完整查询的EXPLAIN ANALYZE输出如下:

代码语言:javascript
复制
EXPLAIN ANALYZE SELECT 
* FROM "my_company" WHERE ("my_company"."company_type" = 2 AND "my_company"."id" IN (SELECT U0."company_id" FROM "my_companysource" U0 WHERE U0."source_id" IN (1,2,3,4,5,6)) AND NOT ("my_company"."id" IN (SELECT U0."id" FROM "my_company" U0 INNER JOIN "campaigns_processedurl" U1 ON (U0."id" = U1."company_id") INNER JOIN "campaigns_urlcampaign" U2 ON (U1."campaign_id" = U2."id") WHERE U2."project_id" IN (999))) AND ( ("my_company"."fulltext_search_index") @@ (to_tsquery('pg_catalog.english',
'(Trading)|(Bank)|(Banking)|(Finance)|(Financing)|(Transaction)|(Compliance)|(Insurance)|(Currencies)|(Currency)|(Credit <-> Cards)|(Financial <-> Services)|(Invoicing)|(Loans)|(Leasing)|(Cryptocurrency)|(Risk <-> Management)|(Stock <-> Exchanges)|(Asset <-> Management)|(Fintech)|(Incubators)|(Bitcoin)|(Lending)|(Point <-> Of <-> Sale)|(Cyber <-> Security)|(Trading <-> Platform)|(Smart <-> Contracts)|(Peer <-> To <-> Peer)|(Debit <-> Cards)|(Venture <-> Capital)|(Mortgages)|(P2P)|(Wealth <-> Management)|(Crowdfunding)|(Financial <-> Planning)|(Stock <-> Market)|(Life <-> Insurance)|(Mobile <-> Payments)|(Health <-> Insurance)|(Remittance)|(Real <-> Estate <-> Investment)|(Business <-> Financing)|(Investing <-> Services)|(Financial <-> Technology)|(Pos <-> Solution)|(Transaction <-> Processing)|(Private <-> Banking)|(Credit <-> Scoring)|(Hedge <-> Funds)|(Personal <-> Finance)|(Retail <-> Banking)|(Claim <-> Management)|(Fraud <-> Detection)|(Insurtech)|(Virtual <-> Currency)|(Angel <-> Investment)|(Crowd <-> Funding)|(Core <-> Banking)|(Digital <-> Bank)|(Fin <-> Tech)|(Consumer <-> Financing)|(Impact <-> Investing)|(P2P <-> Lending)|(Debt <-> Collections)|(Banking <-> Platform)|(Auto <-> Insurance)|(Property <-> Insurance)|(Commercial <-> Insurance)|(Prediction <-> Markets)|(Creditworthiness)|(Regtech)|(Financial <-> Exchanges)|(Consumer <-> Lending)|(Credit <-> Bureau)|(Banking <-> Api)|(Authentication <-> Technology)|(Micro <-> Lending)|(Microloans)|(Regulatory <-> Tech)|(Peer <-> To <-> Peer <-> Banking)|(Commercial <-> Lending)|(Financial <-> Management <-> Application)|(Microinvesting)|(P2P <-> Insurance)|(Reg <-> Tech)|(Bank <-> Clearance)|(Crowd <-> Financing)|(Crowdfinncing)|(Investment <-> Management <-> System)|(P2P <-> Banking)'
))) AND "my_company"."url_status" IN (1)) ORDER BY "my_company"."id" DESC;

QUERY PLAN
----------

 Sort  (cost=1246558.99..1246591.25 rows=12902 width=1600) (actual time=93420.927..93422.435 rows=10934 loops=1)
   Sort Key: my_company.id DESC
   Sort Method: quicksort  Memory: 21040kB
   ->  Hash Semi Join  (cost=511240.57..1245678.09 rows=12902 width=1600) (actual time=5997.812..93374.174 rows=10934 loops=1)
         Hash Cond: (my_company.id = u0.company_id)
         ->  Bitmap Heap Scan on my_company  (cost=499793.38..1233166.59 rows=197175 width=1600) (actual time=5397.255..92718.751 rows=106119 loops=1)
               Recheck Cond: ((fulltext_search_index @@ '
''trade'' | ''bank'' | ''bank'' | ''financ'' | ''financ'' | ''transact'' | ''complianc'' | ''insur'' | ''currenc'' | ''currenc'' | ''credit'' <-> ''card'' | ''financi'' <-> ''servic'' | ''invoic'' | ''loan'' | ''leas'' | ''cryptocurr'' | ''risk'' <-> ''manag'' | ''stock'' <-> ''exchang'' | ''asset'' <-> ''manag'' | ''fintech'' | ''incub'' | ''bitcoin'' | ''lend'' | ''point'' <2> ''sale'' | ''cyber'' <-> ''secur'' | ''trade'' <-> ''platform'' | ''smart'' <-> ''contract'' | ''peer'' <2> ''peer'' | ''debit'' <-> ''card'' | ''ventur'' <-> ''capit'' | ''mortgag'' | ''p2p'' | ''wealth'' <-> ''manag'' | ''crowdfund'' | ''financi'' <-> ''plan'' | ''stock'' <-> ''market'' | ''life'' <-> ''insur'' | ''mobil'' <-> ''payment'' | ''health'' <-> ''insur'' | ''remitt'' | ''real'' <-> ''estat'' <-> ''invest'' | ''busi'' <-> ''financ'' | ''invest'' <-> ''servic'' | ''financi'' <-> ''technolog'' | ''pos'' <-> ''solut'' | ''transact'' <-> ''process'' | ''privat'' <-> ''bank'' | ''credit'' <-> ''score'' | ''hedg'' <-> ''fund'' | ''person'' <-> ''financ'' | ''retail'' <-> ''bank'' | ''claim'' <-> ''manag'' | ''fraud'' <-> ''detect'' | ''insurtech'' | ''virtual'' <-> ''currenc'' | ''angel'' <-> ''invest'' | ''crowd'' <-> ''fund'' | ''core'' <-> ''bank'' | ''digit'' <-> ''bank'' | ''fin'' <-> ''tech'' | ''consum'' <-> ''financ'' | ''impact'' <-> ''invest'' | ''p2p'' <-> ''lend'' | ''debt'' <-> ''collect'' | ''bank'' <-> ''platform'' | ''auto'' <-> ''insur'' | ''properti'' <-> ''insur'' | ''commerci'' <-> ''insur'' | ''predict'' <-> ''market'' | ''creditworthi'' | ''regtech'' | ''financi'' <-> ''exchang'' | ''consum'' <-> ''lend'' | ''credit'' <-> ''bureau'' | ''bank'' <-> ''api'' | ''authent'' <-> ''technolog'' | ''micro'' <-> ''lend'' | ''microloan'' | ''regulatori'' <-> ''tech'' | ''peer'' <2> ''peer'' <-> ''bank'' | ''commerci'' <-> ''lend'' | ''financi'' <-> ''manag'' <-> ''applic'' | ''microinvest'' | ''p2p'' <-> ''insur'' | ''reg'' <-> ''tech'' | ''bank'' <-> ''clearanc'' | ''crowd'' <-> ''financ'' | ''crowdfinnc'' | ''invest'' <-> ''manag'' <-> ''system'' | ''p2p'' <-> ''bank''
'::tsquery) AND (company_type = 2) AND (url_status = 1))
               Rows Removed by Index Recheck: 54965
               Filter: (NOT (hashed SubPlan 1))
               Rows Removed by Filter: 57142
               Heap Blocks: exact=197083
               ->  BitmapAnd  (cost=94060.43..94060.43 rows=394349 width=0) (actual time=3605.515..3605.515 rows=0 loops=1)
                     ->  Bitmap Index Scan on my_company_main_fulltext  (cost=0.00..6414.92 rows=620827 width=0) (actual time=2419.704..2419.704 rows=352274 loops=1)
                           Index Cond: (fulltext_search_index @@ '
''trade'' | ''bank'' | ''bank'' | ''financ'' | ''financ'' | ''transact'' | ''complianc'' | ''insur'' | ''currenc'' | ''currenc'' | ''credit'' <-> ''card'' | ''financi'' <-> ''servic'' | ''invoic'' | ''loan'' | ''leas'' | ''cryptocurr'' | ''risk'' <-> ''manag'' | ''stock'' <-> ''exchang'' | ''asset'' <-> ''manag'' | ''fintech'' | ''incub'' | ''bitcoin'' | ''lend'' | ''point'' <2> ''sale'' | ''cyber'' <-> ''secur'' | ''trade'' <-> ''platform'' | ''smart'' <-> ''contract'' | ''peer'' <2> ''peer'' | ''debit'' <-> ''card'' | ''ventur'' <-> ''capit'' | ''mortgag'' | ''p2p'' | ''wealth'' <-> ''manag'' | ''crowdfund'' | ''financi'' <-> ''plan'' | ''stock'' <-> ''market'' | ''life'' <-> ''insur'' | ''mobil'' <-> ''payment'' | ''health'' <-> ''insur'' | ''remitt'' | ''real'' <-> ''estat'' <-> ''invest'' | ''busi'' <-> ''financ'' | ''invest'' <-> ''servic'' | ''financi'' <-> ''technolog'' | ''pos'' <-> ''solut'' | ''transact'' <-> ''process'' | ''privat'' <-> ''bank'' | ''credit'' <-> ''score'' | ''hedg'' <-> ''fund'' | ''person'' <-> ''financ'' | ''retail'' <-> ''bank'' | ''claim'' <-> ''manag'' | ''fraud'' <-> ''detect'' | ''insurtech'' | ''virtual'' <-> ''currenc'' | ''angel'' <-> ''invest'' | ''crowd'' <-> ''fund'' | ''core'' <-> ''bank'' | ''digit'' <-> ''bank'' | ''fin'' <-> ''tech'' | ''consum'' <-> ''financ'' | ''impact'' <-> ''invest'' | ''p2p'' <-> ''lend'' | ''debt'' <-> ''collect'' | ''bank'' <-> ''platform'' | ''auto'' <-> ''insur'' | ''properti'' <-> ''insur'' | ''commerci'' <-> ''insur'' | ''predict'' <-> ''market'' | ''creditworthi'' | ''regtech'' | ''financi'' <-> ''exchang'' | ''consum'' <-> ''lend'' | ''credit'' <-> ''bureau'' | ''bank'' <-> ''api'' | ''authent'' <-> ''technolog'' | ''micro'' <-> ''lend'' | ''microloan'' | ''regulatori'' <-> ''tech'' | ''peer'' <2> ''peer'' <-> ''bank'' | ''commerci'' <-> ''lend'' | ''financi'' <-> ''manag'' <-> ''applic'' | ''microinvest'' | ''p2p'' <-> ''insur'' | ''reg'' <-> ''tech'' | ''bank'' <-> ''clearanc'' | ''crowd'' <-> ''financ'' | ''crowdfinnc'' | ''invest'' <-> ''manag'' <-> ''system'' | ''p2p'' <-> ''bank''
'::tsquery)
                     ->  Bitmap Index Scan on my_company_url_status_2704ae13bae74399_uniq  (cost=0.00..87546.67 rows=2432299 width=0) (actual time=1162.482..1162.482 rows=2584111 loops=1)
                           Index Cond: (url_status = 1)
               SubPlan 1
                 ->  Gather  (cost=2143.66..404980.05 rows=301157 width=4) (actual time=36.819..1671.069 rows=265977 loops=1)
                       Workers Planned: 2
                       Workers Launched: 2
                       ->  Nested Loop  (cost=1143.66..373864.35 rows=125482 width=4) (actual time=137.208..1647.572 rows=88659 loops=3)
                             ->  Hash Join  (cost=1143.23..315793.12 rows=125482 width=4) (actual time=137.126..1189.980 rows=88950 loops=3)
                                   Hash Cond: (u1.campaign_id = u2.id)
                                   ->  Parallel Seq Scan on campaigns_processedurl u1  (cost=0.00..302007.65 rows=4815065 width=8) (actual time=0.022..635.291 rows=3857165 loops=3)
                                   ->  Hash  (cost=1136.76..1136.76 rows=518 width=4) (actual time=135.196..135.196 rows=518 loops=3)
                                         Buckets: 1024  Batches: 1  Memory Usage: 27kB
                                         ->  Bitmap Heap Scan on campaigns_urlcampaign u2  (cost=16.30..1136.76 rows=518 width=4) (actual time=133.979..135.121 rows=518 loops=3)
                                               Recheck Cond: (project_id = 999)
                                               Heap Blocks: exact=94
                                               ->  Bitmap Index Scan on campaigns_urlcampaign_b098ad43  (cost=0.00..16.17 rows=518 width=0) (actual time=0.100..0.100 rows=518 loops=3)
                                                     Index Cond: (project_id = 999)
                             ->  Index Only Scan using my_company_pkey on my_company u0_1  (cost=0.43..0.46 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=266849)
                                   Index Cond: (id = u1.company_id)
                                   Heap Fetches: 65356
         ->  Hash  (cost=8315.25..8315.25 rows=250555 width=4) (actual time=597.008..597.009 rows=277360 loops=1)
               Buckets: 524288 (originally 262144)  Batches: 1 (originally 1)  Memory Usage: 13847kB
               ->  Index Only Scan using my_companysource_source_id_a8900f16_uniq on my_companysource u0  (cost=0.44..8315.25 rows=250555 width=4) (actual time=417.555..543.621 rows=277360 loops=1)
                     Index Cond: (source_id = ANY ('{1,2,3,4,5,6}'::integer[]))
                     Heap Fetches: 208358
 Planning Time: 6.181 ms
 JIT:
   Functions: 71
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 6.576 ms, Inlining 153.968 ms, Optimization 404.078 ms, Emission 260.011 ms, Total 824.634 ms
 Execution Time: 93478.951 ms
(45 rows)

没有

ORDER BY执行计划

我必须省略全文查询,因为问题将超过字符限制。

代码语言:javascript
复制
EXPLAIN ANALYZE SELECT * FROM "my_company" WHERE ("my_company"."company_type" = 2 AND "my_company"."id" IN (SELECT U0."company_id" FROM "my_companysource" U0 WHERE U0."source_id" IN (1,2,3,4,5,6)) AND NOT ("my_company"."id" IN (SELECT U0."id" FROM "my_company" U0 INNER JOIN "campaigns_processedurl" U1 ON (U0."id" = U1."company_id") INNER JOIN "campaigns_urlcampaign" U2 ON (U1."campaign_id" = U2."id") WHERE U2."project_id" IN (999))) AND ( ("my_company"."fulltext_search_index") @@ (to_tsquery('pg_catalog.english',
*same query as in the execution plan above*))) AND "my_company"."url_status" IN (1));

QUERY PLAN
----------

 Hash Semi Join  (cost=512357.67..1237114.98 rows=12571 width=1599) (actual time=4974.659..9603.716 rows=10881 loops=1)
   Hash Cond: (my_company.id = u0.company_id)
   ->  Bitmap Heap Scan on my_company  (cost=500699.64..1224420.02 rows=192089 width=1599) (actual time=4366.382..8953.154 rows=106143 loops=1)
         Recheck Cond: ((fulltext_search_index @@ '
*same query as in the execution plan above*
'::tsquery) AND (company_type = 2) AND (url_status = 1))
         Rows Removed by Index Recheck: 54962
         Filter: (NOT (hashed SubPlan 1))
         Rows Removed by Filter: 57166
         Heap Blocks: exact=197126
         ->  BitmapAnd  (cost=94756.73..94756.73 rows=384177 width=0) (actual time=2651.329..2651.329 rows=0 loops=1)
               ->  Bitmap Index Scan on my_company_main_fulltext  (cost=0.00..7117.74 rows=604909 width=0) (actual time=2261.803..2261.803 rows=343137 loops=1)
                     Index Cond: (fulltext_search_index @@ '
*same query as in the execution plan above*
'::tsquery)
               ->  Bitmap Index Scan on my_company_url_status_2704ae13bae74399_uniq  (cost=0.00..87542.70 rows=2433369 width=0) (actual time=367.116..367.116 rows=2545163 loops=1)
                     Index Cond: (url_status = 1)
         SubPlan 1
           ->  Gather  (cost=2143.66..405190.13 rows=301113 width=4) (actual time=36.725..1599.373 rows=265970 loops=1)
                 Workers Planned: 2
                 Workers Launched: 2
                 ->  Nested Loop  (cost=1143.66..374078.83 rows=125464 width=4) (actual time=124.494..1575.190 rows=88657 loops=3)
                       ->  Hash Join  (cost=1143.23..315802.59 rows=125464 width=4) (actual time=124.411..1161.406 rows=88950 loops=3)
                             Hash Cond: (u1.campaign_id = u2.id)
                             ->  Parallel Seq Scan on campaigns_processedurl u1  (cost=0.00..302015.15 rows=4815815 width=8) (actual time=0.020..583.296 rows=3869911 loops=3)
                             ->  Hash  (cost=1136.76..1136.76 rows=518 width=4) (actual time=122.546..122.546 rows=518 loops=3)
                                   Buckets: 1024  Batches: 1  Memory Usage: 27kB
                                   ->  Bitmap Heap Scan on campaigns_urlcampaign u2  (cost=16.30..1136.76 rows=518 width=4) (actual time=121.306..122.471 rows=518 loops=3)
                                         Recheck Cond: (project_id = 999)
                                         Heap Blocks: exact=94
                                         ->  Bitmap Index Scan on campaigns_urlcampaign_b098ad43  (cost=0.00..16.17 rows=518 width=0) (actual time=0.101..0.101 rows=518 loops=3)
                                               Index Cond: (project_id = 999)
                       ->  Index Only Scan using my_company_pkey on my_company u0_1  (cost=0.43..0.46 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=266849)
                             Index Cond: (id = u1.company_id)
                             Heap Fetches: 42965
   ->  Hash  (cost=8523.76..8523.76 rows=250742 width=4) (actual time=606.830..606.830 rows=277120 loops=1)
         Buckets: 524288 (originally 262144)  Batches: 1 (originally 1)  Memory Usage: 13839kB
         ->  Index Only Scan using my_companysource_source_id_a8900f16_uniq on my_companysource u0  (cost=0.44..8523.76 rows=250742 width=4) (actual time=418.030..549.896 rows=277120 loops=1)
               Index Cond: (source_id = ANY ('{1,2,3,4,5,6}'::integer[]))
               Heap Fetches: 210515
 Planning Time: 6.179 ms
 JIT:
   Functions: 71
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 6.517 ms, Inlining 135.258 ms, Optimization 384.718 ms, Emission 260.459 ms, Total 786.952 ms
 Execution Time: 9658.889 ms
(42 rows)

索引定义:

代码语言:javascript
复制
CREATE INDEX my_company_main_fulltext 
ON public.my_company USING gin (fulltext_search_index) 
WHERE (company_type = 2)
EN

回答 1

Database Administration用户

发布于 2020-09-17 12:27:18

时间花在“位图堆扫描”上。

但是相同的位图堆扫描在一次执行中是快速的,而在另一次执行中是缓慢的,因此由于在第一次执行之后缓存,两者之间的差别非常小。

要确保使用EXPLAIN (ANALYZE, BUFFERS),这将显示缓存中命中的块数和从磁盘/内核缓存读取的块的数量。

票数 1
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/275366

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档