我有一个有数百万行的表,其中我在WHERE子句中执行一个带有复杂全文搜索查询的D1(在一个全文搜索索引列上),并在最后使用ORDER BY id DESC。查询中有更多的部分,但我认为这是相关的部分。
最后的输出是10k行。
查询非常慢(~1.5分钟),但是如果我删除全文搜索或ORDER BY,它将在几秒钟内解析。
为什么会这样呢?有什么方法可以提高性能吗?
完整查询的EXPLAIN ANALYZE输出如下:
EXPLAIN ANALYZE SELECT
* FROM "my_company" WHERE ("my_company"."company_type" = 2 AND "my_company"."id" IN (SELECT U0."company_id" FROM "my_companysource" U0 WHERE U0."source_id" IN (1,2,3,4,5,6)) AND NOT ("my_company"."id" IN (SELECT U0."id" FROM "my_company" U0 INNER JOIN "campaigns_processedurl" U1 ON (U0."id" = U1."company_id") INNER JOIN "campaigns_urlcampaign" U2 ON (U1."campaign_id" = U2."id") WHERE U2."project_id" IN (999))) AND ( ("my_company"."fulltext_search_index") @@ (to_tsquery('pg_catalog.english',
'(Trading)|(Bank)|(Banking)|(Finance)|(Financing)|(Transaction)|(Compliance)|(Insurance)|(Currencies)|(Currency)|(Credit <-> Cards)|(Financial <-> Services)|(Invoicing)|(Loans)|(Leasing)|(Cryptocurrency)|(Risk <-> Management)|(Stock <-> Exchanges)|(Asset <-> Management)|(Fintech)|(Incubators)|(Bitcoin)|(Lending)|(Point <-> Of <-> Sale)|(Cyber <-> Security)|(Trading <-> Platform)|(Smart <-> Contracts)|(Peer <-> To <-> Peer)|(Debit <-> Cards)|(Venture <-> Capital)|(Mortgages)|(P2P)|(Wealth <-> Management)|(Crowdfunding)|(Financial <-> Planning)|(Stock <-> Market)|(Life <-> Insurance)|(Mobile <-> Payments)|(Health <-> Insurance)|(Remittance)|(Real <-> Estate <-> Investment)|(Business <-> Financing)|(Investing <-> Services)|(Financial <-> Technology)|(Pos <-> Solution)|(Transaction <-> Processing)|(Private <-> Banking)|(Credit <-> Scoring)|(Hedge <-> Funds)|(Personal <-> Finance)|(Retail <-> Banking)|(Claim <-> Management)|(Fraud <-> Detection)|(Insurtech)|(Virtual <-> Currency)|(Angel <-> Investment)|(Crowd <-> Funding)|(Core <-> Banking)|(Digital <-> Bank)|(Fin <-> Tech)|(Consumer <-> Financing)|(Impact <-> Investing)|(P2P <-> Lending)|(Debt <-> Collections)|(Banking <-> Platform)|(Auto <-> Insurance)|(Property <-> Insurance)|(Commercial <-> Insurance)|(Prediction <-> Markets)|(Creditworthiness)|(Regtech)|(Financial <-> Exchanges)|(Consumer <-> Lending)|(Credit <-> Bureau)|(Banking <-> Api)|(Authentication <-> Technology)|(Micro <-> Lending)|(Microloans)|(Regulatory <-> Tech)|(Peer <-> To <-> Peer <-> Banking)|(Commercial <-> Lending)|(Financial <-> Management <-> Application)|(Microinvesting)|(P2P <-> Insurance)|(Reg <-> Tech)|(Bank <-> Clearance)|(Crowd <-> Financing)|(Crowdfinncing)|(Investment <-> Management <-> System)|(P2P <-> Banking)'
))) AND "my_company"."url_status" IN (1)) ORDER BY "my_company"."id" DESC;
QUERY PLAN
----------
Sort (cost=1246558.99..1246591.25 rows=12902 width=1600) (actual time=93420.927..93422.435 rows=10934 loops=1)
Sort Key: my_company.id DESC
Sort Method: quicksort Memory: 21040kB
-> Hash Semi Join (cost=511240.57..1245678.09 rows=12902 width=1600) (actual time=5997.812..93374.174 rows=10934 loops=1)
Hash Cond: (my_company.id = u0.company_id)
-> Bitmap Heap Scan on my_company (cost=499793.38..1233166.59 rows=197175 width=1600) (actual time=5397.255..92718.751 rows=106119 loops=1)
Recheck Cond: ((fulltext_search_index @@ '
''trade'' | ''bank'' | ''bank'' | ''financ'' | ''financ'' | ''transact'' | ''complianc'' | ''insur'' | ''currenc'' | ''currenc'' | ''credit'' <-> ''card'' | ''financi'' <-> ''servic'' | ''invoic'' | ''loan'' | ''leas'' | ''cryptocurr'' | ''risk'' <-> ''manag'' | ''stock'' <-> ''exchang'' | ''asset'' <-> ''manag'' | ''fintech'' | ''incub'' | ''bitcoin'' | ''lend'' | ''point'' <2> ''sale'' | ''cyber'' <-> ''secur'' | ''trade'' <-> ''platform'' | ''smart'' <-> ''contract'' | ''peer'' <2> ''peer'' | ''debit'' <-> ''card'' | ''ventur'' <-> ''capit'' | ''mortgag'' | ''p2p'' | ''wealth'' <-> ''manag'' | ''crowdfund'' | ''financi'' <-> ''plan'' | ''stock'' <-> ''market'' | ''life'' <-> ''insur'' | ''mobil'' <-> ''payment'' | ''health'' <-> ''insur'' | ''remitt'' | ''real'' <-> ''estat'' <-> ''invest'' | ''busi'' <-> ''financ'' | ''invest'' <-> ''servic'' | ''financi'' <-> ''technolog'' | ''pos'' <-> ''solut'' | ''transact'' <-> ''process'' | ''privat'' <-> ''bank'' | ''credit'' <-> ''score'' | ''hedg'' <-> ''fund'' | ''person'' <-> ''financ'' | ''retail'' <-> ''bank'' | ''claim'' <-> ''manag'' | ''fraud'' <-> ''detect'' | ''insurtech'' | ''virtual'' <-> ''currenc'' | ''angel'' <-> ''invest'' | ''crowd'' <-> ''fund'' | ''core'' <-> ''bank'' | ''digit'' <-> ''bank'' | ''fin'' <-> ''tech'' | ''consum'' <-> ''financ'' | ''impact'' <-> ''invest'' | ''p2p'' <-> ''lend'' | ''debt'' <-> ''collect'' | ''bank'' <-> ''platform'' | ''auto'' <-> ''insur'' | ''properti'' <-> ''insur'' | ''commerci'' <-> ''insur'' | ''predict'' <-> ''market'' | ''creditworthi'' | ''regtech'' | ''financi'' <-> ''exchang'' | ''consum'' <-> ''lend'' | ''credit'' <-> ''bureau'' | ''bank'' <-> ''api'' | ''authent'' <-> ''technolog'' | ''micro'' <-> ''lend'' | ''microloan'' | ''regulatori'' <-> ''tech'' | ''peer'' <2> ''peer'' <-> ''bank'' | ''commerci'' <-> ''lend'' | ''financi'' <-> ''manag'' <-> ''applic'' | ''microinvest'' | ''p2p'' <-> ''insur'' | ''reg'' <-> ''tech'' | ''bank'' <-> ''clearanc'' | ''crowd'' <-> ''financ'' | ''crowdfinnc'' | ''invest'' <-> ''manag'' <-> ''system'' | ''p2p'' <-> ''bank''
'::tsquery) AND (company_type = 2) AND (url_status = 1))
Rows Removed by Index Recheck: 54965
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 57142
Heap Blocks: exact=197083
-> BitmapAnd (cost=94060.43..94060.43 rows=394349 width=0) (actual time=3605.515..3605.515 rows=0 loops=1)
-> Bitmap Index Scan on my_company_main_fulltext (cost=0.00..6414.92 rows=620827 width=0) (actual time=2419.704..2419.704 rows=352274 loops=1)
Index Cond: (fulltext_search_index @@ '
''trade'' | ''bank'' | ''bank'' | ''financ'' | ''financ'' | ''transact'' | ''complianc'' | ''insur'' | ''currenc'' | ''currenc'' | ''credit'' <-> ''card'' | ''financi'' <-> ''servic'' | ''invoic'' | ''loan'' | ''leas'' | ''cryptocurr'' | ''risk'' <-> ''manag'' | ''stock'' <-> ''exchang'' | ''asset'' <-> ''manag'' | ''fintech'' | ''incub'' | ''bitcoin'' | ''lend'' | ''point'' <2> ''sale'' | ''cyber'' <-> ''secur'' | ''trade'' <-> ''platform'' | ''smart'' <-> ''contract'' | ''peer'' <2> ''peer'' | ''debit'' <-> ''card'' | ''ventur'' <-> ''capit'' | ''mortgag'' | ''p2p'' | ''wealth'' <-> ''manag'' | ''crowdfund'' | ''financi'' <-> ''plan'' | ''stock'' <-> ''market'' | ''life'' <-> ''insur'' | ''mobil'' <-> ''payment'' | ''health'' <-> ''insur'' | ''remitt'' | ''real'' <-> ''estat'' <-> ''invest'' | ''busi'' <-> ''financ'' | ''invest'' <-> ''servic'' | ''financi'' <-> ''technolog'' | ''pos'' <-> ''solut'' | ''transact'' <-> ''process'' | ''privat'' <-> ''bank'' | ''credit'' <-> ''score'' | ''hedg'' <-> ''fund'' | ''person'' <-> ''financ'' | ''retail'' <-> ''bank'' | ''claim'' <-> ''manag'' | ''fraud'' <-> ''detect'' | ''insurtech'' | ''virtual'' <-> ''currenc'' | ''angel'' <-> ''invest'' | ''crowd'' <-> ''fund'' | ''core'' <-> ''bank'' | ''digit'' <-> ''bank'' | ''fin'' <-> ''tech'' | ''consum'' <-> ''financ'' | ''impact'' <-> ''invest'' | ''p2p'' <-> ''lend'' | ''debt'' <-> ''collect'' | ''bank'' <-> ''platform'' | ''auto'' <-> ''insur'' | ''properti'' <-> ''insur'' | ''commerci'' <-> ''insur'' | ''predict'' <-> ''market'' | ''creditworthi'' | ''regtech'' | ''financi'' <-> ''exchang'' | ''consum'' <-> ''lend'' | ''credit'' <-> ''bureau'' | ''bank'' <-> ''api'' | ''authent'' <-> ''technolog'' | ''micro'' <-> ''lend'' | ''microloan'' | ''regulatori'' <-> ''tech'' | ''peer'' <2> ''peer'' <-> ''bank'' | ''commerci'' <-> ''lend'' | ''financi'' <-> ''manag'' <-> ''applic'' | ''microinvest'' | ''p2p'' <-> ''insur'' | ''reg'' <-> ''tech'' | ''bank'' <-> ''clearanc'' | ''crowd'' <-> ''financ'' | ''crowdfinnc'' | ''invest'' <-> ''manag'' <-> ''system'' | ''p2p'' <-> ''bank''
'::tsquery)
-> Bitmap Index Scan on my_company_url_status_2704ae13bae74399_uniq (cost=0.00..87546.67 rows=2432299 width=0) (actual time=1162.482..1162.482 rows=2584111 loops=1)
Index Cond: (url_status = 1)
SubPlan 1
-> Gather (cost=2143.66..404980.05 rows=301157 width=4) (actual time=36.819..1671.069 rows=265977 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=1143.66..373864.35 rows=125482 width=4) (actual time=137.208..1647.572 rows=88659 loops=3)
-> Hash Join (cost=1143.23..315793.12 rows=125482 width=4) (actual time=137.126..1189.980 rows=88950 loops=3)
Hash Cond: (u1.campaign_id = u2.id)
-> Parallel Seq Scan on campaigns_processedurl u1 (cost=0.00..302007.65 rows=4815065 width=8) (actual time=0.022..635.291 rows=3857165 loops=3)
-> Hash (cost=1136.76..1136.76 rows=518 width=4) (actual time=135.196..135.196 rows=518 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 27kB
-> Bitmap Heap Scan on campaigns_urlcampaign u2 (cost=16.30..1136.76 rows=518 width=4) (actual time=133.979..135.121 rows=518 loops=3)
Recheck Cond: (project_id = 999)
Heap Blocks: exact=94
-> Bitmap Index Scan on campaigns_urlcampaign_b098ad43 (cost=0.00..16.17 rows=518 width=0) (actual time=0.100..0.100 rows=518 loops=3)
Index Cond: (project_id = 999)
-> Index Only Scan using my_company_pkey on my_company u0_1 (cost=0.43..0.46 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=266849)
Index Cond: (id = u1.company_id)
Heap Fetches: 65356
-> Hash (cost=8315.25..8315.25 rows=250555 width=4) (actual time=597.008..597.009 rows=277360 loops=1)
Buckets: 524288 (originally 262144) Batches: 1 (originally 1) Memory Usage: 13847kB
-> Index Only Scan using my_companysource_source_id_a8900f16_uniq on my_companysource u0 (cost=0.44..8315.25 rows=250555 width=4) (actual time=417.555..543.621 rows=277360 loops=1)
Index Cond: (source_id = ANY ('{1,2,3,4,5,6}'::integer[]))
Heap Fetches: 208358
Planning Time: 6.181 ms
JIT:
Functions: 71
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 6.576 ms, Inlining 153.968 ms, Optimization 404.078 ms, Emission 260.011 ms, Total 824.634 ms
Execution Time: 93478.951 ms
(45 rows)没有
的ORDER BY执行计划
我必须省略全文查询,因为问题将超过字符限制。
EXPLAIN ANALYZE SELECT * FROM "my_company" WHERE ("my_company"."company_type" = 2 AND "my_company"."id" IN (SELECT U0."company_id" FROM "my_companysource" U0 WHERE U0."source_id" IN (1,2,3,4,5,6)) AND NOT ("my_company"."id" IN (SELECT U0."id" FROM "my_company" U0 INNER JOIN "campaigns_processedurl" U1 ON (U0."id" = U1."company_id") INNER JOIN "campaigns_urlcampaign" U2 ON (U1."campaign_id" = U2."id") WHERE U2."project_id" IN (999))) AND ( ("my_company"."fulltext_search_index") @@ (to_tsquery('pg_catalog.english',
*same query as in the execution plan above*))) AND "my_company"."url_status" IN (1));
QUERY PLAN
----------
Hash Semi Join (cost=512357.67..1237114.98 rows=12571 width=1599) (actual time=4974.659..9603.716 rows=10881 loops=1)
Hash Cond: (my_company.id = u0.company_id)
-> Bitmap Heap Scan on my_company (cost=500699.64..1224420.02 rows=192089 width=1599) (actual time=4366.382..8953.154 rows=106143 loops=1)
Recheck Cond: ((fulltext_search_index @@ '
*same query as in the execution plan above*
'::tsquery) AND (company_type = 2) AND (url_status = 1))
Rows Removed by Index Recheck: 54962
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 57166
Heap Blocks: exact=197126
-> BitmapAnd (cost=94756.73..94756.73 rows=384177 width=0) (actual time=2651.329..2651.329 rows=0 loops=1)
-> Bitmap Index Scan on my_company_main_fulltext (cost=0.00..7117.74 rows=604909 width=0) (actual time=2261.803..2261.803 rows=343137 loops=1)
Index Cond: (fulltext_search_index @@ '
*same query as in the execution plan above*
'::tsquery)
-> Bitmap Index Scan on my_company_url_status_2704ae13bae74399_uniq (cost=0.00..87542.70 rows=2433369 width=0) (actual time=367.116..367.116 rows=2545163 loops=1)
Index Cond: (url_status = 1)
SubPlan 1
-> Gather (cost=2143.66..405190.13 rows=301113 width=4) (actual time=36.725..1599.373 rows=265970 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=1143.66..374078.83 rows=125464 width=4) (actual time=124.494..1575.190 rows=88657 loops=3)
-> Hash Join (cost=1143.23..315802.59 rows=125464 width=4) (actual time=124.411..1161.406 rows=88950 loops=3)
Hash Cond: (u1.campaign_id = u2.id)
-> Parallel Seq Scan on campaigns_processedurl u1 (cost=0.00..302015.15 rows=4815815 width=8) (actual time=0.020..583.296 rows=3869911 loops=3)
-> Hash (cost=1136.76..1136.76 rows=518 width=4) (actual time=122.546..122.546 rows=518 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 27kB
-> Bitmap Heap Scan on campaigns_urlcampaign u2 (cost=16.30..1136.76 rows=518 width=4) (actual time=121.306..122.471 rows=518 loops=3)
Recheck Cond: (project_id = 999)
Heap Blocks: exact=94
-> Bitmap Index Scan on campaigns_urlcampaign_b098ad43 (cost=0.00..16.17 rows=518 width=0) (actual time=0.101..0.101 rows=518 loops=3)
Index Cond: (project_id = 999)
-> Index Only Scan using my_company_pkey on my_company u0_1 (cost=0.43..0.46 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=266849)
Index Cond: (id = u1.company_id)
Heap Fetches: 42965
-> Hash (cost=8523.76..8523.76 rows=250742 width=4) (actual time=606.830..606.830 rows=277120 loops=1)
Buckets: 524288 (originally 262144) Batches: 1 (originally 1) Memory Usage: 13839kB
-> Index Only Scan using my_companysource_source_id_a8900f16_uniq on my_companysource u0 (cost=0.44..8523.76 rows=250742 width=4) (actual time=418.030..549.896 rows=277120 loops=1)
Index Cond: (source_id = ANY ('{1,2,3,4,5,6}'::integer[]))
Heap Fetches: 210515
Planning Time: 6.179 ms
JIT:
Functions: 71
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 6.517 ms, Inlining 135.258 ms, Optimization 384.718 ms, Emission 260.459 ms, Total 786.952 ms
Execution Time: 9658.889 ms
(42 rows)索引定义:
CREATE INDEX my_company_main_fulltext
ON public.my_company USING gin (fulltext_search_index)
WHERE (company_type = 2)发布于 2020-09-17 12:27:18
时间花在“位图堆扫描”上。
但是相同的位图堆扫描在一次执行中是快速的,而在另一次执行中是缓慢的,因此由于在第一次执行之后缓存,两者之间的差别非常小。
要确保使用EXPLAIN (ANALYZE, BUFFERS),这将显示缓存中命中的块数和从磁盘/内核缓存读取的块的数量。
https://dba.stackexchange.com/questions/275366
复制相似问题