在一些大表上改进下面的查询(我正在使用Postgres v12.4):
people : 137 Million records
delimiters: 1.2 Million records
person_delimiters: 329 Million recordsSELECT "delimiters".*, "a"."person_id" FROM "delimiters"
INNER JOIN "person_delimiters" as "a" on "delimiters"."id" = "a"."delimiter_id"
WHERE ("a"."person_id" IN (SELECT id FROM "people" LIMIT 1000));(这里使用LIMIT 1000的子查询是出于一般目的,但在它的实际应用程序中,我得到了1000个人ids的特定集合)
person_delimiters是一个包含两列(person_id,delimiter_id)的中间表;
EXPLAIN ANALYZE的输出:
Hash Join (cost=46025.12..11207871.13 rows=164750752 width=2346) (actual time=65659.597..66354.997 rows=1044 loops=1)
Hash Cond: (a.delimiter_id = delimiters.id)
-> Hash Semi Join (cost=57.96..8456389.56 rows=164750752 width=32) (actual time=20.582..64777.963 rows=1044 loops=1)
Hash Cond: (a.person_id = people.id)
-> Seq Scan on person_delimiters a (cost=0.00..5758538.04 rows=329501504 width=32) (actual time=0.008..30865.854 rows=329501518 loops=1)
-> Hash (cost=45.46..45.46 rows=1000 width=16) (actual time=0.384..0.385 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 55kB
-> Limit (cost=0.00..35.46 rows=1000 width=16) (actual time=0.004..0.241 rows=1000 loops=1)
-> Seq Scan on people (cost=0.00..4888821.40 rows=137873840 width=16) (actual time=0.003..0.158 rows=1000 loops=1)
-> Hash (cost=24508.85..24508.85 rows=71385 width=2330) (actual time=839.841..839.841 rows=1227076 loops=1)
Buckets: 2048 Batches: 64 Memory Usage: 3015kB
-> Seq Scan on delimiters (cost=0.00..24508.85 rows=71385 width=2330) (actual time=0.007..303.814 rows=1227076 loops=1)
Planning Time: 1.197 ms
Execution Time: 66355.110 ms索引:
-- person_delimiters --
public | person_delimiters | person_delimiters_pkey | | CREATE UNIQUE INDEX person_delimiters_pkey ON public.person_delimiters USING btree (
person_id, delimiter_id)
public | person_delimiters | idx_person_delimiters_delimiter_id_person_id | | CREATE INDEX idx_person_delimiters_delimiter_id_person_id ON public.person_delimiter
s USING btree (delimiter_id, person_id)
-- people --
public | people | people_two_pkey | | CREATE UNIQUE INDEX people_pkey ON public.people USING btree (id)
-- delimiters --
public | delimiters | delimiters_pkey | | CREATE UNIQUE INDEX delimiters_pkey ON public.delimiters USING btree (id)有什么我可以做的,来优化它吗?
发布于 2020-11-03 18:48:00
通常情况下,最好从最小的表/数据集开始,然后从那里开始。下面可能是处理SQL问题的最有效的方法(第二个CTE实际上是用来阐明方法的--您可能会将逻辑移到主select语句中,而不会有太多影响)。
如果这不能显着提高性能,那么您可能正在考虑索引/分区/临时表解决方案。
with person_id_list as
(
SELECT id FROM people LIMIT 1000
),
person_delim_list as
(
SELECT pd.delimiter_id, pd.person_id
FROM person_id_list pil
INNER JOIN person_delimiters pd on pil.id = pd.person_id
)
SELECT D.*, pdl.person_id
FROM person_delim_list pdl
INNER JOIN delimiters d on pdl.delimiter_id = d.idhttps://stackoverflow.com/questions/64649715
复制相似问题