
Slow query when extracting records from a large table (100M+) using a join — any tips?

Stack Overflow user
Asked on 2020-11-03 00:46:15
1 answer · 42 views · 0 followers · Score 0

I'm trying to improve the following query on some large tables (I'm using Postgres v12.4):

people : 137 Million records
delimiters: 1.2 Million records
person_delimiters: 329 Million records
SELECT "delimiters".*, "a"."person_id" FROM "delimiters" 
INNER JOIN "person_delimiters" as "a" on "delimiters"."id" = "a"."delimiter_id" 
WHERE ("a"."person_id" IN (SELECT id FROM "people" LIMIT 1000));

(The subquery with LIMIT 1000 is here for illustration; in the actual application I get a specific set of 1000 person ids.)

person_delimiters is a join table with two columns (person_id and delimiter_id).

Output of EXPLAIN ANALYZE:

 Hash Join  (cost=46025.12..11207871.13 rows=164750752 width=2346) (actual time=65659.597..66354.997 rows=1044 loops=1)
   Hash Cond: (a.delimiter_id = delimiters.id)
   ->  Hash Semi Join  (cost=57.96..8456389.56 rows=164750752 width=32) (actual time=20.582..64777.963 rows=1044 loops=1)
         Hash Cond: (a.person_id = people.id)
         ->  Seq Scan on person_delimiters a  (cost=0.00..5758538.04 rows=329501504 width=32) (actual time=0.008..30865.854 rows=329501518 loops=1)
         ->  Hash  (cost=45.46..45.46 rows=1000 width=16) (actual time=0.384..0.385 rows=1000 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 55kB
               ->  Limit  (cost=0.00..35.46 rows=1000 width=16) (actual time=0.004..0.241 rows=1000 loops=1)
                     ->  Seq Scan on people  (cost=0.00..4888821.40 rows=137873840 width=16) (actual time=0.003..0.158 rows=1000 loops=1)
   ->  Hash  (cost=24508.85..24508.85 rows=71385 width=2330) (actual time=839.841..839.841 rows=1227076 loops=1)
         Buckets: 2048  Batches: 64  Memory Usage: 3015kB
         ->  Seq Scan on delimiters  (cost=0.00..24508.85 rows=71385 width=2330) (actual time=0.007..303.814 rows=1227076 loops=1)
 Planning Time: 1.197 ms
 Execution Time: 66355.110 ms

Indexes:

-- person_delimiters --
 public | person_delimiters | person_delimiters_pkey                       | CREATE UNIQUE INDEX person_delimiters_pkey ON public.person_delimiters USING btree (person_id, delimiter_id)
 public | person_delimiters | idx_person_delimiters_delimiter_id_person_id | CREATE INDEX idx_person_delimiters_delimiter_id_person_id ON public.person_delimiters USING btree (delimiter_id, person_id)

-- people --
 public | people     | people_pkey     | CREATE UNIQUE INDEX people_pkey ON public.people USING btree (id)

-- delimiters --
 public | delimiters | delimiters_pkey | CREATE UNIQUE INDEX delimiters_pkey ON public.delimiters USING btree (id)

Is there anything I can do to optimize it?


1 Answer

Stack Overflow user

Answered on 2020-11-03 18:48:00

Generally it is best to start with the smallest table/dataset and build out from there. The query below is probably the most efficient way to approach this in SQL (the second CTE is really there to clarify the approach -- you could move its logic into the main SELECT statement without much impact).

If that does not improve performance significantly, then you are probably looking at an index/partition/temp-table solution.

with person_id_list as 
(
    SELECT id FROM people LIMIT 1000
),
person_delim_list as 
(
    SELECT pd.delimiter_id, pd.person_id
    FROM person_id_list pil
    INNER JOIN person_delimiters pd on pil.id = pd.person_id
)
SELECT d.*, pdl.person_id
FROM person_delim_list pdl
INNER JOIN delimiters d on pdl.delimiter_id = d.id;
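
If the CTE rewrite is not enough, the temp-table route mentioned above could look roughly like this (a sketch, not from the original answer; it assumes the same table and index names from the question, and that the 1000 person ids are known up front, as in the real application):

```sql
-- Materialize the small id set first, so the planner sees an accurate
-- row count instead of estimating against the 137M-row people table.
CREATE TEMP TABLE person_id_list (id bigint PRIMARY KEY);
INSERT INTO person_id_list SELECT id FROM people LIMIT 1000;
ANALYZE person_id_list;  -- refresh statistics on the temp table

SELECT d.*, pd.person_id
FROM person_id_list pil
JOIN person_delimiters pd ON pd.person_id = pil.id    -- can use person_delimiters_pkey (person_id, delimiter_id)
JOIN delimiters d ON d.id = pd.delimiter_id;          -- can use delimiters_pkey (id)
```

Note that on Postgres 12 CTEs are inlined by default, so the `with` query above may still be planned much like the original; writing the first CTE as `WITH person_id_list AS MATERIALIZED (...)` is another way to force the small id set to be evaluated first.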
Score 0
Original content provided by Stack Overflow; translated by Tencent Cloud.
Original link:

https://stackoverflow.com/questions/64649715
