文章/答案/技术大牛

发布

社区首页 >问答首页 >“大于”或“小于”的PostgreSQL并行扫描

问“大于”或“小于”的PostgreSQL并行扫描
EN

Database Administration用户

提问于 2020-10-07 17:55:57

回答 1查看 208关注 0票数 0

我有两个问题。一个有"=“，另一个有">”作为过滤条件。我的问题是，为什么规划者计划只在条件为等于的情况下并行运行查询。

如果我处理parallel_tuple_cost和parallel_setup_cost的默认值，我可以使两者并行运行。但是我想知道为什么默认配置是这样计划的。

我正在提供我执行的查询和规划者解释输出。在fiter中使用的字段中没有索引。

这与两名工作人员并行进行：

VACUUM; EXPLAIN ANALYSE select screen_name from accounts where followers_count = 100;

(cost=1000.00..81857.06 rows=401 width=12) (actual time=0.442..149.981 rows=5028 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on accounts  (cost=0.00..80816.96 rows=167 width=12) (actual time=0.100..113.974 rows=1676 loops=3)
        Filter: (followers_count = 100)
        Rows Removed by Filter: 1154464
Planning Time: 0.040 ms
Execution Time: 150.294 ms

这与单个员工一起运行：

VACUUM; EXPLAIN ANALYSE select screen_name from accounts where followers_count > 100;

Seq Scan on accounts  (cost=0.00..106105.64 rows=2278992 width=12) (actual time=2.608..562.870 rows=2286022 loops=1)
  Filter: (followers_count > 100)
  Rows Removed by Filter: 1182399
Planning Time: 0.059 ms
JIT:
  Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true"
   Timing: Generation 0.369 ms, Inlining 0.000 ms, Optimization 0.186 ms, Emission 2.316 ms, Total 2.871 ms"
Execution Time: 640.952 ms

查询在筛选条件中只有不同的符号。

EDIT1:

使用不太保守的设置：

SET parallel_tuple_cost to 0;
SET parallel_setup_cost to 0;
SET max_parallel_workers_per_gather to 4;
VACUUM; EXPLAIN ANALYSE select * from accounts where followers_count > 100;
    
Gather  (cost=0.00..73589.74 rows=2278693 width=121) (actual time=0.220..391.700 rows=2286022 loops=1)
  Workers Planned: 4
  Workers Launched: 4
  ->  Parallel Seq Scan on accounts  (cost=0.00..73589.74 rows=569673 width=121) (actual time=0.024..117.873 rows=457204 loops=5)
        Filter: (followers_count > 100)
        Rows Removed by Filter: 236480
Planning Time: 0.065 ms
Execution Time: 484.223 ms

使用默认设置：

SET parallel_tuple_cost to 0.1;
SET parallel_setup_cost to 1000;
SET max_parallel_workers_per_gather to 2;
VACUUM; EXPLAIN ANALYSE select * from accounts where followers_count > 100;

Seq Scan on accounts  (cost=0.00..106100.83 rows=2278739 width=121) (actual time=2.610..552.259 rows=2286022 loops=1)
  Filter: (followers_count > 100)
  Rows Removed by Filter: 1182399
Planning Time: 0.083 ms
JIT:
  Functions: 2
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 0.462 ms, Inlining 0.000 ms, Optimization 0.175 ms, Emission 2.239 ms, Total 2.876 ms"
Execution Time: 634.788 ms

postgresql

execution-plan

回答 1

Database Administration用户

发布于 2020-10-07 21:20:25

您的>查询返回的行比=多得多。从并行工作者返回的每一行都需要从并行工作者传递给领导者，这样做是不免费的。(这是parallel_tuple_cost核算的成本)。另一方面，由并行工作人员过滤掉的行不受此费用的影响，因为它不会被放弃。如果大多数行将被过滤掉，则并行进行筛选是有意义的。如果有少数行将被过滤掉，那么并行地进行筛选可能没有意义，但随后在进程之间来回移动数据的额外工作就没有意义了。

您没有向我们展示的是，并行执行>查询是否比在单个进程中执行它要快。也许你需要把parallel_tuple_cost调小一些。默认设置相当保守。

票数 1

页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://dba.stackexchange.com/questions/276709

复制

相似问题

问“大于”或“小于”的PostgreSQL并行扫描
EN

EDIT1:

回答 1

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问“大于”或“小于”的PostgreSQL并行扫描EN

EDIT1:

回答 1

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问“大于”或“小于”的PostgreSQL并行扫描
EN