文章/答案/技术大牛

发布

社区首页 >问答首页 >Postgresql在应用索引时执行seq_scan。为什么？

问Postgresql在应用索引时执行seq_scan。为什么？
EN

Stack Overflow用户

提问于 2015-03-30 12:58:24

回答 1查看 351关注 0票数 1

我在varchar(24)主键上有一个带有联接的查询。这是一个关键原因，是遗产，是变革的目标。然而，postgresql查询计划员坚持执行顺序扫描，这在我看来是不合理的。我用"SET enable_seqscan = off“将查询速度提高了8倍来支持”不合理“的说法。

我运行过“真空分析”；我玩过统计设置，但到目前为止没有运气。

查询是

select inventry.id, inventry.count, sum(invenwh.count) 
from invenwh join inventry on inventry.id=invenwh.id
where inventry.product_c='CAT17' 
group by 1, 2;

下面设置运行此查询的数据库。

drop table if exists inventry;
drop table if exists inwh;
drop table if exists invenwh;
drop table if exists inprodcategory;

-- Create 50 product categories.
create table inprodcategory as 
select i as id, concat('CAT', lpad(i::text, 2, '0'))::varchar(10) as category
from generate_series(1, 50, 1) as i;

-- Create 245,000 inventory items
create table inventry as 
select 
    concat('ITEM', lpad(i::text, 6, '0'))::varchar(24) as id, 
    concat('Item #', i::text)::varchar(50) as descr_1,
    c.category as product_c,
    (case when random() < 0.05 then (random()*70)::int else 0::int end) as count
from generate_series(1, 245000, 1) as i
    join inprodcategory as c on c.id=(i%50)::int;

-- Create 70 warehouses
create table inwh as 
select concat('WAREHOUSE', lpad(i::text, 2, '0'))::varchar(10) as warehouse
from generate_series(1, 70, 1) as i;

-- Create (ugly) cross-join table with counts/warehouse
create table invenwh as 
select id, warehouse, 
    (case when random() < 0.05 then (random()*10)::int else 0::int end) as count
from inventry, inwh;

create index on invenwh (id);
create index on inventry (id);

运行上述操作后，可以运行查询。在我使用SSD、i7和16 i7内存的硬件上，它需要4秒，但如果运行"set enable_seqscan=off"，则需要大约500 my。

编辑:添加解释(分析、缓冲区)

HashAggregate  (cost=449773.25..449822.25 rows=4900 width=19) (actual time=4180.006..4181.092 rows=4900 loops=1)
  Group Key: inventry.id, inventry.count
  Buffers: shared hit=4526 read=121051
  ->  Hash Join  (cost=5058.50..447200.75 rows=343000 width=19) (actual time=1285.800..4086.398 rows=343000 loops=1)
        Hash Cond: ((invenwh.id)::text = (inventry.id)::text)
        Buffers: shared hit=4526 read=121051
        ->  Seq Scan on invenwh  (cost=0.00..291651.00 rows=16807000 width=15) (actual time=0.077..1949.843 rows=16807000 loops=1)
              Buffers: shared hit=2530 read=121051
        ->  Hash  (cost=4997.25..4997.25 rows=4900 width=15) (actual time=48.897..48.897 rows=4900 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 230kB
              Buffers: shared hit=1996
              ->  Seq Scan on inventry  (cost=0.00..4997.25 rows=4900 width=15) (actual time=21.903..47.031 rows=4900 loops=1)
                    Filter: ((product_c)::text = 'CAT17'::text)
                    Rows Removed by Filter: 235200
                    Buffers: shared hit=1996
Planning time: 4.266 ms
Execution time: 4181.395 ms

编辑:特定的后续问题

感谢@a_horse_with_no_name (非常感谢！！)降低random_page_cost似乎是要做的事情。这似乎或多或少地与服务器一致。

问:我是否可以运行任何基准来发现random_page_cost的最优值？在生产中，我在SCSI磁盘上(LSI MR9260-8i)。

问:我觉得统计数据在这里也可能是相关的，但我在互联网上的pg-stats类型页面上空空如也。对学习统计数据有什么提示吗？

postgresql

回答 1

Stack Overflow用户

回答已采纳

发布于 2015-03-30 14:00:46

当计划者估计的成本与执行时间的实际情况不匹配时，应该调整成本设置以更好地匹配您的硬件。

各种旋钮都记录在计划成本常数上。

特别是，在random_page_cost上有一些与您的情况相关的建议：

相对于顺序存储(例如固态驱动器)，具有较低随机读取成本的存储也可能更好地用于random_page_cost的较低值建模。

有关此参数的5种不同存储类型的更多调优建议，请参见随机页面成本修正。

TL;DR:对于SSD，请尝试第一个1.5 for random_page_cost。

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/29346881

复制

相似问题

问Postgresql在应用索引时执行seq_scan。为什么？
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Postgresql在应用索引时执行seq_scan。为什么？EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Postgresql在应用索引时执行seq_scan。为什么？
EN