我在物化视图中使用这个查询找到了一个API,它生成资产库存的聚合报告(PostgreSQL 13.3),并且在120万行的表上非常慢
SELECT COALESCE (SUM(CASE
WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
AND t1.is_incoming = true
THEN 1
ELSE 0
END), 0) as point1,
COALESCE(SUM(CASE
WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
AND t1.is_incoming = false
AND t1.train_id NOT IN (
SELECT distinct t1.train_id
FROM mvw_engine_details x1
WHERE x1.inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
AND (x1.mech_id = 222 OR x1.workshop_representation_id = 222) AND x1.is_incoming = true
)
THEN 1
ELSE 0
END), 0) as point2,
COALESCE(SUM(CASE
WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
AND t1.is_incoming = true
AND t1.express_train = 'true'
THEN 1
ELSE 0
END), 0) as point1express,
COALESCE(SUM(CASE
WHEN (t1.mech_id = 222 OR t1.workshop_representation_id = 222)
AND t1.is_incoming = false
AND t1.train_id NOT IN (
SELECT distinct t1.train_id
FROM mvw_engine_details x1
WHERE x1.inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
AND (x1.mech_id = 222 OR x1.workshop_representation_id = 222) AND x1.is_incoming = true
)
AND t1.express_train = 'true'
THEN 1
ELSE 0
END), 0) as point2express
FROM mvw_engine_details t1
WHERE t1.inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
在创建了一些索引之后,下面是当前的explain analyse统计数据:
计划时间: 0.323 ms 执行时间: 6.380 ms
CREATE INDEX IF NOT EXISTS mvw_transport_train_id_inspection_date
ON datamart.mvw_engine_details(inspection_date, train_id);
CREATE INDEX IF NOT EXISTS mvw_transport_inspection_date_idx
ON datamart.mvw_engine_details (inspection_date)
WITH (deduplicate_items = off);
CREATE INDEX IF NOT EXISTS mvw_transport_train_id_idx
ON datamart.mvw_engine_details (train_id)
WITH (deduplicate_items = off);有什么方法来改进查询吗?还是它的结构很差?
发布于 2022-06-07 16:40:45
你的查询没那么慢。但是在清理之后,它应该会更快一点,但是:
WITH cte AS ( -- apply common basic filters *once*
SELECT is_incoming, express_train, train_id
FROM mvw_engine_details
WHERE inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
AND 222 IN (mech_id, workshop_representation_id)
)
SELECT count(*) FILTER (WHERE is_incoming) AS point1
, count(*) FILTER (WHERE NOT is_incoming
AND no_other_incoming_train) AS point2
, count(*) FILTER (WHERE is_incoming AND express_train) AS point1express
, count(*) FILTER (WHERE NOT is_incoming AND express_train
AND no_other_incoming_train) AS point2express
FROM (
SELECT is_incoming, express_train
, NOT EXISTS (
SELECT FROM cte c2
WHERE c2.train_id = c1.train_id
AND c2.is_incoming
) AS no_other_incoming_train
FROM cte c1
) sub;使用CTE,我们不必重复说明(并应用)您的公共筛选标准。
关于聚合FILTER子句:
与sum()不同,count()从不返回NULL。我们不需要添加COALESCE。请参见:
在基本查询中有一个“丑陋或”。请参见:
根据数据分布和基数,用UNION重写可能会有所帮助:
SELECT is_incoming, express_train, train_id
FROM mvw_engine_details
WHERE mech_id = 222
AND inspection_date BETWEEN '2022-03-01' AND '2022-06-05'
UNION
SELECT is_incoming, express_train, train_id
FROM mvw_engine_details
WHERE workshop_representation_id = 222
AND inspection_date BETWEEN '2022-03-01' AND '2022-06-05'除非您的时间范围是非常有选择性的,否则您所公开的任何索引都不会对此特别有用。经验法则是“平等第一,范围后”。请参见:
理想情况下,您有关于(mech_id, inspection_date)和(workshop_representation_id, inspection_date)的索引。
如果查询的两个分支的集合是相互排斥的,那么UNION ALL就更快了。
https://dba.stackexchange.com/questions/313022
复制相似问题