我在查询中使用"sub select“查询时遇到了问题:
select
f.timestamp::date as date,
user_id,
activity_type,
f.container_id as group_id,
(
select
string_agg(distinct("userId"), ',') as group_owners
from
jusers_groups_copy g
where
g.place_id = f.container_id
and state like 'owner'
) as group_owners
from
fact_activity f
where
f.container_type like '700'
and f.timestamp::date < to_date('2016-09-05', 'YYYY-MM-DD')
group by
date, user_id, activity_type, group_id
order by
date, user_id, activity_type, group_id实际上,里面的string_add大约需要20秒才能呈现出来。我用pgAdmin解释了这个查询,他给了我这样的消息:
"Group (cost=7029.62..651968.20 rows=17843 width=27) (actual time=431.017..4513.973 rows=11483 loops=1)"
" Buffers: shared hit=139498 read=411, temp read=255 written=255"
" -> Sort (cost=7029.62..7074.90 rows=18111 width=27) (actual time=430.630..667.098 rows=54660 loops=1)"
" Sort Key: ((f."timestamp")::date), f.user_id, f.activity_type, f.container_id"
" Sort Method: external merge Disk: 2008kB"
" Buffers: shared hit=1702 read=411, temp read=255 written=255"
" -> Seq Scan on fact_activity f (cost=0.00..5748.76 rows=18111 width=27) (actual time=0.107..188.827 rows=54660 loops=1)"
" Filter: ((container_type ~~ '700'::text) AND (("timestamp")::date < to_date('2016-09-05'::text, 'YYYY-MM-DD'::text)))"
" Rows Removed by Filter: 125414"
" Buffers: shared hit=1691 read=411"
" SubPlan 1"
" -> Aggregate (cost=36.12..36.13 rows=1 width=5) (actual time=0.315..0.318 rows=1 loops=11483)"
" Buffers: shared hit=137796"
" -> Seq Scan on users_groups_copy g (cost=0.00..36.09 rows=11 width=5) (actual time=0.041..0.266 rows=13 loops=11483)"
" Filter: ((state ~~ 'owner'::text) AND (place_id = f.container_id))"
" Rows Removed by Filter: 1593"
" Buffers: shared hit=137796"
"Total runtime: 4536.074 ms"此外,我尝试连接这些表,但请求的速度要慢得多,如下所示:
select
f.timestamp::date as date,
user_id,
activity_type,
f.container_id as group_id,
string_agg(distinct("userId"), ',') as group_owners
from
fact_activity f
join jusers_groups_copy g
on g.place_id = f.container_id
where
f.container_type like '700'
and f.timestamp::date < to_date('2016-09-05', 'YYYY-MM-DD')
and g.state like 'owner'
group by
date, user_id, activity_type, group_id
order by
date, user_id, activity_type, group_id最后,这个数据库中有任何索引,这是请求如此缓慢的原因吗?
我想知道如何改进这个请求。
提前感谢
发布于 2016-09-06 15:37:57
在不更改查询的情况下,最大的性能改进是在subselect中的表上建立索引,从而加速subselect:
CREATE INDEX nice_name ON jusers_groups_copy(place_id, state text_pattern_ops);但我会将查询重写为连接。这样,您可能会得到比嵌套循环更有效的东西,这取决于您的数据。
而不是
SELECT f.somecol,
(SELECT g.othercol
FROM jusers_groups_copy g
WHERE g.place_id = f.container_id
AND g.state LIKE 'owner')
FROM fact_activity f
WHERE ...;你应该写下
SELECT f.somecol, g.othercol
FROM fact_activity f
JOIN jusers_groups_copy g
ON g.place_id = f.container_id
WHERE g.state LIKE 'owner'
AND ...;根据所选的联接类型,上面的索引(用于嵌套循环)或不同的索引可以使查询速度更快。
发布于 2016-09-06 17:17:54
我想您需要使用以下网站更改/data/postgresql.conf中的一些配置
pgtune
我认为最重要的参数是"work_mem“
https://stackoverflow.com/questions/39333172
复制相似问题