PostgreSQL geospatial query is slow

Database Administration user
Asked 2018-08-13 13:07:15
1 answer, 719 views, 0 followers, 3 votes

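After giving up on MySQL I tried Elasticsearch, and now I want to see whether I can use PostgreSQL/PostGIS, which would let me use PostgreSQL alone.

I need to pull records from a table by distance (it does not have to be exact) and sort them by distance. The table has 10 million records.

My PostgreSQL query is slower than the MySQL one, so I assume I must be doing something wrong.

What can I do better?

Table: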

id | hash_id | town | geo_pt2 

geo_pt2 is geography

Index:

CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)

Query:

SELECT hash_id, town
     , ST_Distance(t.x, geo_pt2) AS dist
FROM   member_profile, (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE  ST_DWithin(t.x, geo_pt2, 250000)
ORDER  BY dist
limit 100 offset 1000;

EXPLAIN:

Limit  (cost=9.08..9.08 rows=1 width=53)
  ->  Sort  (cost=9.07..9.08 rows=1 width=53)
        Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
        ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.06 rows=1 width=53)
              Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
              Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))

I'm running PostgreSQL 10 on a modern server with high IOPS (NVMe), and the query takes 35 seconds.

After applying @Evan's recommendation, performance is much better:

EXPLAIN ANALYZE SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
  ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
        Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms

But when the user pages toward the end, it slows down again:

EXPLAIN ANALYZE SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)
  ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
        Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms

1 Answer

Database Administration user

Accepted answer

Posted 2018-08-13 14:04:01

First, use EXPLAIN ANALYZE (not just EXPLAIN) and show the output of \d on the table (in psql). As a first point,

ST_GeographyFromText('POINT(47.4667 8.3167)')

should be written as ST_MakePoint(47.4667, 8.3167)::geography
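
One way to check that the two spellings produce the same point is to print both as text. This is only a sketch and is not part of the original answer. Note that ST_MakePoint takes (x, y), which for geography means (longitude, latitude); if 47.4667 was meant as the latitude, the arguments would need to be swapped. The question does not say which was intended.

```sql
-- Print both constructors as WKT so they can be compared side by side.
-- The column aliases are illustrative names, not from the original post.
SELECT ST_AsText(ST_GeographyFromText('POINT(47.4667 8.3167)')) AS from_text
     , ST_AsText(ST_MakePoint(47.4667, 8.3167)::geography)      AS from_makepoint;
```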

Your problem is this pattern,

SELECT ST_Distance( ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
...
ORDER  BY dist
LIMIT 100 OFFSET 1000;

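Every time you do this, the distance to at least 1,100 rows has to be computed. That on its own should not be too slow. It is slow because, in order to sort by dist, ST_Distance has to be computed for all of the matching rows before any of them can be returned. We can avoid that with KNN, using the <-> operator in the ORDER BY. MySQL does not support KNN.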

SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000
FETCH NEXT 100 ROWS ONLY;
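
To confirm that the planner really uses the index for ordering, the same query can be wrapped in EXPLAIN. In the resulting plan, an "Order By:" line under the Index Scan node, with no separate Sort node above it, suggests a KNN scan is in use. The BUFFERS option is an extra added here for illustration; the answer itself only asks for EXPLAIN ANALYZE.

```sql
-- In psql, the meta-command  \d member_profile  lists the table's columns and indexes.
-- EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual timings
-- and buffer usage alongside the planner's estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000
FETCH NEXT 100 ROWS ONLY;
```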

As a style note, I personally prefer OFFSET / FETCH (the standardized form) over LIMIT / OFFSET.
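
The two spellings select the same rows. A small self-contained illustration, using generate_series only so that it runs without the member_profile table:

```sql
-- Non-standard spelling: LIMIT / OFFSET.
SELECT n
FROM   generate_series(1, 2000) AS g(n)
ORDER  BY n
LIMIT  100 OFFSET 1000;

-- SQL-standard spelling: OFFSET / FETCH. Skips the same 1000 rows
-- and returns the same 100 rows as the query above.
SELECT n
FROM   generate_series(1, 2000) AS g(n)
ORDER  BY n
OFFSET 1000 ROWS
FETCH NEXT 100 ROWS ONLY;
```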

Pagination

I'm not sure this will work, but it may be worth a shot (keep us updated).

SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
     , ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
  AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;

So the first time you run this, you save the last value of myknn; the second time, you pass that value back in as OLD_VALUE in this clause,

AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE

So each time you run it, you save the new point to continue from, and use FETCH NEXT x ROWS ONLY.

myknn and dist may turn out to be the same for you; if so, you can simply drop one of them.
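
The save-and-replay loop described above could be sketched as a prepared statement, so that only the saved value changes between pages. The statement name member_page and the sample value 12345.678 are hypothetical. As the answer says, whether this is actually faster than a large OFFSET is untested.

```sql
-- $1 is the myknn value of the last row on the previous page.
PREPARE member_page(double precision) AS
SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
     , ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
  AND  (ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2) > $1
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;

-- First page: distances are never negative, so -1 filters nothing out.
EXECUTE member_page(-1);

-- Next page: pass the last myknn value seen (placeholder value shown here).
EXECUTE member_page(12345.678);
```

One caveat with the strict > comparison: rows that sit at exactly the same distance as the last row of a page would be skipped on the next page.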

Votes: 5
Original page content provided by Database Administration.
Original link:

https://dba.stackexchange.com/questions/214782
