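After giving up on MySQL I tried Elasticsearch, and now I want to see whether I can use PostgreSQL/PostGIS, which would let me rely on PostgreSQL alone.
I need to fetch records from a table by distance (it does not have to be exact) and sort them by distance. The table has 10 million records.
Since my query is slower in PostgreSQL than it was in MySQL, I assume I am doing something wrong.
Can I do better?
Table: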
id | hash_id | town | geo_pt2
geo_pt2 is geography
Index:
CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)
Query:
SELECT hash_id, town
, ST_Distance(t.x, geo_pt2) AS dist
FROM member_profile, (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE ST_DWithin(t.x, geo_pt2, 250000)
ORDER BY dist
limit 100 offset 1000;
Explain:
Limit (cost=9.08..9.08 rows=1 width=53)
-> Sort (cost=9.07..9.08 rows=1 width=53)
Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.06 rows=1 width=53)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
I am using PostgreSQL 10 on a modern server with high IOPS (NVMe), and the query takes 35 seconds.
After applying @Evan's recommendation, performance is much better:
EXPLAIN ANALYZE SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000
FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms
If the user pages toward the end of the results, it gets slow again:
EXPLAIN ANALYZE SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000
FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms
Posted on 2018-08-13 14:04:01
First, use EXPLAIN ANALYZE (not just EXPLAIN) and show the output of \d on the table (in psql). As a first point,
ST_GeographyFromText('POINT(47.4667 8.3167)') should be written as ST_MakePoint(47.4667, 8.3167)::geography
Your problem is this pattern,
SELECT ST_Distance( ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
...
ORDER BY dist
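LIMIT 100 OFFSET 1000;
Every time you do this, the distance has to be computed for at least 1100 rows. That alone should not be too slow. It is slow because ST_Distance has to be calculated for every row that matches before the sort can return anything. We can stop that by using KNN, with the <-> operator. MySQL does not support KNN.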
SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000
FETCH NEXT 100 ROWS ONLY;
As a style critique, I personally prefer OFFSET / FETCH (the standardized form) over LIMIT / OFFSET.
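Note that the KNN ordering removes the sort, but not the cost of OFFSET: the index scan still has to produce every skipped row before it reaches the page you want. The EXPLAIN ANALYZE output in the question shows exactly that (rows=10100 for OFFSET 10000, rows=1000100 for OFFSET 1000000). Here is a toy model of that behaviour in Python. It is only a sketch, not PostgreSQL internals: ordered_scan, page_with_offset and the sample data are made-up names and values for illustration.

```python
from itertools import islice


def ordered_scan(sorted_distances, counter):
    """Stand-in for an index scan that returns rows nearest-first.

    Every row handed out is counted, including rows that OFFSET later discards.
    """
    for d in sorted_distances:
        counter["visited"] += 1
        yield d


def page_with_offset(sorted_distances, offset, limit):
    """Emulate OFFSET/FETCH on top of an ordered scan.

    Returns the page and the number of rows the scan had to produce.
    """
    counter = {"visited": 0}
    scan = ordered_scan(sorted_distances, counter)
    # islice pulls and throws away the first `offset` rows, then keeps `limit` rows.
    page = list(islice(scan, offset, offset + limit))
    return page, counter["visited"]


# Hypothetical data: 20,000 distances, already in index (nearest-first) order.
distances = [float(i) for i in range(20_000)]

page, visited = page_with_offset(distances, offset=10_000, limit=100)
print(len(page), visited)
```

In this model the number of rows visited is always offset + limit, so the work grows linearly with how deep the user pages, no matter how fast each individual row is.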
I'm not sure the following will work, but it may be worth a try (keep us updated).
SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
, ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
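FETCH NEXT 100 ROWS ONLY;
So the first time you run this, you save the last value of myknn. The second time you run it, you replay that value as OLD_VALUE in this clause,
AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
So each time you run it, you save the new point to continue from, and use FETCH NEXT x ROWS ONLY.
myknn and dist may well be the same for you; if so, you can just drop one of them.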
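One thing to watch with a strict > OLD_VALUE: if several rows have exactly the same distance and the page boundary falls between them, the remaining ones can be skipped. A common way around that is to carry a tie-breaker (for example the id) along with the saved distance. Below is a minimal sketch of that pattern. It assumes SQLite via Python's sqlite3 with a precomputed dist column as a stand-in for the KNN distance, so it says nothing about how PostGIS will plan the real query; the places table, fetch_page and the data are all hypothetical.

```python
import sqlite3


def fetch_page(conn, last_dist, last_id, page_size):
    """Return the next page of (id, dist) rows after the saved (last_dist, last_id)."""
    return conn.execute(
        """
        SELECT id, dist
        FROM places
        WHERE dist > ? OR (dist = ? AND id > ?)  -- resume after the last row seen
        ORDER BY dist, id                        -- id breaks ties between equal distances
        LIMIT ?
        """,
        (last_dist, last_dist, last_id, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, dist REAL NOT NULL)")
conn.execute("CREATE INDEX places_dist_id ON places (dist, id)")
# Hypothetical data: 30 rows where every distance value is shared by up to 3 rows.
conn.executemany(
    "INSERT INTO places (id, dist) VALUES (?, ?)",
    [(i, float(i // 3)) for i in range(1, 31)],
)

pages = []
last_dist, last_id = -1.0, 0  # starting point that sorts before every real row
while True:
    page = fetch_page(conn, last_dist, last_id, page_size=4)
    if not page:
        break
    pages.append(page)
    last_id, last_dist = page[-1]  # save the point to continue from

all_ids = [row_id for page in pages for row_id, _ in page]
print(len(pages), all_ids)
```

Every row comes back exactly once even though page boundaries fall in the middle of groups with equal distance. With only dist > ? in the WHERE clause, the rows left over from such a group would be lost.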
https://dba.stackexchange.com/questions/214782