我有以下事件表。
key time_stamp geohash
k1 1 thred0y
k2 5 thred0v
k4 7 thre6rd
k3 9 thre6rg
k1 10 thred3t
k1 12 thred3u
k2 14 thred3s在这里,如果密钥在10分钟的时间间隔内落在500mts范围内,我想将它们集群到组中。
我试着交叉加入他们
select a.key, b.key, a.geohash, b.geohash, a.time_stamp, b.time_stamp,
round(ST_Distance(ST_PointFromGeoHash(a.geohash, 4326), ST_PointFromGeoHash(b.geohash, 4326), true)) distance,
abs(round(extract(EPOCH from a.time_stamp - b.time_stamp)/60))
from t a, t b
where a.key <> b.key
and a.time_stamp between b.time_stamp - interval '10 min' and b.time_stamp + interval '10 min'
and ST_Distance(ST_PointFromGeoHash(a.geohash, 4326), ST_PointFromGeoHash(b.v, 4326), true) <= 500
and least(a.key, b.key) = a.key
order by a.time_stamp desc但是,查询可以很好地处理小数据,另外,只有当有两个不同的键,但不超过2个时,查询才能正常工作。
任何关于如何进一步进行的输入都将是有帮助的。
我添加了一些用于测试的样本数据,https://pastebin.com/iVD1WU4Y。
发布于 2020-02-09 00:40:53
我找到了解决方案,在60分钟内集群关键字,以及1.2公里的距离。
with x as (
select key, time_stamp, geo, prev_ts, geo_hash6,
count(case when prev_ts is null or prev_ts > 60 then 1 else null end) over(order by time_stamp) cluster_id
from (
select key, time_stamp, geo,
EXTRACT(EPOCH FROM time_stamp - lag(time_stamp) over(order by time_stamp)) prev_ts,
substring(geo, 1, 6) geo_hash6
from t
) a
order by cluster_id, geo_hash6, geo, time_stamp)
select x.cluster_id, x.key, x.geo_hash6, min(time_stamp) first_time, max(time_stamp) last_time
from x, (select cluster_id, geo_hash6, count(distinct key) num_uniques from x group by cluster_id, geo_hash6) y
where x.cluster_id = y.cluster_id and x.geo_hash6 = y.geo_hash6 and y.num_uniques > 2
group by x.cluster_id, x.geo_hash6, x.key
order by x.cluster_id, x.geo_hash6;欢迎任何改进解决方案的建议。
https://stackoverflow.com/questions/60111686
复制相似问题