我有一个表,记录了各种服务器的记录大小,以及服务器被扫描的日期。我需要获得每个服务器在每个月的最新条目。我怎样才能做到这一点在impala sql..any帮助将不胜感激。
Data Server Size
11/4/2017 ABC 200
11/18/2017 ABC 700
11/25/2017 ABC 1009
12/4/2017 ABC 200
12/18/2017 ABC 700
12/20/2017 ABC 1100
1/4/2018 ABC 200
1/18/2018 ABC 700
1/20/2018 ABC 1009
11/4/2017 CAD 200
11/18/2017 CAD 700
11/25/2017 CAD 1009
12/4/2017 CAD 200
12/18/2017 CAD 700
12/20/2017 CAD 1100预期结果
Data Server Size
11/25/2017 ABC 1009
12/20/2017 ABC 1100
1/20/2018 ABC 1009
11/25/2017 CAD 1009
12/20/2017 CAD 1100发布于 2018-01-13 10:25:57
Impala支持窗口函数,因此您可以:
select t.*
from (select t.*,
row_number() over (partition by server, trunc(data, 'MONTH')
order by data desc
) as seqnum
from t
) t
where seqnum = 1;编辑:
上面给出了每台服务器的最新值。对于每月一行,请从partition by中删除server
select t.*
from (select t.*,
row_number() over (partition by trunc(data, 'MONTH')
order by data desc
) as seqnum
from t
) t
where seqnum = 1;发布于 2018-01-13 10:36:52
SELECT t.*
FROM t
INNER JOIN
(SELECT MONTH(data) AS month, MAX(DAY(data)) AS day, server
FROM t
GROUP BY MONTH(data), server) sub
ON (MONTH(t.data) = sub.month AND DAY(t.data) = sub.day AND t.server = sub.server)选择子查询中每个服务器每月的最大天数。然后将子查询的结果连接到主表。这将消除每个服务器每月不是最新的行。
在这里测试:http://rextester.com/QTIM39711
https://stackoverflow.com/questions/48236385
复制相似问题