我们可以用两种不同的方式使用窗口。
SELECT key, MAX(value)
FROM table
GROUP BY key, TUMBLE(ts, INTERVAL '5' MINUTE)和
SELECT key, MAX(value) OVER w
FROM table
WINDOW w AS (PARTITION BY key ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)我想知道这两种机制是否能够表达相同的东西,或者这两种机制的主要区别和可能的用例是什么?
发布于 2020-05-28 16:34:04
这两个查询都计算与常规SQL语义相对应的不同结果。因此,区别不是特定于Flink的,而是由SQL标准定义的。
第一个查询
SELECT key, MAX(value)
FROM table
GROUP BY key, TUMBLE(ts, INTERVAL '5' MINUTE)组记录每key和5分钟桶。该查询每5分钟生成一行key值,最大value值为。对于每个组,多个行聚合为一个行。
第二个查询
SELECT key, MAX(value) OVER w
FROM table
WINDOW w AS (PARTITION BY key ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)为输入的每一行(table)生成一行。结果行具有到目前为止所观察到的key值的最大key值(行由ts排序)。注意,多个行不是聚合到一个行;每个输入行导致一个输出行。此外,最大聚集的范围可以超过5分钟。实际上,在这个例子中,它是整个key分区。
发布于 2022-06-19 05:14:19
法比安的回答非常正确。如果仍然有任何问题,您可以参考这个例子。这个例子来自于时间序列数据的滚动聚类
CREATE TEMPORARY TABLE temperature_measurements (
measurement_time TIMESTAMP(3),
city STRING,
temperature FLOAT,
WATERMARK FOR measurement_time AS measurement_time - INTERVAL '15' SECONDS
)
WITH (
'connector' = 'faker',
'fields.measurement_time.expression' = '#{date.past ''15'',''SECONDS''}',
'fields.temperature.expression' = '#{number.numberBetween ''0'',''50''}',
'fields.city.expression' = '#{regexify ''(Chicago|Munich|Berlin|Portland|Hangzhou|Seatle|Beijing|New York){1}''}'
);下面是窗口的使用情况:
SELECT
measurement_time,
city,
temperature,
AVG(CAST(temperature AS FLOAT)) OVER last_minute AS avg_temperature_minute,
MAX(temperature) OVER last_minute AS min_temperature_minute,
MIN(temperature) OVER last_minute AS max_temperature_minute,
STDDEV(CAST(temperature AS FLOAT)) OVER last_minute AS stdev_temperature_minute
FROM temperature_measurements
WINDOW last_minute AS (
PARTITION BY city
ORDER BY measurement_time
RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW
);结果是:

下面是按用法划分的组:
SELECT
city,
AVG(CAST(temperature AS FLOAT)) AS avg_temperature_minute,
MAX(temperature) AS min_temperature_minute,
MIN(temperature) AS max_temperature_minute,
STDDEV(CAST(temperature AS FLOAT)) AS stdev_temperature_minute
FROM temperature_measurements
group by
city,
TUMBLE(measurement_time, INTERVAL '1' MINUTES);其结果是:

所以,最后的区别是:
https://stackoverflow.com/questions/62049291
复制相似问题