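I store counter data in a Hive table. The counter increases over time and is sometimes reset to zero.
I want to compute the difference between consecutive rows, but when the counter resets the difference comes out negative. Here is some sample data and the expected output: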
data: 1, 3, 6, 7, 1, 4
difference: 2, 3, 1, -6, 3, NA
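expected: 2, 3, 1, 1, 3, NA

Usually this is done by computing the lag and subtracting it from the data. When the difference is negative, the counter has been reset, so the increment is simply the value read after the reset. Here is an example function that does this in R/dplyr:

    diff_counter <- function(x) {
      # difference between consecutive measurements
      prev <- dplyr::lag(x)
      dx <- x - prev
      # a negative difference marks a counter reset
      reset_idx <- dx < 0 & !is.na(dx)
      # after a reset the increment is the new reading itself (counted from zero)
      dx[reset_idx] <- x[reset_idx]
      return(dx)
    }

Can I do something similar in Hive?
Regards, Paweł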
Posted on 2018-03-01 04:19:19
Assuming t is your datetime column and the counter increases in that order, you can use a CASE expression with the LEAD function, as shown below.
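    SELECT x,
           CASE
               -- >= 0 so that an unchanged counter yields 0 rather than the next value
               WHEN LEAD(x) OVER (ORDER BY t) - x >= 0
                   THEN LEAD(x) OVER (ORDER BY t) - x
               -- negative difference: the counter was reset, so use the next value itself
               ELSE LEAD(x) OVER (ORDER BY t)
           END AS diff
    FROM yourtable;
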
| X | DIFF |
|---|--------|
| 1 | 2 |
| 3 | 3 |
| 6 | 1 |
| 7 | 1 |
| 1 | 3 |
| 4 | (null) |

https://stackoverflow.com/questions/49036668
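
The query in the answer pairs each row with the next one and repeats the window expression in every branch. As a rough sketch only, the same idea could also be written with LAG computed once in a subquery, which lines up with the R function in the question (each row is compared with the previous one, so the first row gets NULL instead of the last). The table name `yourtable` and the columns `t` and `x` are taken from the answer; the alias `prev_x` is made up for illustration.

```sql
-- Sketch: assumes the same table and columns as the answer (yourtable, t, x).
-- prev_x is a hypothetical alias introduced here.
SELECT t,
       x,
       CASE
           WHEN prev_x IS NULL THEN NULL   -- first row: nothing to compare with
           WHEN x >= prev_x THEN x - prev_x -- normal increment (0 if unchanged)
           ELSE x                           -- reset: the counter restarted from zero
       END AS diff
FROM (
    SELECT t,
           x,
           -- If several counters share one table, a PARTITION BY on the
           -- column that identifies the counter would go before ORDER BY.
           LAG(x) OVER (ORDER BY t) AS prev_x
    FROM yourtable
) s;
```

For the sample data 1, 3, 6, 7, 1, 4 this yields NULL, 2, 3, 1, 1, 3, which is the expected series shifted down by one row.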