首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >在PostGIS中为每个特性创建7天滚动平均值

在PostGIS中为每个特性创建7天滚动平均值
EN

Database Administration用户
提问于 2020-05-21 12:09:36
回答 1查看 339关注 0票数 1

我有一个timeseries数据集,如下所示:

代码语言:javascript
复制
uid | geom     | date       | count |
-------------------------------------
1   | FeatureA | 2016-02-01 | 1     | 
2   | FeatureA | 2016-02-02 | 2     | 
3   | FeatureA | 2016-02-03 | 3     | 
4   | FeatureA | 2016-02-04 | 4     | 
5   | FeatureA | 2016-02-05 | 5     | 
6   | FeatureA | 2016-02-06 | 9     | 
7   | FeatureA | 2016-02-07 | 11    | 
8   | FeatureA | 2016-02-08 | 15    | 
9   | FeatureA | 2016-02-09 | 17    | 
10  | FeatureA | 2016-02-10 | 20    | 
11  | FeatureB | 2016-02-01 | 2     | 
12  | FeatureB | 2016-02-02 | 2     | 
13  | FeatureB | 2016-02-03 | 8     | 
14  | FeatureB | 2016-02-04 | 4     | 
15  | FeatureB | 2016-02-05 | 5     | 
16  | FeatureB | 2016-02-06 | 15    | 
17  | FeatureB | 2016-02-07 | 11    | 
18  | FeatureB | 2016-02-08 | 15    | 
19  | FeatureB | 2016-02-09 | 19    | 
20  | FeatureB | 2016-02-10 | 25    | 

我想计算数据集中的每个特征的7天滚动平均值(~2000特征)。为了计算Postgres的7天滚动平均值,可以使用描述的这里窗口。

以下代码几乎有效:

代码语言:javascript
复制
SELECT geom,date,count,  
       AVG(count)
            OVER(PARTITION BY geom ORDER BY geom, date ROWS BETWEEN CURRENT ROW AND 7 Following) AS rolling_avg_count
FROM features;

这提供了以下输出:

代码语言:javascript
复制
 geom    | date       | count | rolling_avg_count
--------------------------------------------------
FeatureA | 2016-02-01 | 1     | 6.25
FeatureA | 2016-02-02 | 2     | 8.25
FeatureA | 2016-02-03 | 3     | 10.5
FeatureA | 2016-02-04 | 4     | 11.57
FeatureA | 2016-02-05 | 5     | 12.83
FeatureA | 2016-02-06 | 9     | 14.4
FeatureA | 2016-02-07 | 11    | 15.75
FeatureA | 2016-02-08 | 15    | 17.33
FeatureA | 2016-02-09 | 17    | 18.5
FeatureA | 2016-02-10 | 20    | 20
FeatureB | 2016-02-01 | 2     | 7.75
FeatureB | 2016-02-02 | 2     | 9.875
FeatureB | 2016-02-03 | 8     | 12.75
FeatureB | 2016-02-04 | 4     | 13.43
FeatureB | 2016-02-05 | 5     | 15
FeatureB | 2016-02-06 | 15    | 17
FeatureB | 2016-02-07 | 11    | 17.5
FeatureB | 2016-02-08 | 15    | 19.67
FeatureB | 2016-02-10 | 25    | 25

但是,输出将继续计算平均值,直到分区结束为止。例如,uid 10的滚动平均值为20 (仅使用一条记录计算)。如果后面的行少于7行,我希望计算停止。

理想情况下,输出应该如下所示:

代码语言:javascript
复制
 geom    | date       | count | rolling_avg_count
--------------------------------------------------
FeatureA | 2016-02-01 | 1     | 6.25
FeatureA | 2016-02-02 | 2     | 8.25
FeatureA | 2016-02-03 | 3     | 10.5
FeatureA | 2016-02-04 | 4     | 11.57
FeatureA | 2016-02-05 | 5     | 
FeatureA | 2016-02-06 | 9     | 
FeatureA | 2016-02-07 | 11    | 
FeatureA | 2016-02-08 | 15    | 
FeatureA | 2016-02-09 | 17    | 
FeatureA | 2016-02-10 | 20    | 
FeatureB | 2016-02-01 | 2     | 7.75
FeatureB | 2016-02-02 | 2     | 9.875
FeatureB | 2016-02-03 | 8     | 12.75
FeatureB | 2016-02-04 | 4     | 13.43
FeatureB | 2016-02-05 | 5     | 
FeatureB | 2016-02-06 | 15    | 
FeatureB | 2016-02-07 | 11    | 
FeatureB | 2016-02-08 | 15    | 
FeatureB | 2016-02-10 | 25    | 
EN

回答 1

Database Administration用户

回答已采纳

发布于 2020-05-22 00:02:04

这将产生所需的结果--在调整了-1的错误后,2:

代码语言:javascript
复制
SELECT geom, date, count
     , CASE WHEN rn < 7 THEN NULL  -- ②
            ELSE round(rolling_avg_count, 2) END AS rolling_avg_count
FROM  (
   SELECT geom, date, count
        , AVG(count) OVER (PARTITION BY geom
                           ORDER BY date -- ①
                           ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING -- ②
                          ) AS rolling_avg_count
        , row_number() OVER (PARTITION BY geom ORDER BY date DESC) AS rn
   FROM   features
   ) sub;

db<>fiddle 这里

1使用PARTITION BY geom,您不需要在ORDER BY中使用geom

你犯了一个错误(S):

我想计算一下7天的滚动平均值。

但是,您显示了8天滚动平均值(1 + 7)的计算。

如果后面的行少于7行,我希望计算停止。

但是您只显示了以下6行的结果(显示值11.57和13.43)。

我使它成为一个实际的7天平均,停止时,有不到6行以下。

性能优化

由于第二个窗口函数中的降序排序顺序,上述内容为查询计划添加了一个额外的排序。

这个替代查询避免了使用附加的count()

代码语言:javascript
复制
SELECT geom, date, count
     , CASE WHEN rn < 6 THEN NULL -- ③
            ELSE round(rolling_avg_count, 2) END AS rolling_avg_count
FROM  (
   SELECT geom, date, count
        , avg(count)   OVER (PARTITION BY geom
                             ORDER BY date
                             ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
                            ) AS rolling_avg_count
        , count(*)     OVER (PARTITION BY geom)
        - row_number() OVER (PARTITION BY geom ORDER BY date) AS rn
   FROM   features
   ) sub;

db<>fiddle 这里

再详细一点,添加一个额外的窗口函数。但我希望它能表现得更好。

3调整为0基行号。(比添加+ 1便宜。)

票数 1
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/267610

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档