I have the following data:
             country  objectid  objectuse
record_date
2022-07-20   chile    0         4
2022-07-01   chile    1         4
2022-07-02   chile    1         4
2022-07-03   chile    1         4
2022-07-04   chile    1         4
...          ...      ...       ...
2022-07-26   peru     3088      4
2022-07-27   peru     3088      4
2022-07-28   peru     3088      4
2022-07-30   peru     3088      4
2022-07-31   peru     3088      4

The data describes the daily usage of objects within a country over a single month (July 2022); not every object is used every day. One thing I am interested in is the sum of the per-object monthly maxima for that month:
WITH month_max AS (
    SELECT
        country,
        objectid,
        MAX(objectuse) AS maxuse
    FROM mytable
    GROUP BY
        country,
        objectid
)
SELECT
    country,
    SUM(maxuse)
FROM month_max
GROUP BY country;

The result is:
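country    sum
--------------
chile     1224
peru     17008

But what I really want is a running sum of the maxima from the start of the month up to each date, so that I end up with something like this: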
             country  sum
record_date
2022-07-01   chile    1
2022-07-01   peru     1
2022-07-02   chile    2
2022-07-02   peru     3
...          ...      ...
2022-07-31   chile    1224
2022-07-31   peru     17008

I tried using a window function like this, but to no avail:
SELECT
    *,
    SUM(objectuse) OVER (
        PARTITION BY country
        ORDER BY record_date ROWS 30 PRECEDING
    ) AS cumesum
FROM mytable
ORDER BY cumesum DESC;

Is there a way to get the expected result in SQL?
Thanks in advance.
Edit: For what it's worth, I asked the same question but on Pandas and got an answer; perhaps it helps in working out how to do it in SQL.
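A small sketch of why the attempt above does not produce the intended numbers: the window sums the raw `objectuse` of every preceding row in the country, rather than first taking a maximum per `objectid`. The toy rows below are hypothetical, and SQLite (through Python's standard `sqlite3` module, which needs SQLite 3.25 or later for window functions) stands in for the real database, so treat it as an illustration only.

```python
import sqlite3

# Hypothetical toy rows: (record_date, country, objectid, objectuse).
ROWS = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-02", "chile", 1, 3),
    ("2022-07-02", "chile", 2, 2),
    ("2022-07-03", "chile", 2, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# The window from the question: a sum over up to 30 preceding rows per country.
attempt = conn.execute("""
    SELECT record_date, objectid, objectuse,
           SUM(objectuse) OVER (
               PARTITION BY country
               ORDER BY record_date ROWS 30 PRECEDING
           ) AS cumesum
    FROM mytable
    ORDER BY record_date, objectid
""").fetchall()

for row in attempt:
    print(row)

# On 2022-07-03 the window has added up all four raw values (1 + 3 + 2 + 5),
# whereas the intended figure is max(object 1) + max(object 2) = 3 + 5 = 8.
last_cumesum = attempt[-1][3]
```

The last row carries a `cumesum` of 11 rather than 8, because no per-object maximum is taken before summing.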
Posted on 2022-08-25 17:42:36
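What ended up working is probably not the most efficient way to solve this. Essentially, I take each day of the month and look back from it to the start of the month. Within each of those buckets, for every objectid in the bucket, I take the maximum objectuse. Once I have the maxima, I sum them all for that lookback period. I do this for every day.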
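The lookback described above can be sketched in plain Python. This is only an illustration on hypothetical toy rows; the function name `running_sum_of_maxima` and the tuples are made up for the example and are not part of the original answer.

```python
# Hypothetical toy rows: (record_date, country, objectid, objectuse).
ROWS = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-02", "chile", 1, 3),
    ("2022-07-02", "chile", 2, 2),
    ("2022-07-03", "chile", 2, 5),
    ("2022-07-01", "peru", 7, 4),
]


def running_sum_of_maxima(rows):
    """Map (record_date, country) to the sum of per-object maxima,
    looking back from record_date to the start of its month."""
    totals = {}
    for day, country in sorted({(r[0], r[1]) for r in rows}):
        month = day[:7]  # "YYYY-MM" prefix of an ISO date string
        best = {}        # objectid -> largest objectuse seen in the bucket
        for d, c, obj, use in rows:
            # ISO date strings compare correctly as plain strings.
            if c == country and d[:7] == month and d <= day:
                best[obj] = max(use, best.get(obj, use))
        totals[(day, country)] = sum(best.values())
    return totals


result = running_sum_of_maxima(ROWS)
for key in sorted(result):
    print(key, result[key])
```

Object 1 has no row on 2022-07-03 in the toy data, yet its earlier maximum of 3 still counts toward that day's total of 8, which is the behaviour the lookback is meant to give.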
Here is the query that does this:
WITH daily_lookback AS (
    SELECT
        A.record_date,
        A.country,
        B.objectid,
        MAX(B.objectuse) AS maxuse
    FROM mytable AS A
    LEFT JOIN mytable AS B
        ON A.record_date >= B.record_date
        AND A.country = B.country
        AND DATE_PART('month', A.record_date) = DATE_PART('month', B.record_date)
        AND DATE_PART('year', A.record_date) = DATE_PART('year', B.record_date)
    GROUP BY
        A.record_date,
        A.country,
        B.objectid
)
SELECT
    record_date,
    country,
    SUM(maxuse) AS usetotal
FROM daily_lookback
GROUP BY
    record_date,
    country
ORDER BY
    record_date;

This gives exactly what I wanted: the cumulative sum of the per-objectid maxima over the lookback period, as shown below:
             country  sum
record_date
2022-07-01   chile    1
2022-07-01   peru     1
2022-07-02   chile    2
2022-07-02   peru     3
...          ...      ...
2022-07-31   chile    1224
2022-07-31   peru     17008

Posted on 2022-08-25 01:07:01
You need to change your inner query to use a windowed max:
WITH month_max AS (
    SELECT record_date, country, objectid,
        MAX(objectuse) OVER (PARTITION BY country, objectid ORDER BY record_date) AS mx
    FROM mytable
)
SELECT record_date, country, SUM(mx) AS "sum"
FROM month_max
GROUP BY record_date, country;

This does assume that every object has a row for every date.
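To see what that assumption means in practice, here is a hedged sketch that runs the windowed-max query twice: once on sparse data, and once on a zero-filled copy with one row per date and object. Everything outside the query itself is hypothetical: the toy rows, the `dense` view, and the use of SQLite via Python's `sqlite3` (SQLite 3.25 or later) in place of PostgreSQL. The zero fill also assumes `objectuse` is never negative and that each (date, country, object) appears at most once.

```python
import sqlite3

# Hypothetical toy rows: (record_date, country, objectid, objectuse).
# Object 1 has no row on 07-03 and object 2 has no row on 07-01.
ROWS = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-02", "chile", 1, 3),
    ("2022-07-02", "chile", 2, 2),
    ("2022-07-03", "chile", 2, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# Dense copy: one row per (date, object), with 0 where the object was unused.
conn.execute("""
    CREATE VIEW dense AS
    SELECT d.record_date, o.country, o.objectid,
           COALESCE(t.objectuse, 0) AS objectuse
    FROM (SELECT DISTINCT record_date FROM mytable) AS d
    CROSS JOIN (SELECT DISTINCT country, objectid FROM mytable) AS o
    LEFT JOIN mytable AS t
           ON t.record_date = d.record_date
          AND t.country = o.country
          AND t.objectid = o.objectid
""")

# The windowed-max query, parameterised only by the fixed source name.
RUNNING_SQL = """
    WITH month_max AS (
        SELECT record_date, country, objectid,
               MAX(objectuse) OVER (
                   PARTITION BY country, objectid ORDER BY record_date
               ) AS mx
        FROM {source}
    )
    SELECT record_date, country, SUM(mx) AS total
    FROM month_max
    GROUP BY record_date, country
    ORDER BY record_date, country
"""

sparse = conn.execute(RUNNING_SQL.format(source="mytable")).fetchall()
dense = conn.execute(RUNNING_SQL.format(source="dense")).fetchall()
print("sparse:", sparse)
print("dense: ", dense)
```

On the sparse rows the total for 2022-07-03 comes out as 5, because object 1 contributes nothing on a date where it has no row; on the zero-filled view it is 8, since the running maximum of 3 is carried forward.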
Here is a rewritten version of your query. With an index, it might run faster:
select record_date, country, min(usetotal) as usetotal
from mytable d inner join lateral (
    select distinct sum(max(objectuse)) over () as usetotal
    from mytable a
    where a.record_date between date_trunc('month', d.record_date) and d.record_date
        and a.country = d.country
    group by objectid
) T on 1 = 1
group by record_date, country
order by record_date, country;

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=63760e30aecf4c885ec4967045b6cd03
Posted on 2022-08-24 03:16:06
We can use SUM() as a window function and partition by year and month.
SELECT record_date, country, objectid,
    SUM(objectuse) OVER (PARTITION BY TO_CHAR(record_date, 'YYYY-MM'), country
                         ORDER BY record_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum
FROM mytable
ORDER BY record_date;

https://stackoverflow.com/questions/73467012