首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >PostgreSQL在查询过程中内存不足

PostgreSQL在查询过程中内存不足
EN

Database Administration用户
提问于 2020-01-28 01:45:18
回答 1查看 1.6K关注 0票数 1

我正在db.t2.xlarge实例(4 CPU 16 Gb )中运行AWS中的一个D0 11数据库,其中包含4 Tb的存储。我的查询是基于一个相当大的表(48 Gb -- 243.711.955行),但是没有什么是PostgreSQL无法处理的。

我正在运行一个简单的查询(我添加了查询计划,以防有任何洞察力),但是在大约10个小时之后,存储空间急剧下降到零,查询失败了:

代码语言:javascript
复制
ERROR:  could not extend file "base/16401/321099.3525": No space left on device
HINT:  Check free disk space.

我试图用额外的存储来解决这个问题(我将1TB添加到数据库中,并在所有表上运行一个VACCUM --我知道在RDS实例中设置了autovaccum ),但是问题仍然存在。我担心增加空间可能不是可能的解决方案,因为数据库所需的时间不到1TB:

代码语言:javascript
复制
         name         |    owner    |   size
----------------------+-------------+-----------
 rdsadmin             | rdsadmin    | No Access
         |    | 317 GB

我不完全理解PostgreSQL创建临时文件的方式,这些散列还是临时ORDER BY索引(?),但我的db中有大量的这些:

代码语言:javascript
复制
       datname        | Temporary files | Size of temporary files
----------------------+-----------------+-------------------------
 rdsadmin             |               0 |                       0
 template0            |               0 |                       0
 postgres             |               0 |                       0
         |            1079 |            991097557473
 template1            |               0 |                       0

我应该继续添加存储空间直到查询运行(RDS对于这些情况有一个“紧急存储”)吗?还是在运行查询之前缺少了某种配置?

编辑:添加查询。

查询:

代码语言:javascript
复制
    with times as (
        select name_source,
               id_source,
               run,
               init_dt,
               hour_along,
               ntile(3) over (partition by run, date_part('year', init_dt), id_source order by hour_along) as t,
               height,
               traj_dt,
               geom
        from hysplit_process.clean_trajectories
    ),
         dates_for_t as (
             select distinct on (
                id_source,
                name_source,
                run,
                date_part('year', init_dt),
                t) name_source,
                   id_source,
                   run,
                   t,
                   traj_dt
             from times
             order by id_source, name_source, run, date_part('year', init_dt), run, t, traj_dt desc
         ),
         agg_trajs_by_t as (
             select name_source,
                    id_source,
                    run,
                    t,
                    init_dt,
                    avg(st_x(geom)) as avg_lon,
                    avg(st_y(geom)) as avg_lat,
                    avg(height)     as avg_height
             from times
             group by name_source, id_source, run, init_dt, t
         )
    select t.name_source,
           t.id_source,
           t.run,
           t.t,
           t.init_dt,
           d.traj_dt,
           st_setsrid(st_makepoint(t.avg_lon, t.avg_lat), 4326) as geom,
           t.avg_height
    from agg_trajs_by_t as t
             left join dates_for_t as d
                       using (run, t);

查询计划:

代码语言:javascript
复制
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Right Join  (cost=274164346.44..274773926.31 rows=24371195 width=104)
   Merge Cond: ((d.run = t.run) AND (d.t = t.t))
   CTE times
     ->  WindowAgg  (cost=90827593.60..97529672.28 rows=243711952 width=100)
           ->  Sort  (cost=90827593.60..91436873.48 rows=243711952 width=96)
                 Sort Key: clean_trajectories.run, (date_part('year'::text, clean_trajectories.init_dt)), clean_trajectories.id_source, clean_trajectories.hour_along
                 ->  Seq Scan on clean_trajectories  (cost=0.00..6897785.40 rows=243711952 width=96)
   CTE dates_for_t
     ->  Unique  (cost=72144047.24..73971886.88 rows=40000 width=56)
           ->  Sort  (cost=72144047.24..72753327.12 rows=243711952 width=56)
                 Sort Key: times.run, times.t, times.traj_dt DESC
                 ->  CTE Scan on times  (cost=0.00..4874239.04 rows=243711952 width=56)
   CTE agg_trajs_by_t
     ->  GroupAggregate  (cost=88804047.24..95932621.83 rows=24371195 width=80)
           Group Key: times_1.name_source, times_1.id_source, times_1.run, times_1.init_dt, times_1.t
           ->  Sort  (cost=88804047.24..89413327.12 rows=243711952 width=96)
                 Sort Key: times_1.name_source, times_1.id_source, times_1.run, times_1.init_dt, times_1.t
                 ->  CTE Scan on times times_1  (cost=0.00..4874239.04 rows=243711952 width=96)
   ->  Sort  (cost=3857.54..3957.54 rows=40000 width=16)
         Sort Key: d.run, d.t
         ->  CTE Scan on dates_for_t d  (cost=0.00..800.00 rows=40000 width=16)
   ->  Materialize  (cost=6726307.90..6848163.88 rows=24371195 width=80)
         ->  Sort  (cost=6726307.90..6787235.89 rows=24371195 width=80)
               Sort Key: t.run, t.t
               ->  CTE Scan on agg_trajs_by_t t  (cost=0.00..487423.90 rows=24371195 width=80)
EN

回答 1

Database Administration用户

发布于 2020-02-04 16:19:33

谢谢大家的建议。

在上述评论之后,我将CTE分解成几个较小的步骤。这不仅允许我使用更多的索引,而且还可以避免我的RDS实例崩溃。我希望这对未来的人们有所帮助,以防他们被PostgreSQL 11塞满。

票数 0
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/258311

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档