
如下SQL的运行结果是错误的:
SELECT
n.nspname as schema_name,
c.gp_segment_id,
pg_size_pretty(SUM(pg_total_relation_size(c.oid))) as segment_size
FROM
gp_dist_random ('pg_class') c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE
c.relkind ='r'
AND n.nspname NOTIN ('pg_catalog', 'information_schema', 'gp_toolkit')
AND n.nspname !~'^pg_%'
GROUPBY
n.nspname,
c.gp_segment_id
ORDERBY n.nspname,gp_segment_id;
其实,应该是segment_id为0的占用了900MB。 下面的才是正确的:

QQ_1756780039480
直连segment 0 ,获取大小:
[root@mdw /]# PGOPTIONS='-c gp_session_role=utility' psql -h sdw1 -p 6000-U gpadmin -d db2
psql (13.12, server 12.12)
Type "help" for help.
db2=# \dt+ ods.*
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+------------------------------+-------+---------+-------------+--------+-------------
ods | dds_imp_settle_pbt_rept_bt_d |table| gpadmin | permanent |747 MB |
ods | dom_dept |table| gpadmin | permanent |192 kB | 科信息
ods | dom_dept_rep |table| gpadmin | permanent |32 kB |
(3rows)
db2=# \di+ ods.*
List of relations
Schema | Name | Type | Owner | Table | Persistence | Size | Description
--------+--------------------------------------+-------+---------+------------------------------+-------------+--------+-------------
ods | idx_dds_imp_settle_pbt_rept_bt_d_all | index | gpadmin | dds_imp_settle_pbt_rept_bt_d | permanent |146 MB |
(1row)
db2=# gp_dist_random() 是 Greenplum Database 里的一个特殊系统函数,主要用于在 所有 segment 节点 上分布式地执行查询,返回每个 segment 上的结果集。这个函数有如下的作用:
gp_dist_random('表名') 的作用就是从 所有 segment 的本地表 中取数据,而不是像普通 SQL 那样通过协调器合并后的逻辑视图。gp_segment_id 的结果,可以用来分析数据在各个 segment 上的分布情况。这个SQL出错的问题在于:不能先做join操作,也不能先做group by、sum等聚合操作:
SELECT
n.nspname as schema_name,
c.gp_segment_id,
pg_size_pretty(SUM(pg_total_relation_size(c.oid))) as segment_size
FROM
gp_dist_random ('pg_class') c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE
c.relkind ='r'
AND n.nspname NOTIN ('pg_catalog', 'information_schema', 'gp_toolkit')
AND n.nspname !~'^pg_%'
GROUPBY
n.nspname,
c.gp_segment_id
ORDERBY n.nspname,gp_segment_id;所以,修改为如下的结果是正确的:
SELECT schema_name,gp_segment_id,pg_size_pretty(sum(sz)) seg_size from (
SELECT
(select n.nspname from pg_namespace n where c.relnamespace = n.oid) as schema_name,
c.relnamespace,
c.gp_segment_id,
pg_size_pretty((pg_total_relation_size (c.oid))) as segment_size,
(pg_total_relation_size (c.oid)) sz
FROM
gp_dist_random ('pg_class') c
WHERE
c.relkind ='r' ) v
WHERE v.schema_name NOTIN ('information_schema','gp_toolkit','pg_catalog','test','tmp')
AND v.schema_name NOTLIKE'pg_temp%'
GROUPBY schema_name,gp_segment_id
ORDERBY1,2;
1、当有gp_dist_random函数时,不能先做join操作,也不能先做group by、sum等聚合操作,否则会导致数据不准确,可能是优化器只在某个执行器进程(QE)上计算,然后再把结果重分发,导致计算错误。
https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/DkTx4O-kuH0
https://www.enterprisedb.com/blog/how-do-postgresql-securitybarrier-views-work