文章/答案/技术大牛

发布

问格林梅分布
EN

Stack Overflow用户

提问于 2022-04-25 19:04:49

回答 1查看 213关注 0票数 0

有一个具有随机分布的表。

CREATE TABLE schema.table (
    col1 int4 NULL,
    col2 int4 NULL,
    col3 int4 NULL
)
WITH (
    appendonly=true,
    compresstype=zstd,
    orientation=column
)
DISTRIBUTED RANDOMLY;

我们需要优化(以最小的倾斜度)在一个字段上分配行。为此，我们可以创建测试表。

CREATE TABLE schema.test_table (
    col_1 int4 NULL,
    col_2 int4 NULL,
    col_3 int4 NULL
)
WITH (
    appendonly=true,
    compresstype=zstd,
    orientation=column
)
DISTRIBUTED BY (col_i); 
INSERT INTO schema.test_table SELECT * FROM schema.table;

然后检查它们是否倾斜，例如

select * from gp_toolkit.gp_skew_coefficient('schema.test_table'::regclass);

问题是，我们希望检查表是否倾斜，而不创建测试表。这能做到吗?如果是，怎么做？

sql

distribution

greenplum

skew

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-04-26 15:43:06

如果您真的不想创建一个新的表，但是我不知道如何在不实际(重新)分发数据的情况下对发行版进行建模，这样的方法会有效。

foo=# create table foo(a int, b int, c int);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE

foo=# insert into foo values (generate_series(1,100), generate_series(101,200), generate_series(2001, 2100));
INSERT 0 100

foo=# select * from gp_toolkit.gp_skew_coefficient('public.foo'::regclass);
 skcoid |         skccoeff
--------+--------------------------
  76788 | 18.460769214742921763000
(1 row)

foo=# select gp_segment_id, count(*) from foo group by 1 order by 1;
 gp_segment_id | count
---------------+-------
             0 |    17
             1 |    18
             2 |    23
             3 |    17
             4 |    15
             5 |    10
(6 rows)


foo=# ALTER TABLE foo SET
foo-# WITH (REORGANIZE=true)
foo-# DISTRIBUTED BY (a);
ALTER TABLE
foo=# select * from gp_toolkit.gp_skew_coefficient('public.foo'::regclass);
 skcoid |         skccoeff
--------+--------------------------
  76788 | 18.460769214742921763000
(1 row)

foo=# select gp_segment_id, count(*) from foo group by 1 order by 1;
 gp_segment_id | count
---------------+-------
             0 |    21
             1 |    18
             2 |    12
             3 |    15
             4 |    18
             5 |    16
(6 rows)

foo=#

foo=# ALTER TABLE foo SET
WITH (REORGANIZE=true)
DISTRIBUTED BY (b);
ALTER TABLE
foo=# select * from gp_toolkit.gp_skew_coefficient('public.foo'::regclass);
 skcoid |         skccoeff
--------+--------------------------
  76788 | 27.011108825814611346000
(1 row)

foo=# select gp_segment_id, count(*) from foo group by 1 order by 1;
 gp_segment_id | count
---------------+-------
             0 |    12
             1 |    14
             2 |    20
             3 |    24
             4 |    16
             5 |    14
(6 rows)

foo=#


foo=# ALTER TABLE foo SET
WITH (REORGANIZE=true)
DISTRIBUTED BY (c);
ALTER TABLE
foo=# select * from gp_toolkit.gp_skew_coefficient('public.foo'::regclass);
 skcoid |         skccoeff
--------+--------------------------
  76788 | 30.983866769659334938000
(1 row)

foo=#

foo=# select gp_segment_id, count(*) from foo group by 1 order by 1;
 gp_segment_id | count
---------------+-------
             0 |    19
             1 |    10
             2 |    20
             3 |    23
             4 |    11
             5 |    17
(6 rows)

foo=#

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/72004519

复制

相似问题

问格林梅分布
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问格林梅分布EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问格林梅分布
EN