I am new to Hive and want to create a bucketed table from a flat table. My flat table is as follows:
create table data(auth string, file string, documents string)
row format delimited
fields terminated by '\t';
My bucketed table is as follows:
create table test(auth string, documents string)
partitioned by (file string)
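clustered by (auth) into 2 buckets;
I have two authors, A and B, with 10 documents each. When I insert the data into the bucketed table, the statement runs successfully. However, I want all 10 documents of each author to end up in the same partition, and instead I get a single file containing the contents of all 10 documents.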
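To see where each document actually landed, you can query Hive's `INPUT__FILE__NAME` virtual column, which reports the data file each row was read from. This is a minimal diagnostic sketch against the question's `test` table; the actual paths depend on your warehouse location.

```sql
-- List the partitions Hive created. With PARTITIONED BY (file),
-- Hive creates one partition directory per distinct file value.
SHOW PARTITIONS test;

-- Show which data file (bucket) each author/document row was read from.
SELECT auth, file, INPUT__FILE__NAME
FROM test
ORDER BY auth, file;
```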
Posted 2015-07-03 12:44:39
I am assuming the following table structure. Flat table:
CREATE TABLE flattable (id INT, author STRING, book STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Bucketed table:
CREATE TABLE bucketedtable (id INT, book STRING)
partitioned by (author STRING)
CLUSTERED BY (book) INTO 10 BUCKETS;
Set these properties in Hive:
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Then insert from flattable into bucketedtable:
INSERT INTO TABLE bucketedtable
PARTITION (author)
SELECT id, book, author
FROM flattable;
You just need to swap the PARTITIONED BY and CLUSTERED BY columns: partition by author and bucket by book.
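Applied to the question's own tables (`data` as the flat table), the swap would look roughly like the sketch below. It assumes the question's column names; the target table name `test_by_auth` and the bucket count of 10 are hypothetical choices, and the `set` properties from the answer are still needed for the dynamic-partition insert.

```sql
-- Partition by author so each author's rows go into their own directory,
-- and bucket by file within each author partition.
CREATE TABLE test_by_auth (file STRING, documents STRING)
PARTITIONED BY (auth STRING)
CLUSTERED BY (file) INTO 10 BUCKETS;

-- The dynamic partition column (auth) must come last in the SELECT list.
INSERT INTO TABLE test_by_auth
PARTITION (auth)
SELECT file, documents, auth
FROM data;

-- Expect one partition per author, e.g. auth=A and auth=B.
SHOW PARTITIONS test_by_auth;
```

Bucketing assigns rows by hashing the `file` value, so 10 buckets for 10 documents does not guarantee exactly one document per bucket file; some buckets may hold several documents and others may be empty. What the layout does guarantee is that all of an author's documents sit under that author's partition.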
https://stackoverflow.com/questions/29509670