Q: How do I tune a Hive INSERT OVERWRITE on a partition?

Stack Overflow user

Asked 2016-04-04 11:15:08

1 answer, 2.1K views, 0 followers, 2 votes

I wrote an INSERT OVERWRITE ... PARTITION statement in Hive to merge all the files in a partition into larger files.

SQL:

SET hive.exec.compress.output=true;
set hive.merge.smallfiles.avgsize=2560000000;
set hive.merge.mapredfiles=true;
set hive.merge.mapfiles =true;
SET mapreduce.max.split.size=256000000;
SET mapreduce.min.split.size=256000000;
SET mapreduce.output.fileoutputformat.compress.type =BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

INSERT OVERWRITE TABLE ${source_database}.${table_name} PARTITION (${line})
SELECT ${prepare_sel_columns}
FROM ${source_database}.${table_name}
WHERE ${partition_where_clause};

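With the settings above I get compressed output, but generating the output files takes too long.

Even though it runs only map jobs, it takes a long time.

I am looking for any further settings on the Hive side that would make the insert run faster.

Metrics:

15 GB of files ==> takes 10 minutes.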

hive

hdfs

hadoop

mapreduce

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-04-05 08:28:53

SET mapreduce.input.fileinputformat.split.minsize=512000000;
SET mapreduce.input.fileinputformat.split.maxsize=5120000000;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapreduce.output.fileoutputformat.compress.codec=${v_compression_codec};

The settings above helped a lot: the duration dropped from 10 min to 1 min.
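To get a feel for what the larger split size changes, here is a rough back-of-the-envelope sketch. It assumes "15 GB" means 15 * 10^9 bytes and that the number of combined input splits is approximately the total input size divided by the maximum split size. The function name `estimated_splits` is hypothetical. The real split count also depends on file layout, block locality, and whether the cluster honors the property names, so this is only an approximation and does not by itself explain the measured speedup.

```python
def estimated_splits(total_bytes: int, max_split_bytes: int) -> int:
    """Approximate split count as ceil(total / max split size), using integer math."""
    return -(-total_bytes // max_split_bytes)


# Assumption: "15 GB" is read as decimal gigabytes.
TOTAL_BYTES = 15 * 10**9

# Max split size used in the question (256000000 bytes).
splits_question = estimated_splits(TOTAL_BYTES, 256_000_000)

# Max split size used in the accepted answer (5120000000 bytes).
splits_answer = estimated_splits(TOTAL_BYTES, 5_120_000_000)

print(splits_question, splits_answer)
```

Under these assumptions the job goes from dozens of map tasks to a handful, and since a map-only insert writes one file per mapper, that also means fewer and larger output files.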

1 vote

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/36401091
