I'm new to Hive and I've run into a problem.
I have a table in Hive, like this:
create table td(id int, time string, ip string, v1 bigint, v2 int, v3 int,
v4 int, v5 bigint, v6 int) PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' LINES TERMINATED BY '\n';
I run the following SQL:
from td
INSERT OVERWRITE DIRECTORY '/tmp/total.out' select count(v1)
INSERT OVERWRITE DIRECTORY '/tmp/totaldistinct.out' select count(distinct v1)
INSERT OVERWRITE DIRECTORY '/tmp/distinctuin.out' select distinct v1
INSERT OVERWRITE DIRECTORY '/tmp/v4.out' select v4 , count(v1), count(distinct v1) group by v4
INSERT OVERWRITE DIRECTORY '/tmp/v3v4.out' select v3, v4 , count(v1), count(distinct v1) group by v3, v4
INSERT OVERWRITE DIRECTORY '/tmp/v426.out' select count(v1), count(distinct v1) where v4=2 or v4=6
INSERT OVERWRITE DIRECTORY '/tmp/v3v426.out' select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3
INSERT OVERWRITE DIRECTORY '/tmp/v415.out' select count(v1), count(distinct v1) where v4=1 or v4=5
INSERT OVERWRITE DIRECTORY '/tmp/v3v415.out' select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3
It works, and the output is exactly what I want.
But there is one problem: Hive generates 9 MapReduce jobs and runs them one after another.
I ran EXPLAIN on this query and got the following:
STAGE DEPENDENCIES:
Stage-9 is a root stage
Stage-0 depends on stages: Stage-9
Stage-10 depends on stages: Stage-9
Stage-1 depends on stages: Stage-10
Stage-11 depends on stages: Stage-9
Stage-2 depends on stages: Stage-11
Stage-12 depends on stages: Stage-9
Stage-3 depends on stages: Stage-12
Stage-13 depends on stages: Stage-9
Stage-4 depends on stages: Stage-13
Stage-14 depends on stages: Stage-9
Stage-5 depends on stages: Stage-14
Stage-15 depends on stages: Stage-9
Stage-6 depends on stages: Stage-15
Stage-16 depends on stages: Stage-9
Stage-7 depends on stages: Stage-16
Stage-17 depends on stages: Stage-9
Stage-8 depends on stages: Stage-17
Stages 9-17 seem to correspond to MapReduce jobs 0-8.
But according to the EXPLAIN output above, stages 10-17 depend only on stage 9,
so my question is: why don't jobs 1-8 run concurrently?
Or, how can I make jobs 1-8 run concurrently?
Thanks a lot for any help!
Posted on 2012-01-17 12:26:01
In hive-default.xml there is a property named "hive.exec.parallel" that controls parallel job execution. Its default value is "false"; change it to "true" to enable this behavior. Another property, "hive.exec.parallel.thread.number", controls how many jobs may run in parallel at most.
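For example, you can override both properties for the current session before running the multi-insert query (a sketch; the thread-number value 8 here is just an illustration, pick what fits your cluster):

```sql
-- Allow independent stages (here: stages 10-17, which only depend on stage 9)
-- to be submitted as concurrent MapReduce jobs
SET hive.exec.parallel=true;

-- Upper bound on how many jobs may run at the same time (illustrative value)
SET hive.exec.parallel.thread.number=8;
```

Setting these with `SET` affects only your session; putting them in hive-site.xml makes the change global.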
https://stackoverflow.com/questions/8868186