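I am trying to insert data into a partitioned table, but not all of the partitions are being created (only the null and 0 partitions were created). See below.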
hive>
select state_code,district_code,count(*) from marital_status group by state_code,district_code;
Total MapReduce jobs = 1
...
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.49 sec HDFS Read: 193305 HDFS Write: 240 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 490 msec
OK
28 000 60
28 532 60
28 533 60
28 534 60
28 535 60
28 536 60
28 537 60
28 538 60
28 539 60
28 540 60
28 541 60
28 542 60
28 543 60
28 544 60
28 545 60
28 546 60
28 547 60
28 548 60
28 549 60
28 550 60
28 551 60
28 552 60
28 553 60
28 554 60
Time taken: 39.442 seconds, Fetched: 24 row(s)
Now I am inserting this table's data into another table that is partitioned on district_code.
hive>
insert overwrite table marital_status_part partition(DISTRICT_CODE) SELECT * FROM MARITAL_STATUS WHERE DISTRICT_CODE IN ('532','533','534');
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201507071409_0020, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201507071409_0020
Kill Command = /home/chaitanya/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507071409_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-07 16:35:38,180 Stage-1 map = 0%, reduce = 0%
2015-07-07 16:35:48,214 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:49,217 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:50,220 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:51,222 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:52,226 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:53,234 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:54,237 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.01 sec
MapReduce Total cumulative CPU time: 2 seconds 10 msec
Ended Job = job_201507071409_0020
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:9000/tmp/hive-chaitanya/hive_2015-07-07_16-35-29_099_2560746659196071718-1/-ext-10000
Loading data to table default.marital_status_part partition (district_code=null)
Loading partition {district_code=0}
Partition default.marital_status_part{district_code=0} stats: [num_files: 1, num_rows: 0, total_size: 22882, raw_data_size: 0]
Table default.marital_status_part stats: [num_partitions: 1, num_files: 1, num_rows: 0, total_size: 22882, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.01 sec HDFS Read: 193305 HDFS Write: 22882 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 10 msec
OK
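Time taken: 26.254 seconds
What should have happened is that three folders were created, for 532, 533 and 534, but only two folders (null and 0) were created. Can you help me solve this problem?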
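Posted on 2015-07-07 21:46:50
A Hive partition can be thought of as a "virtual" column. On HDFS, each partition is stored in its own directory. With a dynamic partition insert, the partition value is taken from the last column of the SELECT. Without knowing more about your table's columns, the following query should work with minor modification.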
INSERT OVERWRITE TABLE marital_status_part partition(DISTRICT_CODE) SELECT column1, column2, ..., columnN, DISTRICT_CODE FROM MARITAL_STATUS WHERE DISTRICT_CODE IN ('532','533','534');
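In this insert, note that DISTRICT_CODE is the last column of the SELECT. That last column is what gets used as the DISTRICT_CODE in partition(DISTRICT_CODE). You need to make sure that the number of columns you select matches the number of columns in the target table, including the column you are partitioning on.
For details, see https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert.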
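As a rough sketch of the point above, one way to apply it is to check the column order of both tables first and then list the columns explicitly instead of using SELECT *. Only state_code and district_code are known from the question; age_group and married_count are hypothetical column names invented for illustration and should be replaced with the real non-partition columns of marital_status_part.

```sql
-- Inspect the column order of both tables. For a partitioned table,
-- DESCRIBE also lists the partition column(s) in a separate section at the end.
DESCRIBE marital_status;
DESCRIBE marital_status_part;

-- age_group and married_count are HYPOTHETICAL column names (assumption);
-- substitute the real non-partition columns, in the target table's order.
-- The partition column district_code must come last in the SELECT.
INSERT OVERWRITE TABLE marital_status_part PARTITION (district_code)
SELECT state_code, age_group, married_count, district_code
FROM marital_status
WHERE district_code IN ('532', '533', '534');
```

Because dynamic partition columns are matched by position rather than by name, SELECT * only works when district_code happens to be the last column of the source table.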
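Posted on 2015-07-08 11:17:14
Have you run the following commands?
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
This is because static partitioning is enabled by default, which can cause the problem you are facing.
(I could not format the text above because I am answering this question from my phone.)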
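To confirm that the settings took effect and to see which partitions the insert actually produced, something like the following could be run in the same Hive session. This is only a sketch; it assumes the table name marital_status_part from the question.

```sql
-- SET with a property name and no value prints the current setting.
SET hive.exec.dynamic.partition;
SET hive.exec.dynamic.partition.mode;

-- List the partitions that now exist on the target table.
SHOW PARTITIONS marital_status_part;
```

If the insert worked as intended, the listing should contain one entry per district code selected (for example district_code=532) rather than only a 0 or default partition.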
https://stackoverflow.com/questions/31267010