First, I used Hive to load data from a local file into a non-partitioned table.
Create the raw data file:
stephen@stephen-VirtualBox:~/Workspace$ cat> static_source_demo.txt
11,test,2300,admin,c1
12,test2,2220,IT,c2
21,test3,2342,admin,c1
34,test5,2422,admin,c2
35,test6,2411,admin1,c1
Create the non-partitioned table:
hive> CREATE TABLE employee_source_demo ( eid int, name string,
> salary string, destination string,city string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
OK
Time taken: 0.153 seconds
Then I loaded the data from the file into the source table.
Load the data into the source table:
hive> load data local inpath '/home/stephen/Workspace/static_source_demo.txt' into table employee_source_demo;
Loading data to table zipcodes.employee_source_demo
OK
Time taken: 0.773 seconds
I confirmed that the data was in the table:
hive> SELECT * FROM employee_source_demo;
OK
11 test 2300 admin c1
12 test2 2220 IT c2
21 test3 2342 admin c1
34 test5 2422 admin c2
35 test6 2411 admin1 c1
Time taken: 0.228 seconds, Fetched: 5 row(s)
Now, I created the partitioned table in the same database:
hive> CREATE TABLE employee_part1 ( eid int, name String,
> salary String, destination String) PARTITIONED by (city string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
OK
Time taken: 0.151 seconds
After that, I tried inserting the data into the new table, taking the partition into account:
hive> INSERT INTO TABLE employee_part1 PARTITION (city='c1') SELECT eid, name, salary,
destination FROM employee_source_demo WHERE city='c1';
Everything seemed to go smoothly. Here are the messages I received during execution:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = stephen_20220114230858_14789266-c13d-4e53-b411-474ac5bcbde7
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1642123763239_0009, Tracking URL = http://stephen-VirtualBox:8088/proxy/application_1642123763239_0009/
Kill Command = /home/stephen/opt/hadoop-2.7.3/bin/hadoop job -kill job_1642123763239_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-01-14 23:09:11,139 Stage-1 map = 0%, reduce = 0%
2022-01-14 23:09:22,019 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.23 sec
MapReduce Total cumulative CPU time: 2 seconds 230 msec
Ended Job = job_1642123763239_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/zipcodes.db/employee_part1/city=c1/.hive-staging_hive_2022-01-14_23-08-58_343_2569185367107439586-1/-ext-10000
Loading data to table zipcodes.employee_part1 partition (city=c1)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.23 sec HDFS Read: 5027 HDFS Write: 57 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 230 msec
OK
Time taken: 25.388 seconds
I don't see any errors. So I checked the partitioned table, as shown below. There is nothing in it.
hive> SELECT * FROM employee_part1;
OK
Time taken: 0.34 seconds
I also checked the Hive warehouse. The file seems to be there, but it contains no data:
hive> !hadoop fs -ls /user/hive/warehouse/zipcodes.db/employee_part1/city=c1;
Found 1 items
-rwxrwxr-x 1 stephen supergroup 0 2022-01-14 23:09 /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0
hive> !hadoop fs -cat /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0;
hive>
I would really like to know a solution for this. I don't know what I did wrong.
Posted on 2022-01-16 12:10:29
Try the statement below:
MSCK REPAIR TABLE employee_part1;
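MSCK REPAIR TABLE scans the table's HDFS location and registers in the metastore any partition directories it finds that the metastore does not yet know about. A minimal sketch of how it could be applied here, using the table and partition from the question (the SHOW PARTITIONS and follow-up SELECT are my additions for verification, not part of the original answer):

```sql
-- Sync partition metadata with the directories under
-- /user/hive/warehouse/zipcodes.db/employee_part1/
MSCK REPAIR TABLE employee_part1;

-- Check that the city=c1 partition is now registered in the metastore
SHOW PARTITIONS employee_part1;

-- Re-query the partition
SELECT * FROM employee_part1 WHERE city = 'c1';
```

Note that MSCK REPAIR only fixes missing partition *metadata*; if the data file under `city=c1` is genuinely 0 bytes, as the `hadoop fs -ls` output above suggests, the query will still return no rows.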
https://stackoverflow.com/questions/70717614