文件empdetails.log有以下数据-
100 AAA 12000 HYD
101 BBB 13000 PUNE
102CCC 14000路政署
103DDD 10000 BLORE
104 EEE 12000 PUNE
我希望将这些数据加载到具有动态分区的“Emp”表中,以便选择* from;给出以下输出(按位置进行分区)。
100 AAA 12000 HYD
102CCC 14000路政署
101 BBB 13000 PUNE
104 EEE 12000 PUNE
103DDD 10000 BLORE
有人能提供要在蜂巢中执行的load命令吗?
表创建-创建表Emp (cid int,cname string,csal int)被(cloc string)行格式分隔的字段以“\t”作为文本文件结束;
发布于 2016-12-26 20:12:54
对于动态分区,您必须使用插入..。选择查询(蜂巢插入)。
将数据插入到具有DP的Hive表中是一个两个步骤。
另外,在Hive中设置以下属性。
下面的示例适用于cloudera。
-- Extract orders data from mysql (Retail_DB.products)
select * from orders into outfile '/tmp/orders_data.psv' fieldsterminated by '|' lines terminated by 'n';
-- Create Hive table with DP - order_month is DP.
CREATE TABLE orders (order_id int, order_date string, order_customer_id int, order_status string ) PARTITIONED BY (order_month string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'STORED AS TEXTFILE;
--Create staging table in Hive.
CREATE TABLE orders_stage (order_id int,order_date string, order_customer_id int, order_status string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
--Load data into staging table (Hive)
Load data into staging table load data local inpath
/tmp/orders_data.psv' overwrite into table orders_stage;
--Insert into Orders, which is final table (Hive).
Insert overwrite table retail_ods.orders partition (order_month)
select order_id, order_date, order_customer_id,order_status,
substr(order_date, 1, 7) order_month from retail_stage.orders_stage;您可以在https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions上找到更多详细信息
https://stackoverflow.com/questions/41328208
复制相似问题