I have two tables:
Name_SSN and Phone_Address
Name_SSN contains: Joe
Phone_Address contains:
Joe 999-999-9990 Sunset Florida
Joe 999-9991 Sunset Florida
Jim 999 999-9994 Sunny CA
Jim 999 999 9994 Sunny CA
Bob 999-9999-9999 Raleigh VA
I want to join them and get: Joe
I'm new to Pig and don't really understand it.
Thanks for your help,
Chris
Posted on 2014-03-27 03:04:33
It sounds like you want to do an inner join in Pig. The following code should help:
NameSSNAddr.pig
--Load the two data files
namessn = LOAD 'Name_SSN.csv' USING PigStorage(',') AS (name:chararray, ssn:chararray);
phoneaddr = LOAD 'Phone_Address.csv' USING PigStorage(',') AS (name:chararray, phone:chararray, address:chararray);
--Perform the join of the two datasets on the "name" field
data_join = JOIN namessn BY name, phoneaddr BY name;
--The join combined all fields from both datasets.
--We just want a few fields, so generate them specifically.
data = FOREACH data_join GENERATE namessn::name AS name, namessn::ssn AS ssn, phoneaddr::address AS address;
--You didn't say if you wanted the data distinct or not.
--If you want only one row per distinct user, use this alias.
data_distinct = DISTINCT data;
--Dump all of the aliases so you can see what's in them.
dump namessn;
dump phoneaddr;
dump data;
dump data_distinct;

Output from dump namessn:
(Joe,xxx-xx-xxx1)
(Jim,xxx-xx-xxx2)
(Bob,xxx-xx-xxx3)

Output from dump phoneaddr:
(Joe,999-999-9990,Sunset Florida)
(Joe,999-999-9991,Sunset Florida)
(Joe,999-999-9992,Sunset Florida)
(Jim,999-999-9994,Sunny CA)
(Jim,999-999-9994,Sunny CA)
(Bob,999-999-9999,Raleigh VA)

Output from dump data:
(Bob,xxx-xx-xxx3,Raleigh VA)
(Jim,xxx-xx-xxx2,Sunny CA)
(Jim,xxx-xx-xxx2,Sunny CA)
(Joe,xxx-xx-xxx1,Sunset Florida)
(Joe,xxx-xx-xxx1,Sunset Florida)
(Joe,xxx-xx-xxx1,Sunset Florida)

Output from dump data_distinct:
(Bob,xxx-xx-xxx3,Raleigh VA)
(Jim,xxx-xx-xxx2,Sunny CA)
(Joe,xxx-xx-xxx1,Sunset Florida)

https://stackoverflow.com/questions/22673147
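To see why `data` has six rows while `data_distinct` has only three, here is a plain-Python sketch (not Pig, and not part of the original answer) that mimics the same inner join, projection, and DISTINCT on the sample rows from the dumps above. An inner join pairs each Name_SSN row with every Phone_Address row that has the same name, so Joe's three phone records produce three joined rows. Once the phone column is dropped, those rows become identical, and DISTINCT collapses them. The helper name `inner_join` is hypothetical and only for illustration.

```python
from collections import defaultdict

# Sample rows copied from the "dump namessn" and "dump phoneaddr" output above.
namessn = [
    ("Joe", "xxx-xx-xxx1"),
    ("Jim", "xxx-xx-xxx2"),
    ("Bob", "xxx-xx-xxx3"),
]
phoneaddr = [
    ("Joe", "999-999-9990", "Sunset Florida"),
    ("Joe", "999-999-9991", "Sunset Florida"),
    ("Joe", "999-999-9992", "Sunset Florida"),
    ("Jim", "999-999-9994", "Sunny CA"),
    ("Jim", "999-999-9994", "Sunny CA"),
    ("Bob", "999-999-9999", "Raleigh VA"),
]


def inner_join(left, right):
    """Hypothetical helper: pair every left row with every right row sharing column 0 (name)."""
    by_name = defaultdict(list)
    for row in right:
        by_name[row[0]].append(row)
    # Concatenate tuples, like Pig's joined tuple (left fields, then right fields).
    return [l + r for l in left for r in by_name[l[0]]]


# Roughly: data_join = JOIN namessn BY name, phoneaddr BY name;
data_join = inner_join(namessn, phoneaddr)

# Roughly: FOREACH data_join GENERATE name, ssn, address
# Joined tuple layout is (name, ssn, name, phone, address), so address is index 4.
data = [(row[0], row[1], row[4]) for row in data_join]

# Roughly: DISTINCT data. Sorted here only to make the order predictable;
# Pig does not guarantee output order.
data_distinct = sorted(set(data))

for t in data_distinct:
    print(t)
```

Running it prints one row each for Bob, Jim, and Joe, while `data` itself still holds three Joe rows and two Jim rows, which matches the Pig dumps above.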