I have a dataset in the variable data_1:
describe data_1;
output:
group_2: {group: (age: int,phone: chararray),group_1: {(group: (age: int,phone: chararray,id: int),student_details: {(id: int,firstname: chararray,lastname: chararray,age: int,phone: chararray,city: chararray)})}}
and
DUMP data_1;
output:
(21,9848022330) {((21,9848022330,4),{(4,Preethi,Agarwal,21,9848022330,London)})}
(21,9848022337) {((21,9848022337,1),{(1,Rajiv,Reddy,21,9848022337,Paris)})}
(22,9848022338) {((22,9848022338,2),{(2,siddarth,Battacharya,22,9848022338,Kolkata)})}
(22,9848022339) {((22,9848022339,3),{(3,Rajesh,Khanna,22,9848022339,Delhi)})}
(23,9848022335) {((23,9848022335,6),{(6,Archana,Mishra,23,9848022335,Chennai)})}
(23,9848022336) {((23,9848022336,5),{(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)})}
(24,9848022333) {((24,9848022333,7),{(7,Komal,Nayak,24,9848022333,trivendram)}),((24,9848022333,8),{(8,Bharathi,Nambiayar,24,9848022333,Chennai)})}
(111,9834534343) {((111,9834534343,9),{(9,ABC,DEF,111,9834534343,Delhi1),(9,ABC,DEF,111,9834534343,Delhi2),(9,ABC,DEF,111,9834534343,Delhi3)})}
I want to remove the extra bag and tuple and keep only the bag at $1.$1.
I tried to do this with something like
group_2_normal = FOREACH data_1 GENERATE $0.age,$0.phone,$1.$1;
but I still cannot remove the extra bag and tuple wrapped around the $1.$1 bag.
The output of the above FOREACH statement is:
21 9848022330 {({(4,Preethi,Agarwal,21,9848022330,London)})}
21 9848022337 {({(1,Rajiv,Reddy,21,9848022337,Paris)})}
But the desired output is:
21 9848022330 {(4,Preethi,Agarwal,21,9848022330,London)}
21 9848022337 {(1,Rajiv,Reddy,21,9848022337,Paris)}
Posted on 2016-03-25 07:05:23
I think FLATTEN will help you here. As long as the outer bag contains only one row, it will give you what you want.
group_2_normal = FOREACH data_1 GENERATE $0.age,$0.phone,FLATTEN($1.$1);
https://stackoverflow.com/questions/36177643
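To illustrate the "only one row" caveat, here is a rough Python model of what the FLATTEN statement in the answer does. It is only a sketch, not Pig itself: bags are modeled as lists, tuples as tuples, and the helper names project_inner and foreach_flatten are made up for this illustration. It assumes FLATTEN on a bag emits one output row per tuple in that bag.

```python
# Toy model of two rows of data_1. Each row is (group, group_1), where
# group_1 is a bag (list) of (group, student_details) tuples and
# student_details is itself a bag of student tuples.
data_1 = [
    ((21, "9848022330"),
     [((21, "9848022330", 4),
       [(4, "Preethi", "Agarwal", 21, "9848022330", "London")])]),
    ((24, "9848022333"),
     [((24, "9848022333", 7),
       [(7, "Komal", "Nayak", 24, "9848022333", "trivendram")]),
      ((24, "9848022333", 8),
       [(8, "Bharathi", "Nambiayar", 24, "9848022333", "Chennai")])]),
]


def project_inner(row):
    """Model of $1.$1: keep only field 1 of every tuple in the outer bag.

    The result is still a bag of one-field tuples, which is the extra
    bag/tuple wrapper seen in the question's output.
    """
    _, group_1 = row
    return [(student_details,) for (_, student_details) in group_1]


def foreach_flatten(rows):
    """Model of GENERATE $0.age, $0.phone, FLATTEN($1.$1).

    Assumption: flattening a bag yields one output row per tuple in the
    bag, with that tuple's fields spliced into the row.
    """
    out = []
    for row in rows:
        (age, phone), _ = row
        for (student_details,) in project_inner(row):
            out.append((age, phone, student_details))
    return out


result = foreach_flatten(data_1)
for r in result:
    print(r)
```

In this model the key (21, 9848022330) yields a single row holding just the student bag, as desired, while the key (24, 9848022333), whose outer bag has two tuples, is split into two output rows instead of one.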