I'm experimenting with Pig on the OpenFlights dataset (https://openflights.org/data.html). I currently have a query result containing every unique flight route, i.e. the table below
+---------------+-------------+
| Start_Airport | End_Airport |
+---------------+-------------+
| YYZ | NYC |
| YBG | YVR |
| AEY | GOH |
+---------------+-------------+
and I want to join both of its columns against a master table that holds the latitude and longitude of every airport, i.e.
+---------+----------+-----------+
| Airport | Latitude | Longitude |
+---------+----------+-----------+
| YYZ | -10.3 | 1.23 |
| YBG | -40.3 | 50.4 |
| AEY | 30.3 | 30.3 |
+---------+----------+-----------+
How can I do this? Essentially I want a final table like this:
+----------------+----------+-----------+-------------+----------+-----------+
| Start_Airport | Latitude | Longitude | End_Airport | Latitude | Longitude |
+----------------+----------+-----------+-------------+----------+-----------+
| YYZ | -10.3 | 1.23 | NYC | blah | blah |
| YBG | -40.3 | 50.4 | YVR | blah | blah |
| AEY | 30.3 | 30.3 | GOH | blah | blah |
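+----------------+----------+-----------+-------------+----------+-----------+
I'm currently trying the following, where the first table is c:
route_data = JOIN c by (start_airport, end_airport), airports_all by ($0, $0);
My thinking is that this joins start_airport and end_airport on their respective codes and then pulls in the latitude and longitude for each.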
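Answered 2020-03-04 18:29:16
route_data = JOIN c by (start_airport, end_airport), airports_all by ($0, $0);
This behaves like the "and" condition of a typical join query in the SQL world. Consider the query below. Would it produce the result you want?
select * from c t1 join airports_all t2 on t1.start_airport = t2.first_field and t1.end_airport = t2.first_field;
It returns rows only when start_airport and end_airport are both equal to the same airport code.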
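To make the "and" condition concrete, here is a rough sketch of what the composite-key join amounts to. It assumes c has the fields start_airport and end_airport as in the question; the relation names same_ends and equiv are made up for illustration.

```pig
-- Composite-key join from the question: both key parts must match
-- the same airport code in airports_all.
route_data = JOIN c BY (start_airport, end_airport), airports_all BY ($0, $0);

-- Roughly equivalent formulation (illustrative names): keep only routes
-- that start and end at the same airport, then join on that single code.
same_ends = FILTER c BY start_airport == end_airport;
equiv = JOIN same_ends BY start_airport, airports_all BY $0;
```

None of the sample routes starts and ends at the same airport, so neither form should return any rows for them.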
What you want can be achieved with two separate joins, as follows:
cat > routes.txt
YYZ,NYC
YBG,YVR
AEY,GOH
cat > airports_all.txt
YYZ,-10.3,1.23
YBG,-40.3,50.4
AEY,30.3,30.3
Pig code:
-- Load the routes and the airport master table
tab1 = load '/home/ec2-user/routes.txt' using PigStorage(',') as (start_airport,end_airport);
describe tab1;
tab2 = load '/home/ec2-user/airports_all.txt' using PigStorage(',') as (Airport,Latitude,Longitude);
describe tab2;
-- First join: look up the coordinates of the start airport
tab3 = JOIN tab1 by (start_airport), tab2 by (Airport);
describe tab3;
tab4 = foreach tab3 generate $0 as start_airport, $3 as start_Latitude, $4 as start_Longitude, $1 as end_airport;
describe tab4;
-- Second join: look up the coordinates of the end airport
tab5 = JOIN tab4 by (end_airport), tab2 by (Airport);
describe tab5;
tab6 = foreach tab5 generate $0 as start_airport, $1 as start_Latitude, $2 as start_Longitude, $3 as end_airport, $5 as end_Latitude, $6 as end_Longitude;
describe tab6;
dump tab6;
https://stackoverflow.com/questions/60499123
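One thing to watch with the sample files above: none of the end airports (NYC, YVR, GOH) appear in airports_all.txt, so an inner join on end_airport should come back empty for that data. Below is a minimal sketch of the same two-step approach using LEFT OUTER joins, so routes with an unknown airport are kept with null coordinates. It also uses field names with the `::` disambiguation operator instead of positional references. The relation names and the declared types are assumptions for illustration; the paths are the ones used in the answer.

```pig
-- Sketch only: relation names and field types are illustrative assumptions.
routes = LOAD '/home/ec2-user/routes.txt' USING PigStorage(',')
         AS (start_airport:chararray, end_airport:chararray);
airports = LOAD '/home/ec2-user/airports_all.txt' USING PigStorage(',')
           AS (airport:chararray, latitude:double, longitude:double);

-- First join: attach the start airport's coordinates.
-- LEFT OUTER keeps routes whose start airport is missing from airports.
with_start = JOIN routes BY start_airport LEFT OUTER, airports BY airport;
start_named = FOREACH with_start GENERATE
    routes::start_airport AS start_airport,
    airports::latitude    AS start_latitude,
    airports::longitude   AS start_longitude,
    routes::end_airport   AS end_airport;

-- Second join: attach the end airport's coordinates the same way.
with_end = JOIN start_named BY end_airport LEFT OUTER, airports BY airport;
route_data = FOREACH with_end GENERATE
    start_named::start_airport   AS start_airport,
    start_named::start_latitude  AS start_latitude,
    start_named::start_longitude AS start_longitude,
    start_named::end_airport     AS end_airport,
    airports::latitude           AS end_latitude,
    airports::longitude          AS end_longitude;

DUMP route_data;
```

If only routes with both airports known are wanted, dropping LEFT OUTER from both joins gives the inner-join behaviour of the answer's script.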