Hadoop Pig: mapping and retrieving two columns against a master table?


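I am experimenting with Pig on the openflights dataset. I am currently trying to take a query result that contains every unique possible flight route, i.e. the following table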

+---------------+-------------+
| Start_Airport | End_Airport |
+---------------+-------------+
| YYZ           | NYC         |
| YBG           | YVR         |
| AEY           | GOH         |
+---------------+-------------+ 
and then join both of its values against a master table that holds the latitude and longitude of each airport, i.e.

+---------+----------+-----------+
| Airport | Latitude | Longitude |
+---------+----------+-----------+
| YYZ     |    -10.3 |      1.23 |
| YBG     |    -40.3 |      50.4 |
| AEY     |     30.3 |      30.3 |
+---------+----------+-----------+
How would I do this? I basically want to end up with a final table like this

+----------------+----------+-----------+-------------+----------+-----------+
| Start_Airport  | Latitude | Longitude | End_Airport | Latitude | Longitude |
+----------------+----------+-----------+-------------+----------+-----------+
| YYZ            |    -10.3 |      1.23 | NYC         | blah     | blah      |
| YBG            |    -40.3 |      50.4 | YVR         | blah     | blah      |
| AEY            |     30.3 |      30.3 | GOH         | blah     | blah      |
+----------------+----------+-----------+-------------+----------+-----------+
I am currently trying the following, where c is the first table

route_data = JOIN c by (start_airport, end_airport), airports_all by ($0, $0);
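I think this essentially says: for the query, join the start airport and the end airport on their respective codes, and then pull in the corresponding latitude and longitude.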

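route_data = JOIN c by (start_airport, end_airport), airports_all by ($0, $0);

This behaves like an "and" condition in a typical SQL join. Consider the following query. Would it produce the result you want?

select * from c a join airports_all b on a.start_airport = b.first_field and a.end_airport = b.first_field;

It produces rows only when start_airport and end_airport are the same, because both keys are compared against the same airport code.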

What you want can be achieved as follows, by joining against the airport table twice:

cat > routes.txt
YYZ,NYC
YBG,YVR
AEY,GOH

cat > airports_all.txt
YYZ,-10.3,1.23
YBG,-40.3,50.4
AEY,30.3,30.3
Pig code:

-- Load the routes and the airport master table
tab1 = load '/home/ec2-user/routes.txt' using PigStorage(',') as (start_airport,end_airport);
describe tab1;
tab2 = load '/home/ec2-user/airports_all.txt' using PigStorage(',') as (Airport,Latitude,Longitude);
describe tab2;
-- First join: look up the coordinates of the start airport
tab3 = JOIN tab1 by (start_airport), tab2 by (Airport);
describe tab3;
-- After the join, $0-$1 come from tab1 and $2-$4 from tab2
tab4 = foreach tab3 generate $0 as start_airport, $3 as start_Latitude, $4 as start_Longitude, $1 as end_airport;
describe tab4;
-- Second join: look up the coordinates of the end airport
tab5 = JOIN tab4 by (end_airport), tab2 by (Airport);
describe tab5;
-- After the join, $0-$3 come from tab4 and $4-$6 from tab2
tab6 = foreach tab5 generate $0 as start_airport, $1 as start_Latitude, $2 as start_Longitude, $3 as end_airport, $5 as end_Latitude, $6 as end_Longitude;
describe tab6;
dump tab6;
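
The same two-step join can also be written with named fields instead of positional references. The following is only a sketch under a few assumptions: the relation names (routes, airports, with_start, route_data) and the declared types are illustrative and not part of the original answer, and the file paths are the ones used above. It uses LEFT OUTER joins because, in the sample files shown, none of the end airports (NYC, YVR, GOH) appear in airports_all.txt, so an inner join on end_airport would return no rows for that data.

```pig
-- Sketch only: relation names and field types are assumptions for illustration.
routes = LOAD '/home/ec2-user/routes.txt' USING PigStorage(',')
         AS (start_airport:chararray, end_airport:chararray);
airports = LOAD '/home/ec2-user/airports_all.txt' USING PigStorage(',')
           AS (airport:chararray, latitude:double, longitude:double);

-- Step 1: attach the coordinates of the start airport.
-- LEFT OUTER keeps a route even when its start airport has no match.
with_start_raw = JOIN routes BY start_airport LEFT OUTER, airports BY airport;
with_start = FOREACH with_start_raw GENERATE
    routes::start_airport AS start_airport,
    airports::latitude    AS start_latitude,
    airports::longitude   AS start_longitude,
    routes::end_airport   AS end_airport;

-- Step 2: attach the coordinates of the end airport.
with_both_raw = JOIN with_start BY end_airport LEFT OUTER, airports BY airport;
route_data = FOREACH with_both_raw GENERATE
    with_start::start_airport   AS start_airport,
    with_start::start_latitude  AS start_latitude,
    with_start::start_longitude AS start_longitude,
    with_start::end_airport     AS end_airport,
    airports::latitude          AS end_latitude,
    airports::longitude         AS end_longitude;

DUMP route_data;
```

The `::` prefix disambiguates fields by the relation they came from after a join, which avoids having to count column positions. Switch LEFT OUTER back to a plain join if routes with unknown airports should be dropped.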

Please show the complete Pig script.