Apache Pig: how to filter out the first row of a variable in Pig

I loaded a CSV file into the following variable:

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');
Here is the output of the first 3 rows:

tmp = limit basketball_players 3;
dump tmp;

("playerID","year","stint","tmID","lgID","GP","GS","minutes","points","oRebounds","dRebounds","rebounds","assists","steals","blocks","turnovers","PF","fgAttempted","fgMade","ftAttempted","ftMade","threeAttempted","threeMade","PostGP","PostGS","PostMinutes","PostPoints","PostoRebounds","PostdRebounds","PostRebounds","PostAssists","PostSteals","PostBlocks","PostTurnovers","PostPF","PostfgAttempted","PostfgMade","PostftAttempted","PostftMade","PostthreeAttempted","PostthreeMade","note")
("abramjo01","1946","1","PIT","NBA","47","0","0","527","0","0","0","35","0","0","0","161","834","202","178","123","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0",)
("aubucch01","1946","1","DTF","NBA","30","0","0","65","0","0","0","20","0","0","0","46","91","23","35","19","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0",)
As you can see, the first row is the table header. I used the command below to filter out the first row, but it does not work:

grunt> players_raw = filter basketball_players by $1 > 0;
2017-05-06 11:03:36,389 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 6 time(s).
When I dump players_raw, it returns nothing. How can I filter out the first row of the variable?

Use RANK to generate a new column that adds a row number to the dataset, then use that column to filter out the first row:

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');
ranked = rank basketball_players;
basketball_players_without_header = Filter ranked by (rank_basketball_players > 1);
DUMP basketball_players_without_header;
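Another approach is to drop every row whose first field matches the header name. Note the NOT: without it, the filter keeps only the header row.

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');
-- keep only the rows whose first field is not the "playerID" header
basketball_players_without_header = Filter basketball_players by NOT ($0 matches '.*playerID.*');
DUMP basketball_players_without_header;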
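
As for why the original `$1 > 0` filter returned nothing: judging from the dump, PigStorage leaves the double quotes in each field, so `$1` holds `"1946"` with the quotes included. Casting that to int probably fails and yields null, and FILTER discards rows whose condition is null, so every row is dropped, not just the header. Below is a minimal sketch of a variant of the original attempt that strips the quotes before casting. It assumes every data row has a numeric year in the second field.

```pig
basketball_players = LOAD '/usr/data/basketball_players.csv' USING PigStorage(',');

-- Remove the literal double quotes from the year field, then cast to int.
-- For the header row the stripped value is the word year, which cannot be
-- cast, so the cast yields null and the row is filtered out.
players_raw = FILTER basketball_players
              BY ((int) REPLACE((chararray) $1, '"', '')) > 0;

DUMP players_raw;
```

This only drops the header as a side effect of the failed cast. If the intent is simply "skip the first line", the RANK or `matches` approaches above say so more directly.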