Apache Pig: unable to read data from a bag in a Pig script


Could you tell me how to read the data if I have fields delimited by {, ( and ) plus bag data? I get the error below.

Input Data.
Jorge Posada Yankees|{(Catcher),(Designated_hitter)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell Oakland|{(Catcher),(First_baseman)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado Atlanta|{(Second_baseman),(Infielder),(Left_fielder)},[games#258,hit_by_pitch#3]

bfile= LOAD '/home/cloudera/basketball.txt' using PigStorage('|')as(name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]);

grunt> players = load 'basketball.txt' using PigStorage('|')as (name:chararray, team:chararray,position:bag{t:(p:chararray)}, bat:map[]);
2014-11-13 04:49:48,144 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 27, column 117>  mismatched input ';' expecting RIGHT_PAREN
Details at logfile: /home/cloudera/pig_1415835089181.log

Sanjeeb

For the above input no regex is needed; you can access all the values using the existing schema itself.

input.txt

Jorge Posada |Yankees|{(Catcher),(Designated_hitter)}|[games#1594,hit_by_pitch#65,grand_slams#7]
Landon Powell |Oakland|{(Catcher),(First_baseman)}|[on_base_percentage#0.297,games#26,home_runs#7]
Martin Prado |Atlanta|{(Second_baseman),(Infielder),(Left_fielder)}|[games#258,hit_by_pitch#3]
Pigscript:

bfile= LOAD 'input.txt' using PigStorage('|') as (name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]);

--Print the name and team
B = FOREACH bfile GENERATE name,team;
--DUMP B;

--Print the player and his position
C = FOREACH bfile GENERATE name,pos.(p);
--DUMP C;

--Print the player and  key/value of games and hit_by_pitch
D = FOREACH bfile GENERATE name,bat#'games',bat#'hit_by_pitch';
--DUMP D;

Output of DUMP B:

(Jorge Posada ,Yankees)
(Landon Powell ,Oakland)
(Martin Prado ,Atlanta)

Output of DUMP C:

(Jorge Posada ,{(Catcher),(Designated_hitter)})
(Landon Powell ,{(Catcher),(First_baseman)})
(Martin Prado ,{(Second_baseman),(Infielder),(Left_fielder)})

Output of DUMP D:

(Jorge Posada ,1594,65)
(Landon Powell ,26,)
(Martin Prado ,258,3)

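As a follow-up to the script above, here is a minimal sketch of two common next steps: flattening the bag into one row per position, and casting a map value before comparing it numerically. The aliases E, F and G and the threshold of 100 games are illustrative and not part of the original answer. The sketch assumes the same input.txt and schema as above. Because bat is declared as an untyped map[], its values are bytearrays, hence the explicit cast.

```pig
-- Assumes input.txt and the schema from the script above.
bfile = LOAD 'input.txt' USING PigStorage('|') AS (name:chararray, team:chararray, pos:bag{t:(p:chararray)}, bat:map[]);

-- One output row per (player, position) pair instead of one bag per player.
E = FOREACH bfile GENERATE name, FLATTEN(pos) AS (position:chararray);

-- Values of an untyped map[] are bytearrays, so cast before a numeric comparison.
F = FOREACH bfile GENERATE name, (int)bat#'games' AS games;

-- Illustrative threshold: keep players with more than 100 games.
G = FILTER F BY games > 100;
--DUMP E;
--DUMP G;
```

A key that is missing from a row's map yields null, which is why hit_by_pitch is empty for Landon Powell in the output of D.
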
If the tuples in the bag need more than one field, declare and access them like this:

pos:bag{t:(p:chararray,q:chararray)}
FOREACH bfile GENERATE name,pos.(p,q);

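I don't see a '|' delimiter between the name and the team (i.e. Jorge Posada Yankees); it is a space. Is that intended? In the given input, the third line has ',' as the delimiter between the positions and the batting stats. Is that intended?

Hey, I have updated the data. Sorry about that. bfile= LOAD '/home/cloudera/basketball.txt' using PigStorage('|') as (name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]); Jorge Posada|Yankees|{(Catcher),(Designated_hitter)}|[… (Second_baseman),(Infielder),(Left_fielder)}|[games#258,hit_by_pitch#3]

I am able to load and access the above data successfully. The error seems to be at line 27 of the script (ERROR 1200). Can you paste line 27 of your Pig script?

I only used one line to read the file. I don't know why it points to line 27. Can you help me understand regex operations or usage with some examples? I have seen in the forum that you have written a lot about this. Please explain it with some examples.

Thanks Siva. I know regex is not needed here, but I just want to use some data in my POC, and I don't really understand the concept. Could you elaborate here, please?

Thanks a lot Siva. It works fine, but please explain the regex part.

Cool. Please mark this question as answered. What exactly do you want to get from the regex? Can you tell me exactly what you need?

Siva, where do I do that? Please let me know. About REGEX_EXTRACT, do you have any documentation or links for learning the basics? I know nothing about it.

Click the tick button on the left side of the answer section. For regex, I usually use … to simulate the regular expression.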
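
Since the comments ask for a regex example, here is a hedged sketch using Pig's built-in REGEX_EXTRACT and REGEX_EXTRACT_ALL on the original form of the data, where the name and team share the first field and are separated by a space (e.g. "Jorge Posada Yankees"). The aliases raw, one_by_one and all_at_once, the field name name_team, and the pattern itself are illustrative assumptions, not part of the original answer. The sketch also assumes every row uses '|' between the three fields and that the team is a single word.

```pig
-- Original layout: "Jorge Posada Yankees|{(Catcher),...}|[games#1594,...]"
raw = LOAD 'basketball.txt' USING PigStorage('|') AS (name_team:chararray, pos:bag{t:(p:chararray)}, bat:map[]);

-- REGEX_EXTRACT(string, regex, index) returns one capture group.
-- Group 1 = everything before the last space, group 2 = the last word.
one_by_one = FOREACH raw GENERATE
    REGEX_EXTRACT(name_team, '^(.*) ([^ ]+)$', 1) AS name,
    REGEX_EXTRACT(name_team, '^(.*) ([^ ]+)$', 2) AS team;

-- REGEX_EXTRACT_ALL(string, regex) returns all groups as a tuple;
-- FLATTEN turns that tuple into separate fields.
all_at_once = FOREACH raw GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(name_team, '^(.*) ([^ ]+)$')) AS (name:chararray, team:chararray),
    pos, bat;
--DUMP one_by_one;
--DUMP all_at_once;
```

The pattern deliberately avoids backslash escapes, because a backslash inside a Pig string literal has to be doubled (for example '\\s'). A team name containing a space would be split at its last space, so pipe-delimited input as in the answer remains the simpler route.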