Apache Pig: unable to process data using PigStorage


I have a file that I want to process with a Pig script.

My input file looks like this (& is the column delimiter and $ is the row delimiter):

abc&bc&121&aa$aaj&jkj&print&star$aa&tss&jjlk&121

I tried this:

a = LOAD 'try.txt' USING PigStorage('$') AS (col1:chararray);
b = FOREACH a GENERATE REPLACE(col1, '&', ',');
I was trying to split off a tuple after the first delimiter, but with this I only get the first tuple.
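Why the attempt returns only one tuple can be seen in a small Python model of the two statements. This is a hypothetical simulation, not Pig itself: load_with_pigstorage and question_attempt are illustrative names, and the sketch assumes the whole sample sits on a single line of try.txt. PigStorage('$') splits the line into fields on $, the schema names only col1 so the other fields are dropped, and REPLACE then turns that one field into a single comma-separated string.

```python
# Hypothetical Python model of the question's attempt (not Pig code).

def load_with_pigstorage(line, delimiter, num_fields):
    """Split a line on a literal delimiter and keep only num_fields fields,
    padding with None if fewer are present (mimicking LOAD ... AS schema)."""
    fields = line.split(delimiter)[:num_fields]
    fields += [None] * (num_fields - len(fields))
    return tuple(fields)

def question_attempt(line):
    # a = LOAD 'try.txt' USING PigStorage('$') AS (col1:chararray);
    (col1,) = load_with_pigstorage(line, "$", 1)
    # b = FOREACH a GENERATE REPLACE(col1, '&', ',');
    return (col1.replace("&", ","),)

line = "abc&bc&121&aa$aaj&jkj&print&star$aa&tss&jjlk&121"
print(question_attempt(line))  # ('abc,bc,121,aa',)
```

The result is one tuple holding one string, which is why DUMP prints something that looks like the first desired row and nothing more: the second and third records never make it past LOAD.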

The output I'm looking for:

(abc,bc,121,aa)
(aaj,jkj,print,star)
(aa,tss,jjlk,121)
Any help?

Can you try this?

Option 1:

Input:

abc&bc&121&aa$aaj&jkj&print&star$aa&tss&jjlk&121
PigScript:

A = LOAD 'input' AS (line:chararray);
B = FOREACH A GENERATE FLATTEN(TOKENIZE(line,'$')) AS splittedLine;
C = FOREACH B GENERATE FLATTEN(STRSPLIT(splittedLine,'&')) AS(col1,col2,col3,col4);
DUMP C;

Output:

(abc,bc,121,aa)
(aaj,jkj,print,star)
(aa,tss,jjlk,121)
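The two FLATTEN steps in Option 1 can be mirrored in a few lines of Python to show what each one produces. This is a hypothetical model rather than Pig code: tokenize, strsplit and option1 are illustrative names, and tokenize assumes that TOKENIZE treats each character of its delimiter argument as a separator and skips empty tokens, roughly the way java.util.StringTokenizer does. TOKENIZE yields one record per $, FLATTEN turns that bag into separate rows, and STRSPLIT plus FLATTEN turns each record into its columns.

```python
# Hypothetical Python model of Option 1 (not Pig code):
#   B = FOREACH A GENERATE FLATTEN(TOKENIZE(line,'$')) AS splittedLine;
#   C = FOREACH B GENERATE FLATTEN(STRSPLIT(splittedLine,'&'));
import re

def tokenize(text, delimiters):
    # Every character in `delimiters` separates tokens; empty tokens are dropped.
    pattern = "[" + re.escape(delimiters) + "]"
    return [tok for tok in re.split(pattern, text) if tok]

def strsplit(text, regex):
    # STRSPLIT's second argument is a regular expression ('&' is not special).
    return tuple(re.split(regex, text))

def option1(line):
    rows = []
    for record in tokenize(line, "$"):   # one output row per '$'-separated record
        rows.append(strsplit(record, "&"))  # one column per '&'-separated value
    return rows

for row in option1("abc&bc&121&aa$aaj&jkj&print&star$aa&tss&jjlk&121"):
    print(row)
```

Because the number of rows comes from the data rather than from a schema, this approach keeps working if a line carries more or fewer than three records.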
Option 2:

PigScript:

A = LOAD 'input' USING PigStorage('$') AS (row1:chararray,row2:chararray,row3:chararray);
B = FOREACH A GENERATE FLATTEN(TOBAG(STRSPLIT(row1,'&'),STRSPLIT(row2,'&'),STRSPLIT(row3,'&')));
DUMP B;

Output:

(abc,bc,121,aa)
(aaj,jkj,print,star)
(aa,tss,jjlk,121)
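Option 2 depends on the LOAD schema naming exactly three fields. A hypothetical Python model (option2 is an illustrative name, not part of Pig) makes the consequence visible: assuming Pig's usual behavior of dropping data fields beyond those named in the AS schema, a line carrying a fourth $-separated record loses it.

```python
# Hypothetical Python model of Option 2 (not Pig code):
#   A = LOAD 'input' USING PigStorage('$') AS (row1, row2, row3);
#   B = FOREACH A GENERATE FLATTEN(TOBAG(STRSPLIT(row1,'&'), ...));

def option2(line, num_rows=3):
    # PigStorage('$') splits on the literal '$'; the schema keeps num_rows fields.
    fields = line.split("$")[:num_rows]
    # TOBAG(...) + FLATTEN emits one output row per field; STRSPLIT splits it on '&'.
    return [tuple(field.split("&")) for field in fields]

print(option2("abc&bc&121&aa$aaj&jkj&print&star$aa&tss&jjlk&121"))
print(option2("a&b$c&d$e&f$g&h"))  # the fourth record (g, h) is not emitted
```

For input where every line holds exactly three records both options give the same rows; when the count varies, Option 1 is the safer choice.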

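Hi saurav, welcome to Stack Overflow. Could you give us an example of what you have already tried? That may help people understand your problem.
b = FOREACH a GENERATE REPLACE(col1, '&', ','); I tried to split off a tuple after the first delimiter, but with this I only get the first tuple. Thanks for your help :)
Please put all of your attempts into the question itself; you can use the edit button to do that :)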