
Apache Pig (Pig Latin): filtering data by file name


I have a tar.gz file that looks like this:

myFile.tar.gz
  |__ a.txt
  |__ b.txt
  |__ c.txt
I want to write a Pig script that processes the three sub-files in different ways. I tried filtering by file name like this:

-- '-tagFile' prepends the source file name to each record as field $0
S = LOAD '/user/admin/otarie/' USING PigStorage(';', '-tagFile');
A = FILTER S BY $0 matches 'a.txt';
B = FILTER S BY $0 matches 'b.txt';
C = FILTER S BY $0 matches 'c.txt';
But column $0 contains myFile.tar.gz instead of the sub-file name. Is there a way to filter the data by sub-file name without extracting the tar.gz file?

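It recognizes gzip compression, but after that it simply reads the raw contents of the tar archive. The file information stored in the tar header is treated as just another line to process, and not even a proper line: there is no line terminator after it, so it runs into the first line of each file.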
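To see why $0 cannot carry the sub-file name, here is a small Python sketch (Python is used purely for illustration; it is not part of Pig). It builds a tar.gz in memory, gunzips it, and splits the raw stream on newlines, which is roughly what a line-oriented text loader does after decompression. This is an approximation under that assumption, not PigStorage's actual code, and the member names and contents are made up.

```python
import gzip
import io
import tarfile


def build_tar_gz(members: dict[str, bytes]) -> bytes:
    """Build an in-memory .tar.gz archive from a {name: content} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def lines_as_text_loader_sees(archive: bytes) -> list[bytes]:
    """Gunzip the archive and split the raw tar stream on newlines,
    approximating a line-oriented loader (an assumption, not Pig's code)."""
    raw = gzip.decompress(archive)
    return raw.split(b"\n")


if __name__ == "__main__":
    archive = build_tar_gz({"a.txt": b"1;x\n2;y\n", "b.txt": b"3;z\n"})
    lines = lines_as_text_loader_sees(archive)
    # The first "line" is the 512-byte tar header glued to the first data line.
    print(lines[0][:8])          # b'a.txt\x00\x00\x00'
    print(lines[0].endswith(b"1;x"))  # True
```

The header fields are NUL-padded and contain no newline, so the member name ends up buried inside the first record of each file rather than in a separate field.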

You can't use tar files with PigStorage the way you are trying to. Untar the archive and compress each file separately; then it should work fine.
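A minimal Python sketch of that preprocessing step, assuming the archive is available on the local file system and that member base names are unique inside it. The function name split_tar_gz and the output directory are hypothetical. It writes each regular file in the archive out as its own gzip file (for example a.txt.gz, b.txt.gz, c.txt.gz).

```python
import gzip
import os
import shutil
import tarfile


def split_tar_gz(archive_path: str, out_dir: str) -> list[str]:
    """Extract every regular file from a .tar.gz and write it back out as
    its own .gz file in out_dir. Returns the paths that were written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            src = tar.extractfile(member)
            # Flatten any directory part of the member name (assumes base
            # names are unique); this also avoids writing outside out_dir.
            dest = os.path.join(out_dir, os.path.basename(member.name) + ".gz")
            with src, gzip.open(dest, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(dest)
    return written


if __name__ == "__main__":
    # Hypothetical paths: adjust to where the archive actually lives.
    for path in split_tar_gz("myFile.tar.gz", "otarie_split"):
        print(path)
```

After copying the output directory to HDFS (for example with hdfs dfs -put), loading it with PigStorage(';', '-tagFile') should tag each record with the name of its own file. The filters would then need to match names such as 'a.txt.gz' rather than 'a.txt', since Pig's matches operator requires the whole string to match the pattern.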