Apache Pig: is filtered data loaded twice?


I have the following Pig script:

A = LOAD 'text_a.txt' USING PigStorage();
B = LOAD 'text_b.txt' USING PigStorage();
SOMETHING = FILTER A BY $0 matches 'SOMETHING';
FOOBAR = FILTER A BY $0 matches 'FOOBAR';

SOMETHING_B = JOIN SOMETHING BY key, B BY $1;
FOOBAR_B = JOIN FOOBAR BY key, B BY $1;
TEMP = JOIN SOMETHING_B BY key, FOOBAR_B by key;
OUT = FOREACH TEMP GENERATE SOMETHING_B::$1 - FOOBAR_B::$1; 
dump OUT;
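When this script runs, it looks like the data in A and B is read from the source twice. Is there any way to prevent it from being read a second time?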

First, add "EXPLAIN OUT" at the end of the script to determine whether the data is actually read twice.
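As a rough sketch of that suggestion, the script below swaps the final dump for EXPLAIN. The original script loads without a schema, so the field names here (tag, key, val, b0, bkey) are assumptions added only so that the joins resolve; adjust them to the real layout of the files.

```pig
-- Hypothetical schemas: the original LOAD statements declare none.
A = LOAD 'text_a.txt' USING PigStorage() AS (tag:chararray, key:chararray, val:long);
B = LOAD 'text_b.txt' USING PigStorage() AS (b0:chararray, bkey:chararray);

-- Two filters over the same relation A.
SOMETHING = FILTER A BY tag matches 'SOMETHING';
FOOBAR = FILTER A BY tag matches 'FOOBAR';

-- Join each filtered subset against B.
SOMETHING_B = JOIN SOMETHING BY key, B BY bkey;
FOOBAR_B = JOIN FOOBAR BY key, B BY bkey;

-- After a join, fields are qualified by their source alias.
TEMP = JOIN SOMETHING_B BY SOMETHING::key, FOOBAR_B BY FOOBAR::key;
OUT = FOREACH TEMP GENERATE SOMETHING_B::SOMETHING::val - FOOBAR_B::FOOBAR::val;

-- Print the logical, physical, and MapReduce plans instead of running the job.
EXPLAIN OUT;
```

In the printed plans, check how many load operators reference text_a.txt and text_b.txt. That count, rather than the shape of the script, shows whether each file is scanned once or more than once.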


Looking at your script, it doesn't look like A and B are loaded twice.

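Have you tried using the EXPLAIN command to display the execution plan and check whether the data is really read twice?

Running EXPLAIN now.

Now trying to figure out how to read the EXPLAIN output.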