Apache Pig: is filtered data loaded twice?
I have the following Pig script:
A = LOAD 'text_a.txt' USING PigStorage();
B = LOAD 'text_b.txt' USING PigStorage();
SOMETHING = FILTER A BY $0 matches 'SOMETHING';
FOOBAR = FILTER A BY $0 matches 'FOOBAR';
SOMETHING_B = JOIN SOMETHING BY key, B BY $1;
FOOBAR_B = JOIN FOOBAR BY key, B BY $1;
TEMP = JOIN SOMETHING_B BY key, FOOBAR_B by key;
OUT = FOREACH TEMP GENERATE SOMETHING_B::$1 - FOOBAR_B::$1;
dump OUT;
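When this script runs, it looks as though the data in A and B is read from the source twice. Is there any way to prevent it from being read a second time?
First, add "EXPLAIN OUT" at the end of the script to determine whether the data really is read twice.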
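As a minimal sketch of that suggestion, the line below could be appended after the definition of OUT in the script from the question. EXPLAIN is a standard Pig Latin operator that prints the logical, physical, and MapReduce plans for a relation without running the job. The exact plan layout varies by Pig version, so the comments only describe what to look for in general terms.

```pig
-- Print the execution plans for OUT instead of (or before) running DUMP.
-- In the output, count how many load operators reference 'text_a.txt'
-- and 'text_b.txt'; a single load per file that fans out to both filters
-- suggests the input is read only once.
EXPLAIN OUT;
```

If the plan shows each file loaded once and then split into the two FILTER branches, the apparent double read is probably an artifact of how the job progress is reported rather than a real second scan.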
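Looking at your script, it doesn't look like A and B are loaded twice. Have you tried using the EXPLAIN command to display the execution plan and check whether the data is really read twice?
Running EXPLAIN now. Still trying to make sense of it; reading through the EXPLAIN output now.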