Apache Pig: How do I get the number of words in each line? - Fatal编程技术网

I'm trying to work out how many words are on each line of a file in Pig. I've already done the loading and splitting:

raw = load file;
words = FOREACH raw GENERATE TOKENIZE(*);
This gives me a bag of tuples, each containing a single word. Then, when I try to count these items, I get an error:

counts = FOREACH words GENERATE COUNT(*);
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing count in COUNT
...
Caused by: java.lang.NullPointerException

Is it because some rows have empty bags? Or am I doing something else wrong?

If the problem is empty bags, you could try the following (untested):

Here we write an if-else condition that checks whether the tokenized words are null or empty; if so, we assign zero, otherwise we take the total count.
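A minimal sketch of that check, using Pig's bincond operator (`cond ? a : b`) together with the built-in `IsEmpty` function. The input path `'input'`, the field name `line` and the aliases `tokens` and `n` are assumptions for illustration; this is not necessarily the answerer's exact code.

```pig
-- Assumed input: a plain text file at 'input', one line of text per record.
raw = LOAD 'input' AS (line:chararray);

-- TOKENIZE turns each line into a bag of single-word tuples;
-- for a null line it may yield null instead of a bag.
words = FOREACH raw GENERATE TOKENIZE(line) AS tokens;

-- Guard against null or empty bags before calling COUNT.
-- 0L keeps both branches of each bincond as type long, matching COUNT's result type.
counts = FOREACH words GENERATE
    (tokens IS NULL ? 0L : (IsEmpty(tokens) ? 0L : COUNT(tokens))) AS n;

DUMP counts;
```

COUNT already returns 0 for an empty bag, so the `IsEmpty` branch mainly documents intent; the `IS NULL` test is what keeps a null bag away from COUNT. With a guard like this, a blank input line should come out as a zero count rather than an empty tuple.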

Can you try it this way?

Input:

Hi hello how are you
this is apache pig
works

like a charm
Pigscript:

A = LOAD 'input' AS (line:chararray);
B = FOREACH A GENERATE TOKENIZE(line);
C = FOREACH B GENERATE COUNT($0);
DUMP C;

Output:

(5)
(4)
(1)
()
(3)
You shouldn't use COUNT(*) like this; it is not supported in Pig. Pass the bag itself (here `$0`) to COUNT instead.
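As a further sketch in the same spirit, the bag returned by TOKENIZE can be handed straight to COUNT inside a single FOREACH, keeping each original line next to its word count so the results are easier to read. The input path `'input'` and the aliases `raw`, `counts` and `word_count` are illustrative assumptions.

```pig
-- Hypothetical input path; one line of text per record.
raw = LOAD 'input' AS (line:chararray);

-- COUNT receives the bag produced by TOKENIZE directly (no * designator),
-- and the original line is emitted alongside its word count.
counts = FOREACH raw GENERATE line, COUNT(TOKENIZE(line)) AS word_count;

DUMP counts;
```

Nesting the calls avoids an intermediate relation. A blank line still produces a null count here; the null guard from the earlier answer can be combined with this form if zeros are preferred.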