Hadoop Pig word count

Suppose I have a text file named count.txt that contains the following paragraph:

    I am working  in hadoop along with  various courses like Hadoop, Hana, Java etc
    I love working with hadoop
    This is hadoop project 

Now I need to find out how many times the word hadoop occurs in this file.

Here is the code I tried:

    c1= load '/...../count.txt' using PigStorage(',') as (Name:chararray);
    c2 = foreach c1  generate FLATTEN(TOKENIZE(LOWER(Name)))as (Name1:chararray);
    dump c2;
    c3 = filter c2 by Name1=='hadoop';
    dump c3;

This is the output I get:

    (hadoop)
    (hadoop)
    (hadoop)
    (hadoop)

What I need is the number 4, not the word hadoop repeated four times. So I tried running

`c4 = foreach c3 generate COUNT($0);`

and got an error. Please help; it is probably something simple that I am missing. Thanks in advance.

Try this:

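COUNT expects a bag, so it cannot be applied directly to the chararray field $0. Group the filtered relation first, then count the grouped bag: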

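    c3 = FILTER c2 BY Name1 == 'hadoop';
    -- collect the matching tuples into one bag keyed by the word
    grouped = GROUP c3 BY Name1;
    -- $0 is the group key (the word), $1 is the bag of c3 tuples
    wordcount = FOREACH grouped GENERATE $0, COUNT($1);
    DUMP wordcount;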
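
If only the bare number is wanted rather than a (word, count) pair, a variant using GROUP ... ALL may be closer to what the question asks for. The following is a minimal sketch, not part of the original answer. The relation names (lines, words, hits, all_hits, total) are hypothetical, the elided path is kept exactly as in the question, and swapping PigStorage(',') for TextLoader() is an assumption made here so that text after a comma is not dropped from each line.

```pig
-- Load each line as a single chararray field.
-- Assumption: TextLoader() replaces PigStorage(','), which would keep
-- only the text before the first comma under a one-field schema.
lines = LOAD '/...../count.txt' USING TextLoader() AS (line:chararray);

-- Lowercase each line and split it into one word per tuple.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS (word:chararray);

-- Keep only the occurrences of the target word.
hits = FILTER words BY word == 'hadoop';

-- Put every remaining tuple into a single bag, then count that bag.
all_hits = GROUP hits ALL;
total = FOREACH all_hits GENERATE COUNT(hits);

DUMP total;
```

For the sample paragraph in the question this should print a single tuple holding the count, because GROUP ... ALL produces exactly one group. If no word matches, the grouped relation is empty and nothing is printed.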

Let me know if this helps.

@sudarshan Since the answer helped, please upvote it as well. Thanks.