Pig: Group By, Average, and Order By


I'm new to Pig. I have a text file in which each line is a record with the following fields:

name, year, count, uniquecount
For example:

Zverkov winced_VERB 2004    8   8
Zverkov winced_VERB 2008    4   4
Zverkov winced_VERB 2009    1   1
zvlastni _ADV_  1913    1   1
zvlastni _ADV_  1928    2   2
zvlastni _ADV_  1929    3   2
I want to group all records by their unique name, then compute count/uniquecount for each unique name, and finally sort the output by that computed value.

Here is what I have been trying:

bigrams = LOAD 'input/bigram/zv.gz' AS (bigram:chararray, year:int, count:float, books:float);
group_bigrams = GROUP bigrams BY bigram;
average_bigrams = FOREACH group_bigrams GENERATE group, SUM(bigrams.count) / SUM(bigrams.books) AS average;
sorted_bigrams = ORDER average_bigrams BY average;

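It turns out my original code produces the desired output with just one small change: the ORDER statement now sorts by average in descending order and breaks ties by group name in ascending order.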

bigrams = LOAD 'input/bigram/zv.gz' AS (bigram:chararray, year:int, count:float, books:float);
group_bigrams = GROUP bigrams BY bigram;
average_bigrams = FOREACH group_bigrams GENERATE group, SUM(bigrams.count)/SUM(bigrams.books) AS average;
sorted_bigrams = ORDER average_bigrams BY average DESC, group ASC;
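
As a sanity check on what the script above should produce for the six sample rows in the question, here is a minimal sketch in plain Python (not Pig) that mirrors the same three steps: group by bigram, divide the summed count by the summed books, then sort by average descending and name ascending. The function name `average_per_bigram` and the hard-coded rows are just for illustration. Pig's `float` is 32-bit, so the digits it prints may differ slightly from Python's.

```python
from collections import defaultdict

# The six sample rows from the question: (bigram, year, count, books).
rows = [
    ("Zverkov winced_VERB", 2004, 8.0, 8.0),
    ("Zverkov winced_VERB", 2008, 4.0, 4.0),
    ("Zverkov winced_VERB", 2009, 1.0, 1.0),
    ("zvlastni _ADV_", 1913, 1.0, 1.0),
    ("zvlastni _ADV_", 1928, 2.0, 2.0),
    ("zvlastni _ADV_", 1929, 3.0, 2.0),
]


def average_per_bigram(rows):
    """Hypothetical helper mirroring GROUP, SUM/SUM and ORDER from the Pig script."""
    # GROUP bigrams BY bigram: accumulate [sum of count, sum of books] per name.
    totals = defaultdict(lambda: [0.0, 0.0])
    for bigram, _year, count, books in rows:
        totals[bigram][0] += count
        totals[bigram][1] += books
    # SUM(bigrams.count) / SUM(bigrams.books) AS average
    averaged = [(name, c / b) for name, (c, b) in totals.items()]
    # ORDER ... BY average DESC, group ASC
    return sorted(averaged, key=lambda rec: (-rec[1], rec[0]))


result = average_per_bigram(rows)
for name, average in result:
    print(name, average)
```

For the sample data this puts zvlastni _ADV_ first (6 / 5) and Zverkov winced_VERB second (13 / 13), which is the ordering the DESC sort should give.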

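Please share the input test data and the desired output so we can better understand the use case. Give it a try first, then post your code and the problem here, and we'll be glad to help.

I've added an update with the code I've been trying. Let me know if you need any other information.

@MrFlom: Please share the input test data and the expected output for it.

@muraliao You can download the input data. The expected output would be something like: 1. uniquename1, 50 2. uniquename2, 40 3. uniquename3, 5 etc.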