Hadoop: counting within a group in Pig

Tags: hadoop, apache-pig

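Suppose I have a relation Students with the fields grade and teacher. I want to group by grade and teacher, but keep with each group the count of all students in that grade. For example: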

classes = GROUP Students BY (grade,teacher);
classes = FOREACH classes {
   GENERATE
      (### COUNT OF ALL STUDENTS IN GRADE ###) as grade_size,
      Students as students,
      teacher as teacher;
}

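But I don't know how to filter from inside the group statement. I need some kind of filter, but I don't know how to distinguish between the grade of the students inside the group and that of the students outside it.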

There are two ways:

1) Group by grade and teacher, then count, then flatten, and then group by grade:

-- Count the students in each (grade, teacher) group.
classes = GROUP Students BY (grade, teacher);
teachers = FOREACH classes GENERATE FLATTEN(group) AS (grade, teacher), COUNT(Students) AS perTeacher;
-- Regroup by grade and sum the per-teacher counts to get each grade's size.
grade = GROUP teachers BY grade;
result = FOREACH grade GENERATE FLATTEN(teachers), SUM(teachers.perTeacher) AS perGrade;
describe result;
dump result;
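
To make the data flow of approach 1 easier to follow, here is a small plain-Python trace of the same two-stage aggregation. It is only a sketch and not Pig code: the sample rows and the function name `per_grade_counts` are made up for illustration.

```python
from collections import Counter


def per_grade_counts(students):
    """Mirror approach 1: count per (grade, teacher), then sum per grade.

    `students` is a list of (grade, teacher) tuples (hypothetical sample data).
    Returns a sorted list of (grade, teacher, per_teacher, per_grade) tuples.
    """
    # Stage 1: the equivalent of GROUP BY (grade, teacher) followed by COUNT.
    per_teacher = Counter(students)

    # Stage 2: the equivalent of GROUP BY grade followed by SUM of those counts.
    per_grade = Counter()
    for (grade, _teacher), n in per_teacher.items():
        per_grade[grade] += n

    # Attach the grade total to every (grade, teacher) row.
    return sorted(
        (grade, teacher, n, per_grade[grade])
        for (grade, teacher), n in per_teacher.items()
    )


# Hypothetical sample rows: (grade, teacher)
sample = [(1, "Adams"), (1, "Adams"), (1, "Baker"), (2, "Clark")]
rows = per_grade_counts(sample)
```

Each output row carries both the size of its (grade, teacher) group and the size of the whole grade, which is what the `perTeacher` and `perGrade` columns hold in the Pig script.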

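2) Group by grade only, then use a UDF from the DataFu library to perform the second grouping in memory. This is faster, but it is vulnerable to heap memory exceptions.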
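
The idea behind approach 2 can be sketched in plain Python as follows. This is not the DataFu UDF and not Pig code, only an illustration of an in-memory group-by inside each grade. The `name` field, the sample rows, and the function name `classes_with_grade_size` are assumptions made up for the example.

```python
from collections import defaultdict


def classes_with_grade_size(students):
    """Group by grade first, then group each grade's bag by teacher in memory.

    `students` is a list of (name, grade, teacher) tuples (hypothetical data).
    Returns a list of (grade_size, students_in_class, teacher) tuples.
    """
    # Outer grouping: one bag of students per grade.
    by_grade = defaultdict(list)
    for student in students:
        by_grade[student[1]].append(student)

    result = []
    for bag in by_grade.values():
        grade_size = len(bag)  # count of all students in the grade

        # Inner grouping happens entirely in memory, so the whole grade
        # must fit on the heap. This is where large bags can fail.
        by_teacher = defaultdict(list)
        for student in bag:
            by_teacher[student[2]].append(student)

        for teacher, group in by_teacher.items():
            result.append((grade_size, group, teacher))
    return result


# Hypothetical sample rows: (name, grade, teacher)
sample_students = [
    ("Ann", 1, "Adams"),
    ("Bob", 1, "Adams"),
    ("Cy", 1, "Baker"),
    ("Di", 2, "Clark"),
]
classes = classes_with_grade_size(sample_students)
```

Because the grade size is known before the inner grouping starts, each class row can carry it without a second pass over the data. The cost is that every grade's bag is held in memory at once.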

Removed the sql tag, since this is about Pig. Sample input and output would help in understanding your question.