Google Cloud Dataflow: How to group by key with custom logic in Cloud Dataflow
I am trying to implement a GroupByKey on a custom object key in a Cloud Dataflow pipeline.
public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

    List<KV<Student, StudentValues>> studentList = new ArrayList<>();
    studentList.add(KV.of(new Student("pawan", 10, "govt"),
                          new StudentValues("V1", 123, "govt")));
    studentList.add(KV.of(new Student("pawan", 13223, "word"),
                          new StudentValues("V2", 456, "govt")));

    PCollection<KV<Student, StudentValues>> pc =
        pipeline.apply(Create.of(studentList));

    PCollection<KV<Student, Iterable<StudentValues>>> groupedWords =
        pc.apply(GroupByKey.<Student, StudentValues>create());
}
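I have overridden the equals method of my custom class, but every time equals is called it compares a Student instance with that same instance.
Ideally, it should compare the first Student key with the second Student key.
What am I doing wrong here?

Why do you think you are doing something wrong? The key of each element is serialized (using the specified AvroCoder), and GroupByKey groups together all elements whose keys have the same serialized representation. After that, there is no need to compare Student objects to make sure that values with the same key have been grouped together.
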
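@DefaultCoder(AvroCoder.class)
static class Student /*implements Serializable*/ {
    // No-argument constructor, needed so the class can be decoded reflectively.
    public Student() {}

    public Student(String n, Integer i, String sc) {
        name = n;
        id = i;
        school = sc;
    }

    public String name;
    public Integer id;
    public String school;

    @Override
    public boolean equals(Object obj) {
        System.out.println("obj = " + obj);
        System.out.println("this = " + this);
        if (!(obj instanceof Student)) {
            return false;
        }
        Student stObj = (Student) obj;
        // Compare names by value with equals(), not by reference with ==.
        return name == null ? stObj.name == null : name.equals(stObj.name);
    }

    // Keep hashCode consistent with equals.
    @Override
    public int hashCode() {
        return name == null ? 0 : name.hashCode();
    }
}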
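
The reply above says that grouping is driven by the serialized form of the key, not by equals. The following is a minimal plain-Java sketch of that idea, not Dataflow code: `GroupByEncodedKeyDemo`, `encodeAllFields`, `encodeNameOnly`, and `groupByEncodedKey` are hypothetical names, and a simple string stands in for the bytes a coder such as AvroCoder would produce. It suggests why the two "pawan" keys from the question would likely end up in separate groups (their other fields differ), and that one way to group by name alone is to key the elements by the name itself.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class GroupByEncodedKeyDemo {

    // Simplified stand-in for the Student key from the question.
    static final class Student {
        final String name;
        final int id;
        final String school;

        Student(String name, int id, String school) {
            this.name = name;
            this.id = id;
            this.school = school;
        }
    }

    // Assumption: a string stands in for the encoded bytes of the whole key.
    static String encodeAllFields(Student s) {
        return s.name + "|" + s.id + "|" + s.school;
    }

    // Alternative key: only the field that should decide the grouping.
    static String encodeNameOnly(Student s) {
        return s.name;
    }

    // Groups values by the encoded key; Student.equals is never consulted.
    static Map<String, List<String>> groupByEncodedKey(
            List<Map.Entry<Student, String>> input,
            Function<Student, String> encoder) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<Student, String> kv : input) {
            groups.computeIfAbsent(encoder.apply(kv.getKey()), k -> new ArrayList<>())
                  .add(kv.getValue());
        }
        return groups;
    }

    // The two elements from the question, with the value reduced to its label.
    static List<Map.Entry<Student, String>> sampleInput() {
        List<Map.Entry<Student, String>> input = new ArrayList<>();
        input.add(new AbstractMap.SimpleEntry<>(new Student("pawan", 10, "govt"), "V1"));
        input.add(new AbstractMap.SimpleEntry<>(new Student("pawan", 13223, "word"), "V2"));
        return input;
    }

    public static void main(String[] args) {
        // Whole key encoded: the ids and schools differ, so there are two groups.
        System.out.println(groupByEncodedKey(
            sampleInput(), GroupByEncodedKeyDemo::encodeAllFields));
        // Name-only key: both values land in a single group.
        System.out.println(groupByEncodedKey(
            sampleInput(), GroupByEncodedKeyDemo::encodeNameOnly));
    }
}
```

In a real pipeline the equivalent step would presumably be to map each element to a key that contains only the grouping field (for example the name as a String) before applying GroupByKey, rather than relying on a custom equals.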