Apache Spark: converting a Spark Dataset<Row> to a Java POJO class


I am trying to convert a Dataset into Java objects. The schema looks like this:

root
 |-- deptId: long (nullable = true)
 |-- depNameName: string (nullable = true)
 |-- employee: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- firstName: string (nullable = true)
 |    |    |-- lastName: string (nullable = true)
 |    |    |-- phno: array (nullable = true)
 |    |    |    |-- element: integer (containsNull = true)
I created POJO classes like these:

class Department {
  private Long deptId;
  private String depName;
  private List<Employee> employee; // matches the "employee" column and the getEmployee() calls used later
  // with getters, setters and a no-argument constructor
}



class Employee {
  private String firstName;
  private String lastName;
  private List<Long> phno;
  //With getter setter and no argument constructor 
 }
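Name-based mapping only works when the bean's property names line up with the column names, so it can help to compare the two before running a job. The following is a minimal sketch using plain reflection. `SchemaNameCheck` and `unmatchedColumns` are hypothetical names, the column list is copied by hand from the schema above, and the sketch assumes field names mirror the getter/setter property names.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports schema columns that have no same-named field in a bean class.
class SchemaNameCheck {

    // Stand-in for the Department POJO; the list field is named after the getEmployee() getter.
    static class Department {
        private Long deptId;
        private String depName;
        private List<Object> employee;
    }

    static List<String> unmatchedColumns(Class<?> beanClass, List<String> columns) {
        // Collect the declared field names of the bean class.
        Set<String> fieldNames = new HashSet<>();
        for (Field field : beanClass.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                fieldNames.add(field.getName());
            }
        }
        // Keep every column that has no field with exactly the same name.
        List<String> unmatched = new ArrayList<>();
        for (String column : columns) {
            if (!fieldNames.contains(column)) {
                unmatched.add(column);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        // Top-level column names copied by hand from the printed schema.
        List<String> columns = Arrays.asList("deptId", "depNameName", "employee");
        System.out.println(unmatchedColumns(Department.class, columns)); // prints [depNameName]
    }
}
```

Here the check flags `depNameName`, because the schema column and the `depName` field are spelled differently.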
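I got it working with the Row-based constructors shown further down, but that approach hard-codes a lot of schema names and everything else, so I am looking for a more elegant solution.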


Please suggest the best solution for this problem.

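Please see the accepted answer here: . I know that question is about Scala, but the accepted answer is actually written in Java.
The encoder approach works as long as there are no nested objects. As shown below:
@GlennieHellesSindholt Any reference for the solution you adopted?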
  // Read the Parquet file and map each Row onto the Department bean by column name.
  // parquetFilePath holds the path to the Parquet file.
  Dataset<Row> ds = this.spark.read().parquet(parquetFilePath);
  Dataset<Department> departmentDataset = ds.as(Encoders.bean(Department.class));
  JavaRDD<String> rdd = departmentDataset.toJavaRDD().map((Function<Department, String>) v -> {
      StringBuilder sb = new StringBuilder();
      sb.append("deptId").append(v.getDeptId());
      // CollectionUtil is the poster's own helper (not shown).
      if (!CollectionUtil.isListNullOrEmpty(v.getEmployee())) {
          Employee first = v.getEmployee().get(0);
          sb.append("FirstName").append(first.getFirstName());
          if (!CollectionUtil.isListNullOrEmpty(first.getPhno())) {
              sb.append("Ph number").append(first.getPhno().get(0));
          }
      }
      return sb.toString();
  });
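The formatting logic in the lambda above can be checked without a Spark session by moving it into a plain method. The sketch below does that under a few assumptions: `isListNullOrEmpty` is a guess at what the unshown `CollectionUtil` helper does, the nested classes are simplified stand-ins for the POJOs, and the sample values are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SummarySketch {

    // Assumed behaviour of the poster's helper: true for a null or empty list.
    static boolean isListNullOrEmpty(List<?> list) {
        return list == null || list.isEmpty();
    }

    // Simplified stand-ins for the POJOs (plain fields instead of getters and setters).
    static class Employee {
        final String firstName;
        final List<Long> phno;

        Employee(String firstName, List<Long> phno) {
            this.firstName = firstName;
            this.phno = phno;
        }
    }

    static class Department {
        final Long deptId;
        final List<Employee> employee;

        Department(Long deptId, List<Employee> employee) {
            this.deptId = deptId;
            this.employee = employee;
        }
    }

    // Same string layout as the lambda, with the phone check nested under the employee check.
    static String summarize(Department d) {
        StringBuilder sb = new StringBuilder();
        sb.append("deptId").append(d.deptId);
        if (!isListNullOrEmpty(d.employee)) {
            Employee first = d.employee.get(0);
            sb.append("FirstName").append(first.firstName);
            if (!isListNullOrEmpty(first.phno)) {
                sb.append("Ph number").append(first.phno.get(0));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Employee ann = new Employee("Ann", Arrays.asList(5551234L));
        System.out.println(summarize(new Department(1L, Collections.singletonList(ann))));
        System.out.println(summarize(new Department(2L, null))); // no employees: only the id is written
    }
}
```

Nesting the phone-number check inside the employee check avoids dereferencing an empty or missing employee list.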
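// Alternative: build the POJOs by hand from each Row.
public Department(Row row) {
  this.employee = new ArrayList<>();
  this.deptId = row.getAs("deptId");
  int idx = row.fieldIndex("employee");
  if (!row.isNullAt(idx)) {
    // Each array element is a struct, which Spark exposes as a nested Row.
    List<Row> rowList = row.getList(idx);
    for (Row r : rowList) {
      employee.add(new Employee(r));
    }
  }
}

public Employee(Row row) {
  this.phno = new ArrayList<>();
  this.firstName = row.getAs("firstName");
  this.lastName = row.getAs("lastName");
  int idx = row.fieldIndex("phno");
  if (!row.isNullAt(idx)) {
    // Elements are numeric (integer in the schema), so widen each one to Long.
    List<Number> values = row.getList(idx);
    for (Number n : values) {
      phno.add(n == null ? null : n.longValue());
    }
  }
}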

 // Build the POJOs through the Row constructors, then format each one.
 JavaRDD<Department> departmentRdd = ds.toJavaRDD().map(Department::new);
 JavaRDD<String> rdd = departmentRdd.map((Function<Department, String>) v -> {
     StringBuilder sb = new StringBuilder();
     sb.append("deptId").append(v.getDeptId());
     if (!CollectionUtil.isListNullOrEmpty(v.getEmployee())) {
         Employee first = v.getEmployee().get(0);
         sb.append("FirstName").append(first.getFirstName());
         if (!CollectionUtil.isListNullOrEmpty(first.getPhno())) {
             sb.append("Ph number").append(first.getPhno().get(0));
         }
     }
     return sb.toString();
 });
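One way to cut down the hard-coded column names in the Row constructors is a small reflective mapper that copies values into same-named fields and recurses into nested lists. The sketch below is only an illustration under stated assumptions: a `Map<String, Object>` stands in for a Spark `Row` so the code runs without Spark, `ReflectiveMapperSketch` and `fromRecord` are hypothetical names, the sample values are made up, and no type conversion is attempted.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReflectiveMapperSketch {

    // Stand-ins for the POJOs; field names must equal the record keys.
    static class Employee {
        String firstName;
        String lastName;
        List<Long> phno;
    }

    static class Department {
        Long deptId;
        String depName;
        List<Employee> employee;
    }

    // Creates an instance of type and fills every field that has a same-named, non-null entry.
    static <T> T fromRecord(Map<String, Object> record, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Object value = record.get(field.getName());
                if (value == null || field.isSynthetic()) {
                    continue;
                }
                field.setAccessible(true);
                field.set(target, convert(value, field));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map record onto " + type.getName(), e);
        }
    }

    // Lists are copied element by element; nested maps become instances of the list's element type.
    // Assumes list fields are declared with a concrete type argument, such as List<Employee>.
    @SuppressWarnings("unchecked")
    private static Object convert(Object value, Field field) {
        if (!(value instanceof List)) {
            return value;
        }
        ParameterizedType listType = (ParameterizedType) field.getGenericType();
        Class<?> elementType = (Class<?>) listType.getActualTypeArguments()[0];
        List<Object> out = new ArrayList<>();
        for (Object element : (List<?>) value) {
            if (element instanceof Map) {
                out.add(fromRecord((Map<String, Object>) element, elementType));
            } else {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> emp = new HashMap<>();
        emp.put("firstName", "Ann");
        emp.put("phno", Arrays.asList(5551234L));
        Map<String, Object> dept = new HashMap<>();
        dept.put("deptId", 1L);
        dept.put("employee", Arrays.asList(emp));

        Department d = fromRecord(dept, Department.class);
        System.out.println(d.deptId + " " + d.employee.get(0).firstName + " " + d.employee.get(0).phno);
    }
}
```

With Spark, the record would presumably be built from the `Row` itself, for example by iterating `row.schema().fieldNames()` and reading each value with `row.getAs(name)`. Because of type erasure the sketch does not widen list elements, so integer elements would still need converting to `Long` as in the hand-written constructor.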