Apache Spark: convert a Spark Dataset<Row> to a Java POJO class
I am trying to convert a Dataset<Row> into Java objects. The schema looks like this:
root
|-- deptId: long (nullable = true)
|-- depNameName: string (nullable = true)
|-- employee: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- firstName: string (nullable = true)
| | |-- lastName: string (nullable = true)
| | |-- phno: array (nullable = true)
| | | |-- element: integer (containsNull = true)
I created POJO classes like this:
class Department {
    private Long deptId;
    private String depName;
    private List<Employee> employee;
    // plus getters, setters, and a no-argument constructor
}
class Employee {
    private String firstName;
    private String lastName;
    private List<Long> phno;
    // plus getters, setters, and a no-argument constructor
}
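Since the classes above only mention their getters and setters in a comment, it may help to see one bean written out in full. Spark's bean encoder works from JavaBean properties, so the property names that the getters and setters expose generally need to line up with the column names in the schema. The following is a minimal sketch using only the JDK's java.beans.Introspector; BeanPropertySketch, EmployeeBean and propertyNames are illustrative names, not part of the original post.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BeanPropertySketch {
    private BeanPropertySketch() {}

    // Illustrative bean mirroring the Employee class from the question.
    public static class EmployeeBean implements Serializable {
        private String firstName;
        private String lastName;
        private List<Long> phno;

        public EmployeeBean() {}

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
        public List<Long> getPhno() { return phno; }
        public void setPhno(List<Long> phno) { this.phno = phno; }
    }

    /** Returns the sorted names of properties that have both a getter and a setter. */
    static Set<String> propertyNames(Class<?> beanClass) {
        Set<String> names = new TreeSet<>();
        try {
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot inspect " + beanClass, e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Compare this list by eye with the column names printed by printSchema().
        System.out.println(propertyNames(EmployeeBean.class));
    }
}
```

Comparing the resulting names with the schema is a quick way to spot mismatches such as a field called depName against a column called depNameName.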
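The manual conversion shown further below (constructors that take a Row) worked for me. However, it hard-codes a lot of schema names and other details, so I am looking for a more elegant solution.
Please suggest the best solution for this problem.
Please see the accepted answer here: . I know that question is about Scala, but the accepted answer is actually in Java.
If there are no nested objects, the encoder approach works, as shown below.
@GlennieHellesSindholt Any reference for the solution you adopted?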
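import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Read the Parquet file and map each row onto the Department bean.
// parquetFilePath is a placeholder for the path to the Parquet file.
Dataset<Row> ds = this.spark.read().parquet(parquetFilePath);
Dataset<Department> departmentDataset =
    ds.as(Encoders.bean(Department.class));
JavaRDD<String> rdd =
    departmentDataset.toJavaRDD().map((Function<Department, String>) v -> {
        StringBuilder sb = new StringBuilder();
        sb.append("deptId").append(v.getDeptId());
        // Only look at the first employee when the list is non-empty.
        if (!CollectionUtil.isListNullOrEmpty(v.getEmployee())) {
            Employee first = v.getEmployee().get(0);
            sb.append("FirstName").append(first.getFirstName());
            if (!CollectionUtil.isListNullOrEmpty(first.getPhno())) {
                sb.append("Ph number").append(first.getPhno().get(0));
            }
        }
        return sb.toString();
    });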
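The snippet above calls CollectionUtil.isListNullOrEmpty, but that helper is not shown anywhere in the post. A minimal sketch of what it presumably does, assuming it simply checks for a null or empty list (the class body below is a guess, not the author's code):

```java
import java.util.List;

// Hypothetical stand-in for the CollectionUtil helper used in the post;
// the original implementation is not shown.
final class CollectionUtil {
    private CollectionUtil() {}

    /** Returns true when the list is null or contains no elements. */
    static boolean isListNullOrEmpty(List<?> list) {
        return list == null || list.isEmpty();
    }
}
```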
// Constructor that builds a Department from a Row, using hard-coded column names.
public Department(Row row)
{
    this.employee = new ArrayList<>();
    this.deptId = row.getAs("deptId");
    int idx = row.fieldIndex("employee");
    if (!row.isNullAt(idx)) {
        List<Row> rowList = row.getList(idx);
        for (Row r : rowList) {
            this.employee.add(new Employee(r));
        }
    }
}
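// Constructor that builds an Employee from a nested Row.
public Employee(Row row)
{
    this.phno = new ArrayList<>();
    this.firstName = row.getAs("firstName");
    int idx = row.fieldIndex("phno");
    if (!row.isNullAt(idx)) {
        // The schema lists the elements as integer, so widen each value to Long.
        List<Number> numbers = row.getList(idx);
        for (Number n : numbers) {
            this.phno.add(n == null ? null : n.longValue());
        }
    }
}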
// Build the POJOs with the Row constructors, then format each Department as a string.
JavaRDD<Department> departmentRdd = ds.toJavaRDD().map(Department::new);
JavaRDD<String> lines = departmentRdd.map((Function<Department, String>) v -> {
    StringBuilder sb = new StringBuilder();
    sb.append("deptId").append(v.getDeptId());
    // Only look at the first employee when the list is non-empty.
    if (!CollectionUtil.isListNullOrEmpty(v.getEmployee())) {
        Employee first = v.getEmployee().get(0);
        sb.append("FirstName").append(first.getFirstName());
        if (!CollectionUtil.isListNullOrEmpty(first.getPhno())) {
            sb.append("Ph number").append(first.getPhno().get(0));
        }
    }
    return sb.toString();
});
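One way to cut down on the hard-coded column names in the Row constructors is to match field names to bean setters by reflection. The following is only a sketch of that idea in plain Java: a Map<String, ?> stands in for a Spark Row (nested structs become nested maps, arrays become lists), and GenericBeanMapper, Dept, Emp and fromMap are hypothetical names. Adapting it to read names and values from an actual Row is left as an assumption and is not shown here.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GenericBeanMapper {
    private GenericBeanMapper() {}

    // Illustrative beans, trimmed versions of the POJOs in the question.
    public static class Emp {
        private String firstName;
        private List<Long> phno;
        public String getFirstName() { return firstName; }
        public void setFirstName(String v) { firstName = v; }
        public List<Long> getPhno() { return phno; }
        public void setPhno(List<Long> v) { phno = v; }
    }

    public static class Dept {
        private Long deptId;
        private List<Emp> employee;
        public Long getDeptId() { return deptId; }
        public void setDeptId(Long v) { deptId = v; }
        public List<Emp> getEmployee() { return employee; }
        public void setEmployee(List<Emp> v) { employee = v; }
    }

    /** Creates a bean and fills every property whose name appears as a key in the map. */
    static <T> T fromMap(Map<String, ?> values, Class<T> type) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
                Method setter = pd.getWriteMethod();
                if (setter != null && values.containsKey(pd.getName())) {
                    Type target = setter.getGenericParameterTypes()[0];
                    setter.invoke(bean, convert(values.get(pd.getName()), target));
                }
            }
            return bean;
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException("Cannot populate " + type, e);
        }
    }

    // Converts nested maps to beans, lists element by element, and numbers to Long.
    private static Object convert(Object value, Type target) {
        if (value instanceof Map && target instanceof Class) {
            @SuppressWarnings("unchecked")
            Map<String, ?> nested = (Map<String, ?>) value;
            return fromMap(nested, (Class<?>) target);
        }
        if (value instanceof List && target instanceof ParameterizedType) {
            Type elementType = ((ParameterizedType) target).getActualTypeArguments()[0];
            List<Object> converted = new ArrayList<>();
            for (Object element : (List<?>) value) {
                converted.add(convert(element, elementType));
            }
            return converted;
        }
        if (value instanceof Number && Long.class.equals(target)) {
            return ((Number) value).longValue();
        }
        return value; // null, strings and already-matching values pass through
    }

    public static void main(String[] args) {
        Map<String, Object> emp = new HashMap<>();
        emp.put("firstName", "Ann");
        emp.put("phno", Arrays.asList(1, 2));
        Map<String, Object> dept = new HashMap<>();
        dept.put("deptId", 7);
        dept.put("employee", Arrays.asList(emp));
        Dept d = fromMap(dept, Dept.class);
        System.out.println(d.getDeptId() + " " + d.getEmployee().get(0).getFirstName());
    }
}
```

The trade-off is that the mapping is no longer checked at compile time: a property whose name has no matching key is silently left null, so a name mismatch shows up only at runtime.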