Apache Spark Dataset NullPointerException on left outer join
I am trying to learn Spark Datasets (Spark 2.0.1). The left outer join below throws a NullPointerException:
case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String)
val employeeDataSet = sc.parallelize(Seq(Employee("Jax", 22, 5, 100000.0),Employee("Max", 22, 1, 100000.0))).toDS()
val departmentDataSet = sc.parallelize(Seq(Department(1, "Engineering"), Department(2, "Marketing"))).toDS()
val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
.map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId , record._2.depname))
averageSalaryDataset.show()
16/12/14 16:48:26 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 12)
java.lang.NullPointerException
This happens because the left outer join yields null for record._2 when there is no matching department, so record._2.depname throws. How should this be handled? Thanks.
You can handle the null with an if..else condition:
val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId,
    if (record._2 == null) null else record._2.depname))
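As an alternative to the explicit if..else, the null can be wrapped in an Option. The sketch below illustrates the same pattern in plain Scala, without a Spark session; the "unknown" fallback string and the sample data are just stand-ins for this example:

```scala
case class Department(id: Int, depname: String)

// Option(x) is None when x is null, so .map never dereferences a null.
def departmentName(dept: Department): String =
  Option(dept).map(_.depname).getOrElse("unknown")

// Simulates joinWith output: a matched row and an unmatched (null) row.
val joined: Seq[(String, Department)] = Seq(
  ("Max", Department(1, "Engineering")),
  ("Jax", null)
)

val names = joined.map { case (emp, dept) => emp -> departmentName(dept) }
println(names)
```

The same `Option(record._2).map(_.depname)` expression can replace the if..else inside the Spark `.map` above.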
After a joinWith, the resulting Dataset holds pairs: _1 is the left row and _2 is the right row. For left rows with no match, _2 is null, and calling record._2.depname on it is what raises the exception.
Although this may work, it is a pretty ugly fix. Oh! I don't understand why the join doesn't return an Option of the case class, which would make the check much easier.
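One way to approximate the Option-returning join the comment asks for is to lift the nullable right side into an Option immediately after the join. A minimal plain-Scala sketch of that lifting step (the case classes mirror the ones above; `lift` is a hypothetical helper, not a Spark API):

```scala
case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)

// Hypothetical helper: turn the nullable right side of a left outer join
// into an Option, so callers can pattern match instead of null-checking.
def lift(joined: Seq[(Employee, Department)]): Seq[(Employee, Option[Department])] =
  joined.map { case (e, d) => (e, Option(d)) }

// Simulated joinWith output: one matched row, one unmatched (null) row.
val raw = Seq(
  (Employee("Max", 22, 1, 100000.0), Department(1, "Engineering")),
  (Employee("Jax", 22, 5, 100000.0), null)
)

val lifted = lift(raw)
lifted.foreach {
  case (e, Some(d)) => println(s"${e.name} -> ${d.depname}")
  case (e, None)    => println(s"${e.name} -> no department")
}
```

The same `.map { case (e, d) => (e, Option(d)) }` can in principle be applied to the joined Dataset itself, though whether Spark accepts it depends on encoder support for Option of a case class in the Spark version used.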