Apache Spark Dataset: NullPointerException on a left outer join (apache-spark-dataset)

I'm trying to learn Spark Datasets (Spark 2.0.1). The left outer join below throws a NullPointerException:

case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String)
val employeeDataSet = sc.parallelize(Seq(Employee("Jax", 22, 5, 100000.0),Employee("Max", 22, 1, 100000.0))).toDS()
val departmentDataSet = sc.parallelize(Seq(Department(1, "Engineering"), Department(2, "Marketing"))).toDS()

val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
                               .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId, record._2.depname))

averageSalaryDataset.show()
16/12/14 16:48:26 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 12)
java.lang.NullPointerException

This happens because, for rows with no matching department, the left outer join leaves record._2 null, so record._2.depname throws.

How should I handle this? Thanks.
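For reference, the failure can be reproduced without a Spark session: record._2 is a plain null reference for the unmatched employee, so any field access on it throws. A minimal sketch (the Department case class is taken from the question; everything else here is illustrative):

```scala
case class Department(id: Int, depname: String)

// What record._2 holds for Jax: departmentId 5 has no matching Department row
val unmatched: Department = null

// Accessing a field on the null reference throws, exactly as inside the Spark map
val result =
  try { unmatched.depname }
  catch { case _: NullPointerException => "NullPointerException, as in the Spark job" }

println(result)
```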

You can handle the null with an if..else condition:

val averageSalaryDataset = employeeDataSet
  .joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer")
  .map(record => Record(
    record._1.name, record._1.age, record._1.salary, record._1.departmentId,
    if (record._2 == null) null else record._2.depname))
After the join, each element of the resulting Dataset is a tuple (Employee, Department). In the map operation we call record._2.depname, but record._2 is null for employees with no matching department, and that is the cause of the exception.


Though this works, it's a rather ugly solution. I don't understand why the join doesn't return an Option of the case class, which would make the check easier.
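A cleaner variant of the accepted fix, along the lines the comment above suggests: wrap the possibly-null side in Option, whose .map/.orNull replace the explicit if..else. This is a plain-Scala sketch of the null-handling logic only; the tuple values stand in for joinWith's left-outer output, and no Spark session is involved:

```scala
case class Employee(name: String, age: Int, departmentId: Int, salary: Double)
case class Department(id: Int, depname: String)
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String)

// Shape of the left-outer joinWith output: the right side is null when unmatched
val joined: Seq[(Employee, Department)] = Seq(
  (Employee("Jax", 22, 5, 100000.0), null),
  (Employee("Max", 22, 1, 100000.0), Department(1, "Engineering"))
)

// Option(null) is None, so .map(_.depname).orNull yields null safely for unmatched rows
val records = joined.map { case (e, d) =>
  Record(e.name, e.age, e.salary, e.departmentId, Option(d).map(_.depname).orNull)
}
```

The same Option(record._2).map(_.depname).orNull expression can be dropped into the Spark map above in place of the if..else.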