Apache Spark: How to resolve java.lang.NullPointerException in a Spark DataFrame 'except' operation?
I have two DataFrames containing user IDs. I want to take the difference of these DataFrames, so I used except, as shown below:
df1.except(df2);
But I get the following error:
java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I don't understand where the problem is.
I tried filtering out the null values from both DataFrames.
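To make the expected behaviour concrete, here is a minimal plain-Java model of what except computes. It is a sketch under assumptions, not Spark code. It assumes that Dataset.except behaves like SQL EXCEPT DISTINCT (duplicate rows are removed) and that set operations compare values null-safely, so a null uid present on both sides is removed. This matches the Spark SQL documentation on NULL semantics, but verify it against your Spark version. De-duplication of this kind is usually planned as a keyed aggregation, which would be consistent with the agg_doAggregateWithKeys frame at the top of the trace. The names ExceptModel and exceptDistinct are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java model of EXCEPT DISTINCT semantics; not Spark code.
public class ExceptModel {

    // Returns the distinct values of `left` that do not occur in `right`,
    // keeping first-occurrence order. HashSet accepts null and matches
    // null to null, mirroring a null-safe comparison.
    public static List<String> exceptDistinct(List<String> left, List<String> right) {
        Set<String> exclude = new HashSet<>(right);
        Set<String> result = new LinkedHashSet<>();
        for (String value : left) {
            if (!exclude.contains(value)) {
                result.add(value); // LinkedHashSet silently drops duplicates
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a duplicate and a null on the left, a null on the right.
        List<String> left = Arrays.asList("a", "a", "b", null, "c");
        List<String> right = Arrays.asList("b", null);
        System.out.println(exceptDistinct(left, right)); // prints [a, c]
    }
}
```

HashSet gives constant-time membership tests and LinkedHashSet removes duplicates while keeping first-seen order; Spark itself does not guarantee any row order for the result of except.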
EDIT: Schema and sample data of both DataFrames:
Schema:
df1.printSchema
root
|-- uid: string (nullable = true)
df2.printSchema
root
|-- uid: string (nullable = true)
Data of df1:
+--------------------+
| uid|
+--------------------+
| sss12|
| ushadevi_8512|
| babu57111|
| gianchand-199|
| rju-815423|
Data of df2:
+--------------------+
| uid|
+--------------------+
| navratn-3131|
| jaykumar-1|
| vishwanath-666|
| dharmendra-5623|
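Restricted to the rows shown above (the posted tables appear to be truncated), no uid occurs in both DataFrames, so on these rows alone df1.except(df2) would be expected to return all five df1 uids. The sketch below checks that set difference with plain Java collections; it is not Spark code, and the class name ExceptSampleCheck is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check on the visible sample rows only; not Spark code.
public class ExceptSampleCheck {

    public static final List<String> DF1_UIDS = Arrays.asList(
        "sss12", "ushadevi_8512", "babu57111", "gianchand-199", "rju-815423");
    public static final List<String> DF2_UIDS = Arrays.asList(
        "navratn-3131", "jaykumar-1", "vishwanath-666", "dharmendra-5623");

    // Distinct df1 uids that are absent from df2 (set difference df1 \ df2).
    public static Set<String> difference() {
        Set<String> result = new LinkedHashSet<>(DF1_UIDS);
        result.removeAll(DF2_UIDS);
        return result;
    }

    public static void main(String[] args) {
        Set<String> diff = difference();
        System.out.println(diff.size() + " " + diff);
    }
}
```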
Please post sample data of both DataFrames. except is not an action. Please add some more code to create a …
@philantrovert, I have already used the show() action.