
Apache Spark: how to resolve a java.lang.NullPointerException in a Spark DataFrame operation?


I have two DataFrames containing user IDs. I want to take the difference of these DataFrames, so I used except, as follows:

df1.except(df2);

but I got the following error:

java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
I don't understand where the problem is.

I have also tried filtering null values out of both DataFrames.
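For reference, except returns the distinct rows of the first DataFrame that do not appear in the second. The following plain-Python sketch (not Spark code) illustrates that semantics and the null-filtering workaround described above; the column name uid is taken from the schemas below, and the Spark call shown in the comment is only the commonly suggested form, not a confirmed fix:

```python
def except_distinct(rows1, rows2):
    """Mimic DataFrame.except: distinct rows of rows1 not present in rows2."""
    return sorted(set(rows1) - set(rows2), key=str)

# Sample uid values; None stands in for a null uid.
df1 = ["sss12", "ushadevi_8512", None, "babu57111"]
df2 = ["navratn-3131", "jaykumar-1", None]

# Null-filtering workaround, equivalent in spirit to the Spark call
#   df1.filter(col("uid").isNotNull()).except(df2.filter(col("uid").isNotNull()))
df1_clean = [u for u in df1 if u is not None]
df2_clean = [u for u in df2 if u is not None]

print(except_distinct(df1_clean, df2_clean))
# -> ['babu57111', 'sss12', 'ushadevi_8512']
```

In real Spark, nulls in the join/compare key are a frequent source of codegen NullPointerExceptions, which is why dropping them before the set difference is worth trying.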

Edit: schemas and sample data for both DataFrames:

Schemas:

df1.printSchema

root
 |-- uid: string (nullable = true)
df2.printSchema

root
 |-- uid: string (nullable = true)
Data in df1:

+--------------------+
|                 uid|
+--------------------+
|               sss12|
|       ushadevi_8512|
|           babu57111|
|       gianchand-199|
|          rju-815423|
Data in df2:

+--------------------+
|                 uid|
+--------------------+
|        navratn-3131|
|          jaykumar-1|
|      vishwanath-666|
|     dharmendra-5623|

Please post sample data for both DataFrames.

except is not an action. Please add some more code that creates one.

@philantrovert, I have used the show() action.