Selecting non-null values in a DataFrame in Spark

I am reading a CSV file in Spark 2.0 and counting the non-null values in a column with the following code:

val df = spark.read.option("header", "true").csv(dir)

df.filter("IncidntNum is not null").count()

When I test this in spark-shell, it works fine. When I build a jar containing the code and submit it with spark-submit, I get an exception on the second line above:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '' expecting {'(', 'SELECT', ..
== SQL ==
IncidntNum is not null
^^^

        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)

Any idea why this happens when the same code works in spark-shell?

This question has been around for a while, but better late than never.

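The most likely cause I can think of is that, when you run with spark-submit, you are running in "cluster" mode. That means the driver process lives on a different machine than it does when you run spark-shell, so the path in dir may resolve to a different file there. This could cause Spark to read a different file than the one you tested with.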
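One way to check this hypothesis is to have the submitted job print where it is running and what it actually read before it filters. The following is only a minimal sketch, not a confirmed fix: the object name NonNullCount, the helper missingColumns, and passing the input path as the first program argument are all assumptions made for illustration. It also swaps the SQL string for the Column API (col(...).isNotNull), which does not go through the SQL expression parser, so a missing column should show up as an analysis error rather than a parse error.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical driver object, for illustration only.
object NonNullCount {

  // Hypothetical helper: expected column names that are absent from the parsed header.
  def missingColumns(actual: Seq[String], expected: Seq[String]): Seq[String] =
    expected.filterNot(name => actual.contains(name))

  def main(args: Array[String]): Unit = {
    // Assumption: the input path is passed as the first program argument.
    val dir = args(0)
    val spark = SparkSession.builder().appName("NonNullCount").getOrCreate()

    // Report where the driver is running; "client" is the default deploy mode.
    val master = spark.sparkContext.master
    val deployMode = spark.conf.get("spark.submit.deployMode", "client")
    println(s"master = $master, deploy mode = $deployMode")

    val df = spark.read.option("header", "true").csv(dir)

    // Show which files Spark resolved and which header columns it parsed.
    val files = df.inputFiles.mkString(", ")
    val columns = df.columns.mkString(", ")
    println(s"input files: $files")
    println(s"columns: $columns")

    // Fail early with a readable message if the expected column is not there.
    val missing = missingColumns(df.columns.toSeq, Seq("IncidntNum"))
    require(missing.isEmpty, "missing columns: " + missing.mkString(", "))

    // Column API equivalent of filter("IncidntNum is not null").
    val nonNull = df.filter(col("IncidntNum").isNotNull).count()
    println(s"non-null IncidntNum rows: $nonNull")

    spark.stop()
  }
}
```

If the printed input files or columns differ between spark-shell and spark-submit, that would support the explanation above. If they match, the cause is probably something else.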