Apache Spark: reading Parquet files in parallel on a cluster

I have a use case where I need to read Parquet files from more than 1,000 directories in parallel. I am doing something like this:

    val df = list.toList.toDF()

    df.foreach(c => {
      val config = getConfigs()
      doSomething(spark, config)
    })

Inside doSomething, when I try this:

    val df1 = spark.read.parquet(pathToRead).collect()

it throws the NullPointerException shown below. It seems that spark.read only works on the driver, not on the cluster executors. How can I accomplish what I'm trying to do?

The exception is as follows:

    21/05/25 17:03:50 WARN TaskSetManager: Lost task 2.0 in stage 8.0 (TID 9, ip-10-0-5-3.us-west-2.compute.internal, executor 11): java.lang.NullPointerException
            at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
            at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:142)
            at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:789)
            at org.apache.spark.sql.SparkSession.read(SparkSession.scala:656)
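
A SparkSession cannot be used inside an executor closure (it deserializes to null on the workers, which is what the sessionState NullPointerException reflects), so the reads have to be issued from the driver. Below is a minimal sketch of two driver-side patterns, assuming the 1,000+ directories are available as plain path strings; the bucket paths and the per-directory count() action are placeholders, not from the original question:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.Duration
    import scala.concurrent.ExecutionContext.Implicits.global

    object ParallelParquetRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("parallel-parquet").getOrCreate()

        // Placeholder paths standing in for the 1,000+ directories.
        val paths: Seq[String] = Seq("s3://bucket/dir1", "s3://bucket/dir2")

        // Pattern 1: a single logical read over all directories. The call is
        // made on the driver; Spark parallelizes the file scan across executors.
        val all: DataFrame = spark.read.parquet(paths: _*)
        println(s"total rows: ${all.count()}")

        // Pattern 2: submit one Spark job per directory concurrently from the
        // driver using Futures. Each job still runs on the cluster; only the
        // job submission happens in driver-side threads.
        val jobs: Seq[Future[(String, Long)]] = paths.map { p =>
          Future(p -> spark.read.parquet(p).count())
        }
        Await.result(Future.sequence(jobs), Duration.Inf)
          .foreach { case (p, n) => println(s"$p -> $n rows") }

        spark.stop()
      }
    }

If each directory needs its own configuration, as doSomething in the question suggests, the Future-based variant preserves the per-directory logic; enabling the FAIR scheduler (spark.scheduler.mode=FAIR) lets the concurrent jobs share the cluster instead of queueing behind one another.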