Python Spark: reading a Parquet file in standalone mode produces an error


Running with SparkContext = local:

from pyspark import SparkContext, SparkConf, SQLContext
sc = SparkContext('local', 'pyspark')
sqlContext = SQLContext(sc)
path = "/root/users.parquet"
sqlContext.read.parquet(path).printSchema()
Output:

root
 |-- name: string (nullable = false)
 |-- favorite_color: string (nullable = true)
 |-- favorite_numbers: array (nullable = false)
 |    |-- element: integer (containsNull = false)
16/02/01 09:16:30 WARN TaskSetManager: Lost task 111.0 in stage 0.0 (TID 111, 10.16.34.110): java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file FileStatus{path=file:/root/users.parquet; isDirectory=false; length=615; replication=0; blocksize=0; modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:247)
Running with SparkContext = master (one master with 4 slaves):

from pyspark import SparkContext, SparkConf, SQLContext
appName = "SparkClusterEvalPOC"
master = "spark://<masterHostName>:7077"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
path = "/root/users.parquet"
sqlContext.read.parquet(path).printSchema()
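The likely cause of the `Could not read footer` error is that a plain path like `/root/users.parquet` is resolved on each worker's own local filesystem, and the file only exists on the master. One way around this (hostnames and the HDFS path below are illustrative, not from the question) is to copy the file to the same path on every worker, or to put it on storage that all nodes share, such as HDFS:

```shell
# Assumption: the four workers are reachable as worker1..worker4;
# substitute the real hostnames. Copy the file to the same path on each node:
for host in worker1 worker2 worker3 worker4; do
  scp /root/users.parquet "$host":/root/users.parquet
done

# Alternatively, put the file on HDFS (target path is illustrative)
# so that every executor reads the same copy:
hdfs dfs -put /root/users.parquet /data/users.parquet
```

With the HDFS route, the read in the script would then use the `hdfs://` URI of that location instead of the node-local path.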

Any help would be appreciated.

The path is a local path, isn't it? Yes, it's on the same host as the master. Does "running on the same host" mean you are running the standalone cluster on a single machine? If not, is the path available to all the workers? Got it. I had assumed the master would share its filesystem with the slaves.
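As the comments point out, the master does not share its filesystem with the workers: in a standalone cluster every executor resolves the path independently, so the data must live somewhere all of them can reach. A minimal sketch of the cluster read once the file is on shared storage (the `hdfs://` URI and port are assumptions; substitute your namenode and the path you actually uploaded to):

```python
from pyspark import SparkContext, SparkConf, SQLContext

conf = SparkConf().setAppName("SparkClusterEvalPOC") \
                  .setMaster("spark://<masterHostName>:7077")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Point the reader at a location every executor can reach, e.g. HDFS.
# "hdfs://<namenode>:8020/data/users.parquet" is a placeholder URI.
path = "hdfs://<namenode>:8020/data/users.parquet"
sqlContext.read.parquet(path).printSchema()
```

This requires a live cluster, so it is a sketch rather than something runnable in isolation; the key change from the question's script is only the `path`.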