Apache Spark: Spark SQL (PySpark) - SparkSession import error
I am trying to run a simple Spark SQL script (PySpark) with spark-submit, but I get the error below. Note: I am running this on Spark 2.x.
spark-submit HousePriceSolution.py
Error:
from pyspark.sql import SparkSession
ImportError: cannot import name SparkSession
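Code:
from pyspark.sql import SparkSession

PRICE_SQ_FT = "Price SQ Ft"

if __name__ == "__main__":
    session = SparkSession.builder.appName("HousePriceSolution").getOrCreate()

    # Read the CSV with a header row and let Spark infer the column types.
    realEstate = session.read \
        .option("header", "true") \
        .option("inferSchema", value=True) \
        .csv("hdfs:............./RealEstate.csv")

    # Average price per square foot for each location, sorted ascending.
    # The aggregate column is named after the source column, so reuse the constant.
    realEstate.groupBy("Location") \
        .avg(PRICE_SQ_FT) \
        .orderBy("avg({})".format(PRICE_SQ_FT)) \
        .show()

    session.stop()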
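spark-submit may be pointing to a different version of Spark. SparkSession was added in Spark 2.0, so this ImportError usually means the pyspark being imported comes from a 1.x installation. Check which Spark version spark-submit uses with the following command: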
spark-submit --version
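Alongside the version that spark-submit reports, it can help to see which pyspark package the Python interpreter itself resolves. The following is a minimal diagnostic sketch, not part of the original answer: the function name describe_pyspark and the keys of the returned dictionary are made up for illustration, and it assumes that pyspark exposes __version__ (it falls back to None when it does not).

```python
import sys


def describe_pyspark():
    """Report which pyspark (if any) this interpreter imports.

    The function name and the dictionary keys are illustrative only.
    """
    info = {
        "python": sys.executable,
        "found": False,
        "location": None,
        "version": None,
        "has_spark_session": False,
    }
    try:
        import pyspark
    except ImportError:
        # pyspark is not on sys.path for this interpreter at all.
        return info

    info["found"] = True
    info["location"] = getattr(pyspark, "__file__", None)
    # Assumption: older releases may not define __version__.
    info["version"] = getattr(pyspark, "__version__", None)

    try:
        from pyspark.sql import SparkSession  # noqa: F401
        info["has_spark_session"] = True
    except ImportError:
        # Same failure as in the question: this pyspark has no SparkSession.
        pass
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_pyspark().items()):
        print("{}: {}".format(key, value))
```

Running it once with plain python and once through spark-submit shows whether the two resolve pyspark from the same location.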
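If the Spark version is correct, check what PYTHONPATH contains (echo $PYTHONPATH), because PYTHONPATH may include pyspark libraries from another version of Spark. If PYTHONPATH does not contain the pyspark libraries, add them as follows: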
export PYTHONPATH=$PYTHONPATH:"$SPARK_HOME/python/lib/*"
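One caveat about the export line above: Python takes each PYTHONPATH entry literally and does not expand a trailing *, so listing the entries explicitly may be more reliable. The sketch below builds such a value; it assumes the usual Spark distribution layout (a python directory with the py4j and pyspark zip archives under python/lib), and the helper names pyspark_path_entries and build_pythonpath are hypothetical.

```python
import glob
import os


def pyspark_path_entries(spark_home):
    """List explicit path entries for the pyspark bundled under spark_home.

    Assumes the layout <spark_home>/python and <spark_home>/python/lib/*.zip.
    """
    python_dir = os.path.join(spark_home, "python")
    # Expand the wildcard here, because Python will not do it for PYTHONPATH.
    zips = sorted(glob.glob(os.path.join(python_dir, "lib", "*.zip")))
    return [python_dir] + zips


def build_pythonpath(spark_home, existing=""):
    """Join the Spark entries, keeping any existing PYTHONPATH value last."""
    entries = pyspark_path_entries(spark_home)
    if existing:
        entries.append(existing)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        print(build_pythonpath(spark_home, os.environ.get("PYTHONPATH", "")))
    else:
        print("SPARK_HOME is not set")
```

The printed value can then be exported as PYTHONPATH in the shell. Placing the Spark entries first means the pyspark under SPARK_HOME is found before any older copy that is already on the path.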