Apache Spark SQL (PySpark): SparkSession import error


I am trying to run a simple Spark SQL program (PySpark) with spark-submit, but I get the error below. Note: I am running this on Spark 2.x.

spark-submit HousePriceSolution.py

Error:

 from pyspark.sql import SparkSession
 ImportError: cannot import name SparkSession

Code:

 from pyspark.sql import SparkSession
 PRICE_SQ_FT = "Price SQ Ft"

 if __name__ == "__main__":

  session = SparkSession.builder.appName("HousePriceSolution").getOrCreate()    
  realEstate = session.read \
  .option("header","true") \
  .option("inferSchema", value=True) \
  .csv("hdfs:............./RealEstate.csv")

  realEstate.groupBy("Location") \
  .avg(PRICE_SQ_FT) \
  .orderBy("avg(Price SQ FT)") \
  .show()
  session.stop()

Your spark-submit may be pointing at a different version of Spark. Check which Spark version spark-submit uses with the following command:

spark-submit --version
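
The same check can be made from the Python side, since SparkSession only exists in pyspark from Spark 2.0 onward. The sketch below is an illustration, not part of the original answer: the function name describe_pyspark is hypothetical, and it assumes pyspark exposes `__version__` (it falls back to "unknown" if not).

```python
from __future__ import print_function

import sys


def describe_pyspark():
    """Report which pyspark copy this interpreter imports, if any."""
    info = {
        "python": sys.executable,
        "location": None,
        "version": None,
        "has_spark_session": False,
    }
    try:
        import pyspark
    except ImportError:
        # pyspark is not importable from this interpreter at all.
        return info

    # __file__ shows which installation (directory or zip) was picked up.
    info["location"] = getattr(pyspark, "__file__", None)
    info["version"] = getattr(pyspark, "__version__", "unknown")
    try:
        from pyspark.sql import SparkSession  # noqa: F401
        info["has_spark_session"] = True
    except ImportError:
        # Typical for a pyspark library older than Spark 2.0.
        pass
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_pyspark().items()):
        print("{0}: {1}".format(key, value))
```

Running this through spark-submit (saved under any file name) shows the location and version of the pyspark that the driver really loads. If location is set but has_spark_session is False, an older pyspark is probably shadowing the Spark 2.x one.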
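
If the Spark version is fine, check what PYTHONPATH contains (echo $PYTHONPATH), since PYTHONPATH may include pyspark libraries from another Spark version. If PYTHONPATH does not include the pyspark libraries, add them as follows: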

export PYTHONPATH=$PYTHONPATH:"$SPARK_HOME/python/lib/*"
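
One caveat: Python treats each PYTHONPATH entry as a literal directory or zip archive and does not expand `*`, so the quoted wildcard above may not add anything importable. A hedged alternative is to list the archives explicitly. The sketch below assumes the usual Spark layout, with the pyspark package under $SPARK_HOME/python and bundled zip archives (such as the py4j sources) under $SPARK_HOME/python/lib. The helper name pyspark_path_entries is hypothetical.

```python
from __future__ import print_function

import glob
import os


def pyspark_path_entries(spark_home):
    """Return literal PYTHONPATH entries for the given SPARK_HOME."""
    python_dir = os.path.join(spark_home, "python")
    # Expand the wildcard here, because Python will not do it for us.
    archives = sorted(glob.glob(os.path.join(python_dir, "lib", "*.zip")))
    return [python_dir] + archives


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        # Join with the platform separator (":" on Linux and macOS).
        print(os.pathsep.join(pyspark_path_entries(spark_home)))
```

The printed string can then be appended to PYTHONPATH in place of the wildcard entry. If SPARK_HOME is unset the script prints nothing, so it is worth checking that variable first.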