Apache Spark: writing spark.sql DataFrame results to a Parquet file
Tags: apache-spark, hive, pyspark, hdfs

I have started the following spark.sql session:
# creating the Spark session with Hive support enabled
from pyspark.sql import SparkSession
spark = (SparkSession.builder.appName("appName").enableHiveSupport().getOrCreate())
and I can see the results of the following query:
spark.sql("select year(plt_date) as Year, month(plt_date) as Month, count(build) as B_Count, count(product) as P_Count from first_table full outer join second_table on key1=CONCAT('SS',key_2) group by year(plt_date), month(plt_date)").show()
However, when I try to write the DataFrame produced by this query to HDFS, I get the following error:
I am able to save the result DataFrame of a simpler version of this query to the same path; the problem appears once I add functions such as count(), year(), and so on.
What is the problem, and how can I save the results to HDFS?

The error occurs because of the "(" appearing in the generated column name "year(CAST(plt_date AS DATE))" — Parquet does not accept such characters in column names.
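As an illustrative sketch (the helper name is my own, not from the question), names like the one in the error can be sanitized before writing, since Parquet rejects column names containing characters such as " ,;{}()\n\t=":

```python
import re

def sanitize_columns(columns):
    """Replace characters Parquet rejects in column names
    (space , ; { } ( ) newline tab =) with underscores,
    trimming any leftover leading/trailing underscores."""
    return [re.sub(r"[ ,;{}()\n\t=]+", "_", c).strip("_") for c in columns]

# example: the kind of names Spark generates for unaliased expressions
print(sanitize_columns(["year(CAST(plt_date AS DATE))", "count(build)"]))
# -> ['year_CAST_plt_date_AS_DATE', 'count_build']
```

The cleaned names could then be applied with df.toDF(*sanitize_columns(df.columns)) before calling df.write.parquet(...).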
To rename the column:
data = data.selectExpr("year(CAST(plt_date AS DATE)) as nameofcolumn")
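More generally, every computed expression can be given an alias directly in the SQL, so that no output column name contains parentheses. A small sketch that builds such a query string (table and column names taken from the question; the alias mapping is my own):

```python
# map each computed expression to a Parquet-safe alias
exprs = {
    "year(CAST(plt_date AS DATE))": "Year",
    "month(CAST(plt_date AS DATE))": "Month",
    "count(build)": "B_Count",
    "count(product)": "P_Count",
}
select_list = ", ".join(f"{expr} as {alias}" for expr, alias in exprs.items())
query = (
    f"select {select_list} from first_table "
    "full outer join second_table on key1 = CONCAT('SS', key_2) "
    "group by year(plt_date), month(plt_date)"
)
print(query)
```

spark.sql(query).write.parquet(...) should then succeed, because every output column carries a plain alias rather than an expression-derived name.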
Please upvote if this works.