Amazon Web Services: EMR PySpark with Glue Catalog | Can not create a Path from an empty string
I tried to display the data with the following Spark SQL:
spark.sql('select id,name,last_modified_dt,created_ts from (select a.*, row_number() over (partition by id order by created_ts desc ,last_modified_dt desc) as rnk from ( select * from pratik_test_staging.temp1 s union all select id,name,last_modified_dt,created_ts from pratik_test_temp.temp1)a)b where rnk = 1').show()
It gives me the expected output.
But when I try to write it back to pratik_test_staging.temp1, it gives me an error:
spark.sql('select id,name,last_modified_dt,created_ts from (select a.*, row_number() over (partition by id order by created_ts desc ,last_modified_dt desc) as rnk from ( select * from pratik_test_staging.temp1 s union all select id,name,last_modified_dt,created_ts from pratik_test_temp.temp1)a)b where rnk = 1').write.mode('overwrite').insertInto('pratik_test_staging.temp1',overwrite=True)
Error:
Note:
1. I tried this with both ORC and Parquet; the same error occurs with both file formats.
2. I am running Spark with the Glue Catalog.

The data is stored on S3, so you need to specify the path. You can try this:
resultDf.write
.option("path",outputPath)
.mode(SaveMode.Overwrite)
.saveAsTable(outputTableName)
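The snippet above is Scala; since the question uses PySpark, a rough PySpark equivalent might look like the sketch below. The `spark` session, the `output_path` S3 location, and the `write_dedup` helper name are assumptions for illustration, not from the original post:

```python
# Hedged sketch: writing the deduplicated result back with an explicit
# S3 path, since a Glue Catalog table write can fail with
# "Can not create a Path from an empty string" when no location is set.
# `spark` and `output_path` are assumed to be supplied by the caller.

DEDUP_SQL = """
select id, name, last_modified_dt, created_ts
from (
  select a.*,
         row_number() over (partition by id
                            order by created_ts desc, last_modified_dt desc) as rnk
  from (
    select * from pratik_test_staging.temp1
    union all
    select id, name, last_modified_dt, created_ts from pratik_test_temp.temp1
  ) a
) b
where rnk = 1
"""

def write_dedup(spark, output_path, table="pratik_test_staging.temp1"):
    # Explicit S3 location so the table has a concrete path to write to.
    (spark.sql(DEDUP_SQL)
         .write
         .option("path", output_path)
         .mode("overwrite")
         .saveAsTable(table))
```

Note that overwriting a table that the query itself reads from can cause its own issues in Spark; staging the result to a separate path or table first is a common workaround.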
Thanks for the update, I will try it. Also, spark.sql with INSERT OVERWRITE TABLE ... SELECT * FROM ... raises the same error.