Apache Spark spark.read.csv error: java.io.IOException: Permission denied


I am using Spark v2.0 and trying to read a csv file with:

spark.read.csv("filepath")

but I get the following error:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
  at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
  at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
  at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:342)
  ... 48 elided
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
  ... 71 more
Caused by: java.io.IOException: Permission denied
  at java.io.UnixFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
  ... 71 more

I also tried using

.format("csv").csv("filepath")

but got the same result.

If you look at the last part of the exception stack trace, you will realize that this error is not caused by insufficient permissions on the file at "filepath".

I ran into a similar problem while using the Spark shell on a Windows client. This is the error I got:

  at java.io.WinNTFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
Note how the stack trace shows WinNTFileSystem (whereas yours shows UnixFileSystem), which made me look at this stack trace more closely. I realized that the current user had no permission to create a temp file locally. More specifically,

org.apache.hadoop.hive.ql.session.SessionState

tries to create a temp file in Hive's local scratch directory. If the current user does not have sufficient permissions to do that, you get this error.
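As a quick sanity check on Unix, you can test whether the current user can actually create files in that scratch directory. A minimal sketch; /tmp/hive is only an assumed common default, and your Hive configuration may point somewhere else entirely:

```shell
#!/bin/sh
# Check whether the (assumed) Hive local scratch directory is writable
# by the current user. Override the default with SCRATCH_DIR=/your/path.
SCRATCH_DIR="${SCRATCH_DIR:-/tmp/hive}"
mkdir -p "$SCRATCH_DIR" 2>/dev/null
if touch "$SCRATCH_DIR/.spark_perm_test" 2>/dev/null; then
  rm -f "$SCRATCH_DIR/.spark_perm_test"
  echo "writable: $SCRATCH_DIR"
else
  echo "NOT writable: $SCRATCH_DIR"
fi
```

If this prints "NOT writable", the Permission denied above is almost certainly the same problem SessionState is hitting.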

For me, on Windows, I realized I had to use "Run as administrator" on the command prompt used to launch the Spark shell. That worked for me.


For you, on Unix, I imagine that either using sudo, or updating the Hive configuration to set a different local scratch directory, or fixing the permissions on the existing directory should do the trick.
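One hedged alternative to sudo is to point the scratch directories at a location the current user already owns, passed through Spark's spark.hadoop.* configuration prefix. The directory path below is a placeholder:

```shell
#!/bin/sh
# Create a user-owned scratch directory and hand it to Hive via Spark's
# spark.hadoop.* passthrough. The path is a placeholder; any directory
# the current user can write to works.
SCRATCH="$HOME/tmp/hive"
mkdir -p "$SCRATCH"
chmod 700 "$SCRATCH"
# Only launch spark-shell if it is actually on the PATH
if command -v spark-shell >/dev/null 2>&1; then
  spark-shell \
    --conf spark.hadoop.hive.exec.scratchdir="$SCRATCH" \
    --conf spark.hadoop.hive.exec.local.scratchdir="$SCRATCH"
fi
```

This avoids changing system-wide permissions and keeps the workaround scoped to your own shell session.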

Trying this code may help.

Read data from CSV:

Dataset<Row> src = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .load("Source_new.csv");

src.write()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .save("LowerCaseData.csv");

Make sure your "filepath" has the correct permissions.

Hi Bhavesh, the file path has the following permissions: -rwxr-xr-x 3 pratyush04 hdf

Do you know the path where the process is trying to create the temp file? It is not always possible to run as sudo/administrator.

"The path where the process tries to create the temp file?" Check the local temp directory configured in the hive.exec.scratchdir property in $HIVE_HOME/conf/hive-site.xml.
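The hive.exec.scratchdir property mentioned in the last comment lives in hive-site.xml; a hedged sketch of the relevant entries (the directory values are placeholders, not the defaults on your cluster):

```xml
<!-- $HIVE_HOME/conf/hive-site.xml -->
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>Scratch dir for Hive jobs; must be writable by the submitting user</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive-local</value>
  <description>Local scratch dir, where SessionState creates its temp files</description>
</property>
```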