Hadoop: incorrect HDFS permissions with Spark SQL in cluster mode
I have a simple job that reads HDFS data through Hive using Spark SQL. When I ran it in yarn-client mode, I had no problems. Since I started launching it in yarn-cluster mode, I have run into the following HDFS permission error:
Caused by:MetaException(message:org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=EXECUTE, inode="/Projects/SNB/directory/Private/table/table_ORC":hdfs:mygroup:drwxr-xr--
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1698)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1006)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29329)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy31.getTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:976)
... 68 more
However, when I run hdfs dfs -ls on this directory, it shows the following:
drwxrwxrwx   - lb23598 mygroups          0 2016-12-20 17:58 /Projects/SNB/directory/Private/table/table_ORC
So there seems to be some kind of desynchronization between the permissions YARN sees and the permissions currently set in HDFS.
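Do you have any ideas?
Thanks a lot.
It looks like your Spark driver is being submitted as the yarn user, while the HDFS path you are trying to access is owned by another user.
Either changing the owner of the HDFS files to yarn, or submitting the Spark job as the same user that owns the HDFS path, should solve your problem. Before submitting the job, try setting an environment variable like this: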
export HADOOP_USER_NAME=<NAME_OF_THE_USER_THAT_HAS_HDFS_PERMISSION>
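To see why the trace reports a denial while the listing looks open, it can help to replay the check by hand. The sketch below is a simplified model of a POSIX-style permission check, not the NameNode's actual code. The function name is_allowed is hypothetical, and it assumes the yarn user belongs to neither mygroup nor mygroups.

```python
def is_allowed(mode, owner, group, user, user_groups, access):
    """Simplified POSIX-style check: exactly one class (owner, group, other) applies."""
    bits = mode[-9:]  # drop the leading file-type character, e.g. 'd'
    if user == owner:
        triplet = bits[0:3]
    elif group in user_groups:
        triplet = bits[3:6]
    else:
        triplet = bits[6:9]
    index = {"READ": 0, "WRITE": 1, "EXECUTE": 2}[access]
    return triplet[index] != "-"


# Values reported in the stack trace (what the NameNode evaluated).
# Assumption: yarn is in no relevant group, hence the empty set.
trace = is_allowed("drwxr-xr--", "hdfs", "mygroup", "yarn", set(), "EXECUTE")

# Values printed by hdfs dfs -ls in the question.
listing = is_allowed("drwxrwxrwx", "lb23598", "mygroups", "yarn", set(), "EXECUTE")

print(trace, listing)  # prints "False True"
```

With the values from the trace, yarn falls into the "other" class (r--), which lacks the execute bit needed to traverse a directory, so the request is denied. With the values from the listing it would be allowed. The owner, group and mode in the trace therefore do not match those in the listing. Since the failing frame is checkTraverse, the denied check may concern a directory along the path rather than table_ORC itself. That is only a guess, and listing each parent directory with hdfs dfs -ls -d would confirm or rule it out.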
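Permission denied: user=yarn >> Are you running the YARN job as a system user?!? I thought that account was blacklisted and not even allowed to launch jobs...
No, I am running the job as a normal user.
Yes, I understand the error, but I am running the job as a normal Unix user, not as yarn.
Can you tell me the value of the HADOOP_USER_NAME environment variable? When submitting the job, try setting it to the same user that owns the HDFS files.
How can I find that out? In any case, that is only part of the problem. I do not understand why the HDFS permissions shown in the log differ from the actual ones.
I don't know. Can you try changing the owner of the HDFS files to yarn and see if that works?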