Oozie Spark action workflow fails to launch

I have a simple Spark job that cannot be run through Oozie. The same Spark job runs fine via spark-submit. When I submit the workflow, it fails with the following error:

2020-10-06 11:30:05,677 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error while initializing
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:368)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
        at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:436)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:143)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getFileSystem(MRAppMaster.java:605)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:315)
        ... 7 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:425)
        ... 19 more
Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.security.token.Token$PrivateToken from class org.apache.hadoop.hdfs.HAUtil
        at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:271)
        at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:105)
        ... 24 more
2020-10-06 11:30:05,689 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error while initializing
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:368)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
        at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:436)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:143)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getFileSystem(MRAppMaster.java:605)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:315)
        ... 7 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:425)
        ... 19 more
Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.security.token.Token$PrivateToken from class org.apache.hadoop.hdfs.HAUtil
        at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:271)
        at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:105)
        ... 24 more
2020-10-06 11:30:05,694 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error while initializing
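
For reference, the spark-submit invocation that works looks roughly like this (a sketch reconstructed from the spark-opts in the workflow below; the class name, queue, and jar paths are the same placeholders used there):

spark-submit --master yarn --deploy-mode cluster \
    --name Daily-Ingest --class ClassName --queue QueueName \
    --driver-memory 16g --executor-memory 12G --num-executors 12 --executor-cores 2 \
    --jars /path-to/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar \
    /path-to-scala-app.jar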

Here is the workflow XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="MSTI-Daily-Data">
    <global>
        <job-tracker>http://***:8050</job-tracker>
        <name-node>hdfs://***:8020</name-node>
    </global>
    <credentials>
        <credential name="hive_auth" type="hcat">
            <property>
                <name>hcat.metastore.principal</name>
                <value>hive/_HOST@***</value>
            </property>
            <property>
                <name>hcat.metastore.uri</name>
                <value>thrift://***:9083</value>
            </property>
        </credential>
        <credential name="hive_jdbc" type="hive2">
            <property>
                <name>hive2.jdbc.url</name>
                <value>jdbc:hive2://***:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</value>
            </property>
            <property>
                <name>hive2.server.principal</name>
                <value>hive/_HOST@***</value>
            </property>
        </credential>
    </credentials>
    <start to="import-sqooped-data"/>
    <action name="import-sqooped-data">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${resourceManager}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>DataWrangling</value>
                </property>
            </configuration>
            <master>yarn-cluster</master>
            <name>Daily-Ingest</name>
            <class>ClassName</class>
            <jar>/path-to-scala-app.jar</jar>
            <spark-opts>--driver-memory 16g --master yarn --queue QueueName --executor-memory 12G --num-executors 12 --executor-cores 2</spark-opts>
            <file>/path-to-scala-app.jar</file>
            <file>/path-to/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar</file>
        </spark>
        <ok to="success-e-mail"/>
        <error to="failure-e-mail"/>
    </action>
    <action name="success-e-mail">
        <email xmlns="uri:oozie:email-action:0.2">
            <to>***</to>
            <subject>Daily Data Job Succeeded</subject>
            <body>The Oozie job completed successfully. See logs for details.</body>
        </email>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <action name="failure-e-mail">
        <email xmlns="uri:oozie:email-action:0.2">
            <to>***</to>
            <subject>Daily Data Job Failed</subject>
            <body>The Oozie job failed. See logs for details.</body>
        </email>
        <ok to="kill"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
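
A side note on the workflow above: the hive_auth and hive_jdbc credentials are declared but never attached to the Spark action, so on a Kerberized cluster the action runs without Hive delegation tokens. A minimal sketch of how the action could reference them (the cred attribute is standard Oozie workflow syntax; untested against this particular failure):

<action name="import-sqooped-data" cred="hive_auth,hive_jdbc">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <!-- body unchanged from the workflow above -->
    </spark>
    <ok to="success-e-mail"/>
    <error to="failure-e-mail"/>
</action>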


The same cluster runs other actions and workflows without any problems. As soon as a Spark action is part of a workflow, this error makes the launcher application fail almost immediately.
Any help would be greatly appreciated.
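
For context on the stack trace: an IllegalAccessError between org.apache.hadoop.security.token.Token$PrivateToken and org.apache.hadoop.hdfs.HAUtil generally means the launcher is loading hadoop-common and hadoop-hdfs from two different Hadoop builds. A standard first step is pinning the action to the Spark sharelib alone; a sketch of the relevant job.properties entries (both are stock Oozie settings, untested here):

# job.properties (sketch)
oozie.use.system.libpath=true
# load only the spark sharelib for spark actions instead of every sharelib
oozie.action.sharelib.for.spark=spark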

Following a suggestion from another post, I added hadoop-client-3.1.1.3.1.0.0-78.jar to the Spark sharelib folder.

That did not work and produced the error above. After removing the file, I ran into a different error.

All other kinds of Oozie actions work; it is only Spark with the Hive Warehouse Connector that causes trouble. Below is the new error. It occurs when hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar is included either:

  • in the Oozie sharelib folder, or
  • via a <file> tag in the Spark action
If I do not include hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar, I obviously get a ClassNotFoundException, but the Scala application does get executed.
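
An alternative that keeps the connector out of the sharelib entirely is to ship it per action via --jars inside spark-opts, together with the settings the HDP 3.1 Hive Warehouse Connector documentation expects. A rough sketch (the property names are the documented HWC ones; the masked URLs are taken from the credentials above):

<spark-opts>--jars hdfs:///path-to/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar --conf spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://***:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 --conf spark.datasource.hive.warehouse.metastoreUri=thrift://***:9083 --conf spark.security.credentials.hiveserver2.enabled=true --driver-memory 16g --executor-memory 12G --num-executors 12 --executor-cores 2 --queue QueueName</spark-opts>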

ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.hadoop.io.retry.RetryUtil
s.getDefaultRetryPolicy(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;ZLjava/lang/String;Ljava/lang/String;Ljava/lang/Class;)Lorg/apache/hadoop/io/ret
ry/RetryPolicy;
        at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:318)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
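
About this NoSuchMethodError: YarnChild resolved org.apache.hadoop.hdfs.NameNodeProxies from one Hadoop build but org.apache.hadoop.io.retry.RetryUtils from another whose getDefaultRetryPolicy has a different signature, which points at the HWC assembly (or a leftover sharelib jar) bundling an incompatible Hadoop client. If the extra jar has to stay, one knob to experiment with is the isolated job classloader on the launcher (a sketch; the oozie.launcher. prefix forwards a property to the launcher job, and mapreduce.job.classloader is a stock MapReduce setting):

<configuration>
    <property>
        <!-- assumption: isolate user jars from the system classpath in the launcher -->
        <name>oozie.launcher.mapreduce.job.classloader</name>
        <value>true</value>
    </property>
</configuration>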
