Main class for a PySpark job in Oozie


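I created a PySpark job that works perfectly when submitted with spark-submit. When I try to run it through Oozie, however, it fails. I suspect the values I entered in the fields are wrong. The Spark action in Oozie requires these fields: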

Spark Master : local
Mode : client 
Main class : Do I need to enter anything here, since it is Python + Spark code (PySpark)?
Jars/py files : My py module

The log stdout is shown below:

  =================================================================

  >>> Invoking Main class now >>>

  Fetching child yarn jobs
  tag id : oozie-653992fdf1609a2d4e19a863dff21a1
  Child yarn jobs are found -
  Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

  Oozie Spark action configuration
  =================================================================

  --master
  local[*]
  --deploy-mode
  client
  --name
  POC1L
  --verbose
  /user/sachinkerala6174/pgm/poc1l.py

  =================================================================

  >>> Invoking Spark class now >>>

  python: can't open file '/user/sachinkerala6174/pgm/poc1l.py': [Errno 2] No such file or directory
  Intercepting System.exit(2)

  <<< Invocation of Main class completed <<<

  Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]

  Oozie Launcher failed, finishing Hadoop job gracefully

  Oozie Launcher, uploading action data to HDFS sequence file: hdfs://ip-172-31-53-48.ec2.internal:8020/user/sachinkerala6174/oozie-oozi/0000509-170711051319609-oozie-oozi-W/spark-fea0--spark/action-data.seq

  Oozie Launcher ends
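To reproduce the failure outside Oozie, the arguments printed in the "Oozie Spark action configuration" block can be turned back into an equivalent spark-submit command line. The Python sketch below does that and also checks whether the script path carries a filesystem scheme; the helper names `equivalent_spark_submit` and `has_scheme` are hypothetical, and it assumes Python 3.8+ for `shlex.join`. A path like `/user/...` with no `hdfs://` prefix is the kind of path that the `python` process in the log then tries to open on the local disk.

```python
import shlex
from urllib.parse import urlparse

# Arguments copied verbatim from the "Oozie Spark action configuration" block of the log.
oozie_args = [
    "--master", "local[*]",
    "--deploy-mode", "client",
    "--name", "POC1L",
    "--verbose",
    "/user/sachinkerala6174/pgm/poc1l.py",
]


def equivalent_spark_submit(args):
    """Hypothetical helper: render SparkSubmit arguments as a shell-quoted spark-submit command."""
    return shlex.join(["spark-submit", *args])


def has_scheme(path):
    """Hypothetical helper: True if the path has an explicit scheme such as hdfs://."""
    return urlparse(path).scheme != ""


print(equivalent_spark_submit(oozie_args))
script = oozie_args[-1]
print(script, "has scheme:", has_scheme(script))
```

Running the printed command on the Oozie launcher's host should show whether the script path resolves there.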

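You don't need to enter anything in the "Main class" field. Just add the hdfs:// prefix to the Python file path, change Master to yarn, and change Mode to cluster (AFAIR, cluster mode is required if the source is on HDFS).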
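If you edit workflow.xml directly instead of using the form, the same changes correspond to the `<master>`, `<mode>`, `<class>` and `<jar>` elements of Oozie's spark action. The Python sketch below generates such an element with the standard `xml.etree.ElementTree` module. It is a sketch under assumptions: the schema namespace `uri:oozie:spark-action:0.1` should be matched to what your Oozie server supports, `${jobTracker}` and `${nameNode}` are placeholder EL variables expected in job.properties, the function name `pyspark_action` is hypothetical, and the HDFS host is the name node that appears in the log. No `<class>` element is emitted, because the main class is optional and not needed for a Python script.

```python
import xml.etree.ElementTree as ET

# Assumption: spark-action schema 0.1; use the version your Oozie server supports.
SPARK_NS = "uri:oozie:spark-action:0.1"


def pyspark_action(name, script_uri, master="yarn", mode="cluster"):
    """Hypothetical builder for an Oozie <spark> action running a PySpark script.

    Elements follow the schema order job-tracker, name-node, master, mode, name, jar.
    <class> is deliberately omitted: it is optional and unused for Python.
    """
    action = ET.Element("spark", xmlns=SPARK_NS)
    # Placeholder EL variables, resolved by Oozie from job.properties.
    ET.SubElement(action, "job-tracker").text = "${jobTracker}"
    ET.SubElement(action, "name-node").text = "${nameNode}"
    ET.SubElement(action, "master").text = master
    ET.SubElement(action, "mode").text = mode
    ET.SubElement(action, "name").text = name
    # Fully qualified HDFS URI instead of a bare /user/... path.
    ET.SubElement(action, "jar").text = script_uri
    return action


xml_text = ET.tostring(
    pyspark_action(
        "POC1L",
        "hdfs://ip-172-31-53-48.ec2.internal:8020/user/sachinkerala6174/pgm/poc1l.py",
    ),
    encoding="unicode",
)
print(xml_text)
```

The generated element can be pasted inside an `<action>` node of the workflow; using `${nameNode}/user/...` for the `<jar>` value is an equivalent way to qualify the path.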