Apache Spark: How to capture the job status in a shell script that calls spark-submit

apache-spark airflow


I am using a bash shell with spark-sql 2.4.1. I submit my Spark job from a shell script using spark-submit.

I need to capture the status of my job. How can this be achieved?

Any help or suggestions would be appreciated.

See the code below:

process_start_datetime=$(date +%Y%m%d%H%M%S)
log_path="<log_dir>"
log_file="${log_path}/${app_name}_${process_start_datetime}.log"

spark-submit \
    --verbose \
    --deploy-mode cluster \
    --executor-cores "$executor_cores" \
    --num-executors "$num_executors" \
    --driver-memory "$driver_memory" \
    --executor-memory "$executor_memory"  \
    --master yarn \
    --class main.App "$appJar" 2>&1 | tee -a "$log_file"

status=$(grep "final status:" < "$log_file" | cut -d ":" -f2 | tail -1 | awk '$1=$1')
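A minimal sketch of how the parsing step above could be wrapped into reusable functions. It assumes the log contains YARN client report lines of the form "final status: SUCCEEDED", which is the format the grep above relies on. The names parse_final_status and status_to_exit_code are hypothetical helpers, not part of Spark.

```shell
# Print the last "final status:" value found in a spark-submit log file.
# Prints nothing if the log has no such line.
parse_final_status() {
    grep "final status:" "$1" | tail -n 1 | cut -d ":" -f2 | awk '{$1=$1; print}'
}

# Map a YARN final status to a shell return code (0 means success).
status_to_exit_code() {
    case "$1" in
        SUCCEEDED)     return 0 ;;
        FAILED|KILLED) return 1 ;;
        *)             return 2 ;;  # UNDEFINED, or no status line found
    esac
}

# Usage, after spark-submit has finished writing to "$log_file":
#   status=$(parse_final_status "$log_file")
#   status_to_exit_code "$status" || echo "job did not succeed: $status"
```

Returning a code from the script this way lets a caller such as an Airflow task treat a failed Spark job as a failed step.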


To get the application ID:

applicationId=$(grep "tracking URL" < "$log_file" | head -n 1 | cut -d "/" -f5)
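Cutting on "/" depends on the exact shape of the tracking URL. A sketch of an alternative that matches the ID pattern directly, assuming YARN application IDs have the form application_<digits>_<digits>, as in the example ID used elsewhere in this thread. The function name parse_application_id is hypothetical.

```shell
# Print the first YARN application ID that appears anywhere in the log file.
parse_application_id() {
    grep -oE 'application_[0-9]+_[0-9]+' "$1" | head -n 1
}

# Usage:
#   applicationId=$(parse_application_id "$log_file")
```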


spark-submit runs the job asynchronously, so after submitting the command you can get the application ID by calling SparkContext.applicationId. You can then check the status.


If Spark is deployed on YARN, you can use:

# To get the application ID, use: yarn application -list
yarn application -status application_1459542433815_0002
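A sketch of how the report printed by yarn application -status might be reduced to a single value inside a script. It assumes the report contains a line such as "Final-State : SUCCEEDED"; check the output of your Hadoop version before relying on this. The function name yarn_final_state is hypothetical.

```shell
# Read a "yarn application -status" report on stdin and print the value of
# its Final-State line with surrounding whitespace removed.
yarn_final_state() {
    awk -F':' '/Final-State/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)  # trim spaces and tabs around the value
        print $2
    }'
}

# Usage:
#   final_state=$(yarn application -status "$applicationId" 2>/dev/null | yarn_final_state)
```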

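They mention another approach in this post.

Sir, what does 2>&1 | tee -a "$log_file" do? And what is the 2 here?

Yes, the log directory has to be set before execution. As for the question above: cut splits each line into fields, -d sets the delimiter, and -f selects the field by number; awk '$1=$1' trims the extra whitespace, for example: echo " Running " | awk '$1=$1'

How do I get this application ID in the shell script that calls spark-submit?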
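On the 2>&1 | tee -a "$log_file" question: 2 is the file descriptor for standard error, 2>&1 redirects it to wherever standard output currently goes, and tee -a appends everything it reads to the file while still printing it. A small sketch with a stand-in function (demo_job is hypothetical and takes the place of spark-submit):

```shell
# Stand-in for spark-submit: writes one line to stdout and one to stderr.
demo_job() {
    echo "message on stdout"
    echo "message on stderr" >&2
}

log_file=$(mktemp)

# Without 2>&1 only the stdout line would enter the pipe and reach the log.
demo_job 2>&1 | tee -a "$log_file"
```

Note that the exit status of such a pipeline is that of tee, not of the first command. In bash, ${PIPESTATUS[0]} read immediately after the pipeline holds the exit status of the first command.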