How can I track the progress of a long-running job in Amazon EMR using pyspark?


My code consists of one long-running job:

from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .appName("PythonPi")\
    .getOrCreate()
sc = spark.sparkContext

from time import sleep
import os
def f(_):
    sleep(1.0)
    print("executor running") # <= I can find it in the log, but only after the job ended
    with open(os.path.expanduser("~/output.txt"), "w") as out: # <= cannot find this file on the master node
      out.write("executor running")
    return 1

from operator import add
output = sc.parallelize(range(1, 1000), 400).map(f).reduce(add)
print(output)
spark.stop()
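One way to watch progress while the job is still running is to poll the driver-side status tracker from a background thread. This is a sketch, not a definitive solution: it uses pyspark's documented `SparkContext.statusTracker()` API, and the helper names, polling interval, and output format are my own choices.

```python
import time

def format_progress(completed, total):
    """Render a 'completed/total (pct%)' progress string."""
    pct = 100.0 * completed / total if total else 0.0
    return f"{completed}/{total} ({pct:.0f}%)"

def poll_progress(sc, stop_event, interval=5.0):
    """Print per-stage task counts until stop_event is set.

    Meant to run in a daemon thread on the driver while an action
    (e.g. the reduce above) blocks the main thread:

        import threading
        stop = threading.Event()
        threading.Thread(target=poll_progress, args=(sc, stop),
                         daemon=True).start()
        output = sc.parallelize(range(1, 1000), 400).map(f).reduce(add)
        stop.set()
    """
    tracker = sc.statusTracker()
    while not stop_event.is_set():
        for stage_id in tracker.getActiveStageIds():
            info = tracker.getStageInfo(stage_id)
            if info:  # stage may have finished between the two calls
                print(f"stage {stage_id}: "
                      f"{format_progress(info.numCompletedTasks, info.numTasks)}")
        time.sleep(interval)
```

Because `print` here runs on the driver, the output goes to the driver's stdout as the job progresses, rather than to executor logs that only become readable after the job ends.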
On the Spark History Server I did find a relevant metric:

Tasks (for all stages): Succeeded/Total
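Another rough progress indicator, which avoids the History Server entirely, is to bump a Spark accumulator inside the mapped function and read its value from the driver while the job runs. The sketch below is hypothetical: the wrapper and the `LocalCounter` stand-in are mine, and it assumes only the documented accumulator behavior that updates from finished tasks flow back to the driver (so the count is approximate while tasks are in flight).

```python
def make_counting_wrapper(work, counter):
    """Wrap a task function so each processed element bumps a shared counter.

    `counter` only needs an .add(n) method, so a pyspark Accumulator fits,
    as does the driver-local stand-in below (useful for testing off-cluster).
    """
    def wrapped(x):
        result = work(x)
        counter.add(1)  # update is shipped back to the driver per task
        return result
    return wrapped

class LocalCounter:
    """Stand-in with the same .add/.value shape as a pyspark Accumulator."""
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n

# On a real cluster (illustrative, not run here):
#   done = sc.accumulator(0)
#   rdd = sc.parallelize(range(1, 1000), 400)
#   output = rdd.map(make_counting_wrapper(f, done)).reduce(add)
#   # meanwhile, a second driver thread can print done.value periodically
```

The `~/output.txt` file in the original code is missing on the master node for the same underlying reason: `f` runs on the executors, so the file is written to the executors' local filesystems, not the driver's.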