How can I track the progress of a long-running job in PySpark on Amazon EMR?
Tags: pyspark, amazon-emr

My code consists of a single long-running job:
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("PythonPi")\
    .getOrCreate()
sc = spark.sparkContext

from time import sleep
import os

def f(_):
    sleep(1.0)
    print("executor running")  # <= I can find it in the log, but only after the job ended
    with open(os.path.expanduser("~/output.txt"), "w") as f:  # <= can not find this file on master node
        f.write("executor running")
    return 1

from operator import add
output = sc.parallelize(range(1, 1000), 400).map(f).reduce(add)
print(output)
spark.stop()
On the Spark History Server I found a metric, Tasks (for all stages): Succeeded/Total.
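The same Succeeded/Total task counts shown in the Spark UI and History Server are also exposed through Spark's monitoring REST API, so a script can poll them instead of watching the web page. A minimal sketch, with assumptions not taken from the question: the driver UI is reachable on port 4040 while the job runs (the History Server serves the same endpoints on port 18080 afterwards), and `app_id` is a placeholder for the real application ID.

```python
# Minimal sketch: read stage progress over Spark's monitoring REST API.
import json
from urllib.request import urlopen

def stage_progress_url(host, port, app_id):
    # Stages endpoint of the Spark monitoring REST API.
    return "http://%s:%d/api/v1/applications/%s/stages" % (host, port, app_id)

def print_stage_progress(host="localhost", port=4040, app_id="app-0000"):
    # host, port, and app_id are illustrative defaults, not from the question.
    with urlopen(stage_progress_url(host, port, app_id)) as resp:
        stages = json.load(resp)
    for stage in stages:
        print("stage %d [%s]: %d/%d tasks complete" % (
            stage["stageId"], stage["status"],
            stage["numCompleteTasks"], stage["numTasks"]))
```

On EMR the driver UI port may require an SSH tunnel or the EMR proxy to reach from outside the cluster.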