Python: how do I execute a list of jobs in PySpark/Databricks?
Environment:
spark.dynamicAllocation.enabled true
spark.scheduler.mode FAIR
spark.databricks.delta.preview.enabled true
spark.shuffle.service.enabled true
spark.databricks.service.server.enabled true
- Databricks 6.4 (includes Apache Spark 2.4.5 and Scala 2.11)
- High Concurrency cluster with autoscaling and auto-termination
What I have tried:
- I get no errors, but I also get no output (no counts).
import datetime
from pyspark.sql.types import StructType, StructField, StringType

current_time = str(datetime.datetime.now())
print(current_time)
localPath = '/mnt'

def job1(inputText):
    path = localPath + '/Table{0}/'.format(inputText)
    print(path)
    # The schema field name must match the dict key used below.
    mySchema = StructType([StructField("textValue", StringType())])
    rddDF = sc.parallelize([
        {"textValue": current_time},
        {"textValue": current_time},
        {"textValue": current_time}])
    new_df1 = sqlContext.createDataFrame(rddDF, mySchema)
    new_df1 = new_df1.fillna(current_time + '1')
    new_df1.repartition(1).write.format("parquet").mode("overwrite").save(path)

job1('1')

def job2(inputText):
    path = localPath + '/Table{0}/'.format(inputText)
    print(path)
    mySchema = StructType([StructField("textValue", StringType())])
    rddDF = sc.parallelize([
        {"textValue": current_time},
        {"textValue": current_time},
        {"textValue": current_time}])
    new_df1 = sqlContext.createDataFrame(rddDF, mySchema)
    new_df1 = new_df1.fillna(current_time + '1')
    new_df1.repartition(1).write.format("parquet").mode("overwrite").save(path)

job2('2')

def job3(inputText):
    path = localPath + '/Table{0}/'.format(inputText)
    print(path)
    mySchema = StructType([StructField("textValue", StringType())])
    rddDF = sc.parallelize([
        {"textValue": current_time},
        {"textValue": current_time},
        {"textValue": current_time}])
    new_df1 = sqlContext.createDataFrame(rddDF, mySchema)
    new_df1 = new_df1.fillna(current_time + '1')
    new_df1.repartition(1).write.format("parquet").mode("overwrite").save(path)

job3('3')
listOfJobs = ['job1', 'job2', 'job3']
print(listOfJobs)
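Note that `listOfJobs` above holds the job *names* as strings, so printing it never invokes anything. One way to actually run them is to store references to the function objects themselves (no quotes) together with their arguments, then call each pair; on a cluster with `spark.scheduler.mode FAIR` the calls can also be submitted from threads so the resulting Spark actions run concurrently. A minimal sketch, using Python's `concurrent.futures` with stand-in jobs (the real `job1`/`job2`/`job3` from the notebook would be dropped in unchanged):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for job1/job2/job3; in the notebook the real functions go here.
def job1(inputText):
    return 'Table' + inputText

def job2(inputText):
    return 'Table' + inputText

def job3(inputText):
    return 'Table' + inputText

# Pair each function object (not its name as a string) with its argument.
listOfJobs = [(job1, '1'), (job2, '2'), (job3, '3')]

# Submit all jobs at once; with the FAIR scheduler enabled, the Spark
# actions triggered inside each job can be processed concurrently.
with ThreadPoolExecutor(max_workers=len(listOfJobs)) as pool:
    futures = [pool.submit(fn, arg) for fn, arg in listOfJobs]
    results = [f.result() for f in futures]

print(results)  # ['Table1', 'Table2', 'Table3']
```

Calling the functions sequentially in a plain `for fn, arg in listOfJobs: fn(arg)` loop works too; the thread pool only matters if you want the three writes to overlap.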