Amazon Web Services: How can I change the maximum number of concurrent runs for Glue jobs started in parallel from an AWS Step Functions Map state?
I have a Step Function with a Map state that runs 5 parallel Glue jobs with custom parameters, as shown below:
"Run Glue Jobs": {
  "Type": "Map",
  "MaxConcurrency": 5,
  "ItemsPath": "$.payload",
  "Iterator": {
    "StartAt": "Run Generic Glue Job",
    "States": {
      "Run Generic Glue Job": {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {
          "JobName": "GlueJobName",
          "Arguments": {
            "--target_bucket": "target-bucket",
            "--target_path": "dir1/dir2/",
            "--job-language": "python",
            "--job-bookmark-option": "job-bookmark-disable",
            "--TempDir": "s3://temp-bucket/tmp-glue",
            "--continuous-log-logGroup": "gluecloudwatch",
            "--enable-continuous-cloudwatch-log": "true",
            "--enable-continuous-log-filter": "true",
            "--enable-metrics": "",
            "--kmskeyid": "arn:aws:kms:region:12345678901:alias/glue-kms-key"
          }
        },
        "End": true
      }
    }
  },
  "Next": "Finish"
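When I run this stage of the Step Function, it catches the error Glue.ConcurrentRunsExceededException.
Is there a way to pass a parameter such as MaxConcurrentRuns: 5 (ExecutionProperty) so that I can run up to 5 jobs at the same time? I could not find anywhere to do this; the only resource I found was unrelated.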
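I can only edit the job manually in the GUI, but I need to create everything from scratch in Terraform, so I also need to set MaxConcurrentRuns to 5 from source code. Any suggestions? Thanks.

You cannot set the Glue job's maximum concurrent runs through Step Functions. If you run the Step Functions Map state with MaxConcurrency set to 5, you also need to create or update the Glue job so that its maximum concurrent runs is at least 5.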
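The relationship described above can be checked before deployment with a few lines of Python. This is only a sketch: the function name `check_concurrency` is hypothetical, and it assumes the Map state and the Glue job definition are available as parsed JSON dictionaries. It also looks at a single execution only; runs of the same job started elsewhere count toward the Glue limit too.

```python
def check_concurrency(map_state: dict, glue_job: dict) -> bool:
    """Return True if the Glue job allows at least as many concurrent
    runs as the Map state may start at once (hypothetical helper)."""
    # In Step Functions, a MaxConcurrency of 0 (the default) means no limit.
    map_limit = map_state.get("MaxConcurrency", 0)
    # Glue allows 1 concurrent run when ExecutionProperty is not set.
    job_limit = glue_job.get("ExecutionProperty", {}).get("MaxConcurrentRuns", 1)
    if map_limit == 0:
        # An unbounded Map can always exceed a finite Glue limit.
        return False
    return job_limit >= map_limit


map_state = {"Type": "Map", "MaxConcurrency": 5}
default_job = {"Name": "GlueJobName"}
tuned_job = {"Name": "GlueJobName", "ExecutionProperty": {"MaxConcurrentRuns": 5}}

print(check_concurrency(map_state, default_job))  # job still at the default of 1
print(check_concurrency(map_state, tuned_job))    # job raised to 5
```

With the default job definition the check fails, which matches the Glue.ConcurrentRunsExceededException seen in the question.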
When creating a Glue job from the AWS CLI, you can pass the value as ExecutionProperty.MaxConcurrentRuns.
Here is a sample JSON file:
{
  "Name": "my-glue-job",
  "Role": "arn:aws:iam::111122223333:role/glue_etl_service_role",
  "ExecutionProperty": {
    "MaxConcurrentRuns": 5
  },
  "Command": {
    "Name": "glueetl",
    "ScriptLocation": "s3://temp-sandbox/code/scripts/MyGlueScript.scala",
    "PythonVersion": "3"
  },
  "DefaultArguments": {
    "--TempDir": "s3://aws-glue-temporary-111122223333-us-east-1/admin",
    "--class": "com.mycompany.corp.MainClass",
    "--enable-continuous-cloudwatch-log": "true",
    "--enable-metrics": "",
    "--enable-spark-ui": "true",
    "--extra-jars": "s3://temp-sandbox/code/jars/MyExtraJar.jar",
    "--job-bookmark-option": "job-bookmark-disable",
    "--job-language": "scala",
    "--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/"
  },
  "MaxRetries": 0,
  "Timeout": 2880,
  "WorkerType": "G.1X",
  "NumberOfWorkers": 10,
  "GlueVersion": "2.0"
}
Then create the job with the CLI:
aws glue create-job --cli-input-json file://myFolder/glue_job_props.json
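If an existing job properties file lacks the ExecutionProperty block, it could be patched before being handed to the CLI. The following is a minimal sketch using only the Python standard library; the helper name `set_max_concurrent_runs` is hypothetical, and it assumes the properties follow the JSON layout shown above.

```python
import copy
import json


def set_max_concurrent_runs(job_props: dict, limit: int) -> dict:
    """Return a copy of the job properties with
    ExecutionProperty.MaxConcurrentRuns set to `limit` (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    updated = copy.deepcopy(job_props)  # leave the caller's dict untouched
    updated.setdefault("ExecutionProperty", {})["MaxConcurrentRuns"] = limit
    return updated


if __name__ == "__main__":
    # Trimmed-down properties for illustration only.
    props = {"Name": "my-glue-job", "Command": {"Name": "glueetl"}}
    print(json.dumps(set_max_concurrent_runs(props, 5), indent=2))
```

The output can be written back to the properties file and passed to `aws glue create-job --cli-input-json` as in the command above.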
Apparently, Terraform lets you specify the maximum concurrent runs in the "aws_glue_job" resource, like this:
execution_property {
  max_concurrent_runs = 5
}
That may well solve it. I will give it a try.