Is it possible to achieve step concurrency on AWS EMR through AWS Step Functions, without Lambda?
Here is my scenario: I am trying to create 4 AWS EMR clusters, where each cluster is assigned 2 jobs, so it looks like 4 clusters running 8 jobs orchestrated by Step Functions. My flow should be like this: the 4 clusters start at the same time and run the 8 jobs in parallel, with each cluster running 2 jobs in parallel.

Now, AWS recently launched a feature to run 2 (or more) jobs concurrently on a single cluster using StepConcurrencyLevel in EMR, to reduce the cluster's runtime. This can be done through the EMR console, the AWS CLI, or even through an AWS Lambda function.

But I want to use AWS Step Functions and its state machine language to launch two (or more) jobs in parallel on a single cluster, in the format mentioned here. I have tried referring to many sites, where I found solutions for doing it through the console or through boto3 in AWS Lambda, but I could not find a solution for doing it through Step Functions itself. Is there any solution for this?
Thanks in advance.

So, I browsed a few more sites and found a solution to my problem. The issue I faced was StepConcurrencyLevel: I could set it using the AWS console, the AWS CLI, or even boto3 in Python, but I was looking for a solution in state machine language, and I found one.

All we have to do is specify StepConcurrencyLevel (e.g. 2 or 3, where the default is 1) while creating the cluster in the state machine language. Once that is set, create 4 steps under that cluster and run the state machine. The cluster recognizes the concurrency that has been set and runs the steps accordingly.

My sample flow:
-> the JSON script of my orchestration
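For comparison with the console/CLI/boto3 routes mentioned above, here is a minimal sketch of the request body you would hand to boto3's EMR `run_job_flow` call to get the same concurrency at cluster creation time. All names, roles, and S3 paths are placeholders, and no AWS call is made here:

```python
# Sketch of a run_job_flow request with StepConcurrencyLevel set.
# Names, roles, and instance types are illustrative placeholders.
run_job_flow_request = {
    "Name": "WorkflowCluster",
    "ReleaseLabel": "emr-5.28.1",
    "StepConcurrencyLevel": 2,  # allow 2 steps to run at the same time
    "Applications": [{"Name": "Spark"}],
    "ServiceRole": "EMR_DefaultRole",
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "Instances": {
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceFleets": [
            {
                "InstanceFleetType": "MASTER",
                "TargetSpotCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m4.xlarge"}],
            },
        ],
    },
}

# With boto3 installed and credentials configured, this would be submitted as:
#   import boto3
#   emr = boto3.client("emr")
#   response = emr.run_job_flow(**run_job_flow_request)
```

The point of the question, though, is to avoid exactly this kind of out-of-band call and keep everything inside the state machine definition.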
{
  "StartAt": "Create_A_Cluster",
  "States": {
    "Create_A_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "WorkflowCluster",
        "StepConcurrencyLevel": 2,
        "Tags": [
          {
            "Key": "Description",
            "Value": "process"
          },
          {
            "Key": "Name",
            "Value": "filename"
          },
          {
            "Key": "Owner",
            "Value": "owner"
          },
          {
            "Key": "Project",
            "Value": "project"
          },
          {
            "Key": "User",
            "Value": "user"
          }
        ],
        "VisibleToAllUsers": true,
        "ReleaseLabel": "emr-5.28.1",
        "Applications": [
          {
            "Name": "Spark"
          }
        ],
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "LogUri": "s3://prefix/prefix/log.txt/",
        "Instances": {
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "InstanceFleetType": "MASTER",
              "TargetSpotCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m4.xlarge",
                  "BidPriceAsPercentageOfOnDemandPrice": 90
                }
              ]
            },
            {
              "InstanceFleetType": "CORE",
              "TargetSpotCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m4.xlarge",
                  "BidPriceAsPercentageOfOnDemandPrice": 90
                }
              ]
            }
          ]
        }
      },
      "Retry": [
        {
          "ErrorEquals": ["States.ALL"],
          "IntervalSeconds": 5,
          "MaxAttempts": 1,
          "BackoffRate": 2.5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "Fail_Cluster"
        }
      ],
      "ResultPath": "$.cluster",
      "OutputPath": "$.cluster",
      "Next": "Add_Steps_Parallel"
    },
    "Fail_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-west-2:919490798061:rsac_error_notification",
        "Message.$": "$.Cause"
      },
      "Next": "Terminate_Cluster"
    },
    "Add_Steps_Parallel": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step_One",
          "States": {
            "Step_One": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                  "Name": "The first step",
                  "ActionOnFailure": "TERMINATE_CLUSTER",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "spark-submit",
                      "--deploy-mode",
                      "cluster",
                      "--master",
                      "yarn",
                      "--conf",
                      "spark.dynamicAllocation.enabled=true",
                      "--conf",
                      "maximizeResourceAllocation=true",
                      "--conf",
                      "spark.shuffle.service.enabled=true",
                      "--py-files",
                      "s3://prefix/prefix/pythonfile.py",
                      "s3://prefix/prefix/pythonfile.py"
                    ]
                  }
                }
              },
              "Retry": [
                {
                  "ErrorEquals": ["States.ALL"],
                  "IntervalSeconds": 5,
                  "MaxAttempts": 1,
                  "BackoffRate": 2.5
                }
              ],
              "Catch": [
                {
                  "ErrorEquals": ["States.ALL"],
                  "ResultPath": "$.err_mgs",
                  "Next": "Fail_SNS"
                }
              ],
              "ResultPath": "$.step1",
              "Next": "Terminate_Cluster_1"
            },
            "Fail_SNS": {
              "Type": "Task",
              "Resource": "arn:aws:states:::sns:publish",
              "Parameters": {
                "TopicArn": "arn:aws:sns:us-west-2:919490798061:rsac_error_notification",
                "Message.$": "$.err_mgs.Cause"
              },
              "ResultPath": "$.fail_cluster",
              "Next": "Terminate_Cluster_1"
            },
            "Terminate_Cluster_1": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
              "Parameters": {
                "ClusterId.$": "$.ClusterId"
              },
              "End": true
            }
          }
        },
        {
          "StartAt": "Step_Two",
          "States": {
            "Step_Two": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                  "Name": "The second step",
                  "ActionOnFailure": "TERMINATE_CLUSTER",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "spark-submit",
                      "--deploy-mode",
                      "cluster",
                      "--master",
                      "yarn",
                      "--conf",
                      "spark.dynamicAllocation.enabled=true",
                      "--conf",
                      "maximizeResourceAllocation=true",
                      "--conf",
                      "spark.shuffle.service.enabled=true",
                      "--py-files",
                      "s3://prefix/prefix/pythonfile.py",
                      "s3://prefix/prefix/pythonfile.py"
                    ]
                  }
                }
              },
              "Retry": [
                {
                  "ErrorEquals": ["States.ALL"],
                  "IntervalSeconds": 5,
                  "MaxAttempts": 1,
                  "BackoffRate": 2.5
                }
              ],
              "Catch": [
                {
                  "ErrorEquals": ["States.ALL"],
                  "ResultPath": "$.err_mgs_1",
                  "Next": "Fail_SNS_1"
                }
              ],
              "ResultPath": "$.step2",
              "Next": "Terminate_Cluster_2"
            },
            "Fail_SNS_1": {
              "Type": "Task",
              "Resource": "arn:aws:states:::sns:publish",
              "Parameters": {
                "TopicArn": "arn:aws:sns:us-west-2:919490798061:rsac_error_notification",
                "Message.$": "$.err_mgs_1.Cause"
              },
              "ResultPath": "$.fail_cluster_1",
              "Next": "Terminate_Cluster_2"
            },
            "Terminate_Cluster_2": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
              "Parameters": {
                "ClusterId.$": "$.ClusterId"
              },
              "End": true
            }
          }
        }
      ],
      "ResultPath": "$.steps",
      "Next": "Terminate_Cluster"
    },
    "Terminate_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
      "Parameters": {
        "ClusterId.$": "$.ClusterId"
      },
      "End": true
    }
  }
}
In this script, i.e. the state machine language of AWS Step Functions, while creating the cluster I specified StepConcurrencyLevel as 2 and added 2 Spark jobs as steps under the cluster.

When I run this script in Step Functions, I am able to orchestrate the cluster and the steps so that 2 steps run concurrently in the cluster, without configuring it directly in the EMR console, through the AWS CLI, or even through boto3.

I run 2 steps concurrently in a single cluster under AWS Step Functions using only the state machine language, without the help of other services such as Lambda, the Livy API, or boto3.
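Since the whole mechanism is plain Amazon States Language JSON, the definition can be sanity-checked offline before deployment. A small sketch (the embedded definition is a trimmed-down, hypothetical excerpt of the one above, just enough to show where StepConcurrencyLevel lives):

```python
import json

# Trimmed-down version of the state machine above, enough to check that
# StepConcurrencyLevel sits inside the createCluster.sync Parameters block.
definition = json.loads("""
{
  "StartAt": "Create_A_Cluster",
  "States": {
    "Create_A_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {"Name": "WorkflowCluster", "StepConcurrencyLevel": 2},
      "Next": "Add_Steps_Parallel"
    },
    "Add_Steps_Parallel": {"Type": "Parallel", "Branches": [], "End": true}
  }
}
""")

create = definition["States"]["Create_A_Cluster"]
concurrency = create["Parameters"]["StepConcurrencyLevel"]
# The Parallel state fans the addStep tasks out; the cluster itself
# enforces how many of them actually run at once.
```

Note that the Parallel state only controls how Step Functions submits the steps; it is the cluster's StepConcurrencyLevel that decides how many EMR steps run simultaneously.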
The flow diagram looks like this: (diagram image not reproduced here)
To be more precise about where I inserted StepConcurrencyLevel in the above state machine language, see:
"Create_A_Cluster": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
  "Parameters": {
    "Name": "WorkflowCluster",
    "StepConcurrencyLevel": 2,
    "Tags": [
      {
        "Key": "Description",
        "Value": "process"
      },
It is set while creating the cluster, inside the Create_A_Cluster task.
Thanks.

Comments:

- You could also do the spark-submit through the Livy API; is there a strong reason not to use it?
- Yes, I have a requirement to perform the whole process using only an AWS EMR cluster through the state machine language of Step Functions.
- That doesn't make much sense. Also, try a single EMR cluster with autoscaling instead of 4 clusters.
- I have a use case that needs to run concurrent steps with the help of Step Functions, so I would like to know a few things: 1. What is the impact on the cluster if a step's execution is delayed in a Step Functions state? 2. Have you run into any major blockers? 3. What are the pros and cons of doing this with Step Functions?
- Actually, we do have some drawbacks: with our cluster configuration, concurrent step execution does not run both steps at full speed. It runs the first step at full speed with the given configuration, while the second step runs but does not get the cluster's full capacity. As for communication between Step Functions and the EMR cluster, it works well, no issues.
- @NaveenB does "ActionOnFailure": "TERMINATE_CLUSTER" work for you? When I try this option I get a ValidationException.
- @NaveenB you saved my weekend, thank you.
- @NaveenB what happens if step 2 fails while step 1 is still running? Will Terminate_Cluster_2 wait until step 1 completes, or terminate the cluster immediately?
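As a side note to the comments above: the concurrency level of a cluster that is already running can also be changed afterwards through the EMR ModifyCluster API. A minimal sketch of the arguments for boto3's `modify_cluster` call (the cluster id is a placeholder, and the call itself is left commented out so nothing hits AWS):

```python
# Arguments for the EMR ModifyCluster API, which can raise or lower
# StepConcurrencyLevel on a cluster that is already running.
modify_cluster_args = {
    "ClusterId": "j-XXXXXXXXXXXXX",  # placeholder cluster id
    "StepConcurrencyLevel": 2,
}

# With boto3 installed and credentials configured:
#   import boto3
#   emr = boto3.client("emr")
#   emr.modify_cluster(**modify_cluster_args)
```

This is the out-of-band route the question set out to avoid, but it is handy when you need to adjust concurrency without recreating the cluster.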