Python: How to trigger a Glue job using the AWS GlueOperator
My script has only one task, which triggers a Glue job. I am able to create the DAG. Below is my DAG code:
from airflow import DAG
from airflow.operators.email_operator import EmailOperator
from airflow.providers.amazon.aws.operators.glue import AwsGlueJobOperator
from datetime import datetime, timedelta

### glue job specific variables
glue_job_name = "my_glue_job"
glue_iam_role = "AWSGlueServiceRole"
region_name = "us-west-2"
email_recipient = "me@gmail.com"

default_args = {
    'owner': 'me',
    'start_date': datetime(2020, 1, 1),
    'retry_delay': timedelta(minutes=5),
    'email': email_recipient,
    'email_on_failure': True
}

with DAG(dag_id = 'glue_af_pipeline', default_args = default_args, schedule_interval = None) as dag:
    glue_job_step = AwsGlueJobOperator(
        job_name = glue_job_name,
        script_location = 's3://my-s3-location',
        region_name = region_name,
        iam_role_name = glue_iam_role,
        script_args = None,
        num_of_dpus = 10,
        task_id = 'glue_job_step',
        dag = dag
    )
    glue_job_step
When I run the DAG, it fails with the following error:
[2020-10-13 08:27:14,315] {logging_mixin.py:112} INFO - [2020-10-13 08:27:14,315] {glue.py:114} ERROR - Failed to run aws glue job, error: Parameter validation failed:
Invalid type for parameter Arguments, value: [], type: <class 'list'>, valid types: <class 'dict'>
[2020-10-13 08:27:14,315] {taskinstance.py:1058} ERROR - Parameter validation failed:
Invalid type for parameter Arguments, value: [], type: <class 'list'>, valid types: <class 'dict'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 930, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.8/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 115, in execute
    glue_job_run = glue_job.initialize_job(self.script_args)
  File "/usr/local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 111, in initialize_job
    job_run = glue_client.start_job_run(JobName=job_name, Arguments=script_arguments)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 337, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 628, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 676, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/usr/local/lib/python3.8/site-packages/botocore/validate.py", line 297, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter Arguments, value: [], type: <class 'list'>, valid types: <class 'dict'>
[2020-10-13 08:27:14,316] {taskinstance.py:1089} INFO - Marking task as FAILED.
Any suggestions would be much appreciated.

If you are running an existing Glue job, try the following:
glue_job_step = AwsGlueJobOperator(
    task_id = "glue_job_step",
    job_name = glue_job_name,
    job_desc = f"triggering glue job {glue_job_name}",
    region_name = region_name,
    iam_role_name = glue_iam_role,
    num_of_dpus = 1,
    dag = dag
)
If the job takes no input arguments, remove script_args.
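The traceback indicates that the Arguments parameter of start_job_run must be a dict, so if the job does take input arguments, passing them as a dict is the likely way to go. Below is a minimal sketch, not a definitive implementation: build_glue_operator_kwargs is a hypothetical helper (it is not part of Airflow), and the "--input_path" key and its S3 value are made-up examples. It leaves script_args out entirely when there is nothing to pass and rejects non-dict values before they reach boto3.

```python
from typing import Any, Dict, Optional


def build_glue_operator_kwargs(
    job_name: str,
    region_name: str,
    iam_role_name: str,
    script_args: Optional[Dict[str, str]] = None,
    num_of_dpus: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for AwsGlueJobOperator.

    script_args is only included when it is a non-empty dict, so a job
    without input arguments never sends an Arguments value of the wrong type.
    """
    # Fail early with a clear message instead of a botocore validation error.
    if script_args is not None and not isinstance(script_args, dict):
        raise TypeError(
            f"script_args must be a dict, got {type(script_args).__name__}"
        )

    kwargs: Dict[str, Any] = {
        "task_id": "glue_job_step",
        "job_name": job_name,
        "job_desc": f"triggering glue job {job_name}",
        "region_name": region_name,
        "iam_role_name": iam_role_name,
        "num_of_dpus": num_of_dpus,
    }

    # Omit the key entirely when there are no arguments to pass.
    if script_args:
        kwargs["script_args"] = dict(script_args)

    return kwargs


# Usage inside the DAG (requires Airflow with the Amazon provider installed):
#
# glue_job_step = AwsGlueJobOperator(
#     dag=dag,
#     **build_glue_operator_kwargs(
#         job_name=glue_job_name,
#         region_name=region_name,
#         iam_role_name=glue_iam_role,
#         script_args={"--input_path": "s3://my-s3-location/input"},  # example only
#     ),
# )
```

Glue job argument names conventionally start with "--", which is why the example key is written that way.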