Creating dynamic tasks with Python (Airflow)
I am trying to build a dynamic workflow. I want to create tasks dynamically with a BashOperator (which calls a Python script). My DAG:
import datetime as dt
from airflow import DAG
import shutil
import os
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.contrib.sensors.file_sensor import FileSensor
from airflow.operators.dagrun_operator import TriggerDagRunOperator

scriptAirflow = '/home/alexw/scriptAirflow/'
uploadPath = '/apps/lv-manuf2020-data/80_DATA/00_Loading/'
receiptPath = '/apps/lv-manuf2020-data/80_DATA/01_Receipt/'
fmsFiles = []
memFiles = []

def onlyCsvFiles():
    if os.listdir(uploadPath):
        for files in os.listdir(uploadPath):
            if (files.startswith('MEM') and files.endswith('.csv')) or (files.startswith('FMS') and files.endswith('.csv')):
                shutil.move(uploadPath + files, receiptPath)
                print(files + ' moved in ' + receiptPath + files)
        for files in os.listdir(receiptPath):
            if (files.startswith('MEM') and files.endswith('.csv')) or (files.startswith('FMS') and files.endswith('.csv')):
                return "run_scripts"
            else:
                return "no_script"
    else:
        print('No file in upload_00')

default_args = {
    'owner': 'manuf2020',
    'start_date': dt.datetime(2020, 2, 17),
    'retries': 1,
}

dag = DAG('lv-manuf2020', default_args=default_args, description='airflow_manuf2020',
          schedule_interval=None, catchup=False)

file_sensor = FileSensor(
    task_id="file_sensor",
    filepath=uploadPath,
    fs_conn_id='airflow_db',
    poke_interval=10,
    dag=dag,
)

move_csv = BranchPythonOperator(
    task_id='move_csv',
    python_callable=onlyCsvFiles,
    trigger_rule='none_failed',
    dag=dag,
)

run_scripts = DummyOperator(
    task_id="run_scripts",
    dag=dag,
)

no_script = TriggerDagRunOperator(
    task_id='no_script',
    trigger_dag_id='lv-manuf2020',
    trigger_rule='all_done',
    dag=dag,
)

if os.listdir(receiptPath):
    for files in os.listdir(receiptPath):
        if files.startswith('FMS') and files.endswith('.csv'):
            fmsFiles.append(files)
        if files.startswith('MEM') and files.endswith('.csv'):
            memFiles.append(files)
else:
    pass

for files in fmsFiles:
    run_Fms_Script = BashOperator(
        task_id="fms_script_" + files,
        bash_command='python3 ' + scriptAirflow + 'fmsScript.py "{{ execution_date }}"',
        dag=dag,
    )
    rerun_dag = TriggerDagRunOperator(
        task_id='rerun_dag',
        trigger_dag_id='lv-manuf2020',
        trigger_rule='none_failed',
        dag=dag,
    )
    run_scripts.set_downstream(run_Fms_Script)
    rerun_dag.set_upstream(run_Fms_Script)

for files in memFiles:
    run_Mem_Script = BashOperator(
        task_id="mem_script_" + files,
        bash_command='python3 ' + scriptAirflow + 'memShScript.py "{{ execution_date }}"',
        dag=dag,
    )
    rerun_dag = TriggerDagRunOperator(
        task_id='rerun_dag',
        trigger_dag_id='lv-manuf2020',
        trigger_rule='none_failed',
        dag=dag,
    )
    run_scripts.set_downstream(run_Mem_Script)
    rerun_dag.set_upstream(run_Mem_Script)

move_csv.set_upstream(file_sensor)
run_scripts.set_upstream(move_csv)
no_script.set_upstream(move_csv)
It does not work the way I expected. In this loop, a Python script is called that should in turn launch a shell script. The tasks are created, but the DAG immediately re-runs without ever launching the scripts:
for files in memFiles:
    run_Mem_Script = BashOperator(
        task_id="mem_script_" + files,
        bash_command='python3 ' + scriptAirflow + 'memShScript.py "{{ execution_date }}"',
        dag=dag,
    )
    rerun_dag = TriggerDagRunOperator(
        task_id='rerun_dag',
        trigger_dag_id='lv-manuf2020',
        trigger_rule='none_failed',
        dag=dag,
    )
    run_scripts.set_downstream(run_Mem_Script)
    rerun_dag.set_upstream(run_Mem_Script)
Can someone tell me how to create dynamic tasks in parallel, using a BashOperator if that is necessary (since that is how I call my Python scripts)?

I need something like this:

file_sensor >> move_csv >> run_scripts >> dynamic tasks >> rerun_dag

All of the code runs only once, when the DAG file is parsed; only the onlyCsvFiles function runs periodically, as part of a task. Airflow imports your Python file, which runs the interpreter and creates a .pyc file next to the DAG's original .py file. Since the code has not changed, Airflow will not run the DAG's code again and will keep using that same .pyc file on subsequent imports. (The .pyc file is created by the Python interpreter when a .py file is imported.)
To add or change the DAG's tasks, there has to be a process that periodically re-runs the interpreter and updates the .pyc file.

There are several ways to do this, and the best one is to let Airflow itself handle it. I am not suggesting a different way of creating dynamic tasks; keeping your approach, you need to create another task that triggers a re-interpretation of the Python file, "refreshing" the .pyc file with whatever new tasks exist at runtime inside this loop:
for files in memFiles:
    run_Mem_Script = BashOperator(
        task_id="mem_script_" + files,
        bash_command='python3 ' + scriptAirflow + 'memShScript.py "{{ execution_date }}"',
        dag=dag,
    )
    rerun_dag = TriggerDagRunOperator(
        task_id='rerun_dag',
        trigger_dag_id='lv-manuf2020',
        trigger_rule='none_failed',
        dag=dag,
    )
The python command triggers the interpretation and updates the .pyc file. Create it as a standalone task in the DAG (edit the bash command with the absolute path of your DAG). I do not recommend looking for a Python function that returns the current file path, because after the code has been imported you may get Airflow's running path instead, although it might still work. Your new code is shown at the end of this answer (I only added the interpret_python task; remember to replace /path/to/this/file.py with the absolute path of your DAG file):
If you get any runtime errors related to the interpret_python task, try cd-ing first to Airflow's base path (the directory containing airflow.cfg) and then calling python3 with a relative path. For example, if Airflow's path is /home/username/airflow and the DAG lives at /home/username/airflow/dags/mydag.py, define interpret_python like this:
interpret_python = BashOperator(
task_id="interpret_python",
bash_command='cd /home/username/airflow && python3 dags/mydag.py',
dag=dag,
)
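The byte-compilation this task forces can also be reproduced by hand with the standard library's py_compile module, which is handy for checking that the .pyc really does get refreshed. A minimal sketch; the file name and content are stand-ins for the demo, not the real DAG:

```python
import os
import py_compile
import tempfile

# Stand-in for a DAG file (hypothetical content, just for the demo).
dag_file = os.path.join(tempfile.mkdtemp(), "mydag.py")
with open(dag_file, "w") as f:
    f.write("x = 1\n")

# Explicitly byte-compile the file, as the interpreter does on import;
# the resulting .pyc lands in a __pycache__ directory next to the file.
pyc_path = py_compile.compile(dag_file)
print(os.path.exists(pyc_path))  # True
```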
Is that last code snippet just the rest of the Python file? — Yes, it's the rest of my DAG file; I just zoomed in on where my problem is. Thanks for your answer.

I created interpret_python, but when I start the DAG, interpret_python skips all of the following tasks. What if I try deleting the .pyc with a bash command? — The new tasks should be picked up after a few minutes and appear in your Airflow webserver's view, and the next DAG runs will execute them (not the current run, which executed interpret_python and added them). You can also restart the webserver and the scheduler to speed this up, and don't forget to refresh the webserver page. What is your schedule interval?

Actually, I think my problem is something else: in bash_command='python3 ' + scriptAirflow + 'memShScript.py', that memShScript.py calls a bash script (with subprocess.call), and my problem is that the bash script never starts. The Python script itself runs fine, but the bash script inside it is never launched.
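On that last comment: when a Python script launched by a BashOperator in turn calls a shell script via subprocess and nothing seems to happen, the usual culprits are a relative script path (an Airflow task's working directory is rarely what you expect) or a swallowed non-zero exit status. A minimal sketch of a safer call, using a temporary stand-in script in place of the real one:

```python
import os
import stat
import subprocess
import tempfile

# Temporary stand-in for the shell script memShScript.py is meant to launch;
# in the real setup, use the script's absolute path.
script = os.path.join(tempfile.mkdtemp(), "demo.sh")
with open(script, "w") as f:
    f.write("#!/bin/bash\necho started\n")
os.chmod(script, os.stat(script).st_mode | stat.S_IEXEC)

# check=True raises CalledProcessError on a non-zero exit instead of failing
# silently; capture_output exposes stdout/stderr for debugging.
result = subprocess.run(["bash", script], capture_output=True, text=True, check=True)
print(result.stdout.strip())  # started
```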
import datetime as dt
from airflow import DAG
import shutil
import os
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.contrib.sensors.file_sensor import FileSensor
from airflow.operators.dagrun_operator import TriggerDagRunOperator

scriptAirflow = '/home/alexw/scriptAirflow/'
uploadPath = '/apps/lv-manuf2020-data/80_DATA/00_Loading/'
receiptPath = '/apps/lv-manuf2020-data/80_DATA/01_Receipt/'
fmsFiles = []
memFiles = []

def onlyCsvFiles():
    if os.listdir(uploadPath):
        for files in os.listdir(uploadPath):
            if (files.startswith('MEM') and files.endswith('.csv')) or (files.startswith('FMS') and files.endswith('.csv')):
                shutil.move(uploadPath + files, receiptPath)
                print(files + ' moved in ' + receiptPath + files)
        for files in os.listdir(receiptPath):
            if (files.startswith('MEM') and files.endswith('.csv')) or (files.startswith('FMS') and files.endswith('.csv')):
                return "run_scripts"
            else:
                return "no_script"
    else:
        print('No file in upload_00')

default_args = {
    'owner': 'manuf2020',
    'start_date': dt.datetime(2020, 2, 17),
    'retries': 1,
}

dag = DAG('lv-manuf2020', default_args=default_args, description='airflow_manuf2020',
          schedule_interval=None, catchup=False)

file_sensor = FileSensor(
    task_id="file_sensor",
    filepath=uploadPath,
    fs_conn_id='airflow_db',
    poke_interval=10,
    dag=dag,
)

move_csv = BranchPythonOperator(
    task_id='move_csv',
    python_callable=onlyCsvFiles,
    trigger_rule='none_failed',
    dag=dag,
)

run_scripts = DummyOperator(
    task_id="run_scripts",
    dag=dag,
)

no_script = TriggerDagRunOperator(
    task_id='no_script',
    trigger_dag_id='lv-manuf2020',
    trigger_rule='all_done',
    dag=dag,
)

interpret_python = BashOperator(
    task_id="interpret_python",
    bash_command='python3 /path/to/this/file.py',
    dag=dag,
)

if os.listdir(receiptPath):
    for files in os.listdir(receiptPath):
        if files.startswith('FMS') and files.endswith('.csv'):
            fmsFiles.append(files)
        if files.startswith('MEM') and files.endswith('.csv'):
            memFiles.append(files)
else:
    pass

for files in fmsFiles:
    run_Fms_Script = BashOperator(
        task_id="fms_script_" + files,
        bash_command='python3 ' + scriptAirflow + 'fmsScript.py "{{ execution_date }}"',
        dag=dag,
    )
    rerun_dag = TriggerDagRunOperator(
        task_id='rerun_dag',
        trigger_dag_id='lv-manuf2020',
        trigger_rule='none_failed',
        dag=dag,
    )
    run_scripts.set_downstream(run_Fms_Script)
    rerun_dag.set_upstream(run_Fms_Script)

for files in memFiles:
    run_Mem_Script = BashOperator(
        task_id="mem_script_" + files,
        bash_command='python3 ' + scriptAirflow + 'memShScript.py "{{ execution_date }}"',
        dag=dag,
    )
    rerun_dag = TriggerDagRunOperator(
        task_id='rerun_dag',
        trigger_dag_id='lv-manuf2020',
        trigger_rule='none_failed',
        dag=dag,
    )
    run_scripts.set_downstream(run_Mem_Script)
    rerun_dag.set_upstream(run_Mem_Script)

move_csv.set_upstream(file_sensor)
run_scripts.set_upstream(move_csv)
no_script.set_upstream(move_csv)