Multiple operators in Airflow don't recognize the current folder

I am using Airflow to see whether it can do the same data-ingestion work; the original ingestion is done in the shell in two steps:

(base) (venv) [pchoix@hadoop02 ~]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
  • cd ~/bm3
  • ./bm3.py runjob -p projectd -j jobid

In Airflow, I use a BashOperator for each of the two tasks:

    task1 = BashOperator(
        task_id='switch2BMhome',
        bash_command="cd /home/pchoix/bm3",
        dag=dag)
    
    task2 = BashOperator(
        task_id='kickoff_bm3',
        bash_command="./bm3.py runjob -p client1 -j ingestion",
        dag=dag)
    
    task1 >> task2
    
task1 completed as expected, with the following log:

    [2019-03-01 16:50:17,638] {bash_operator.py:100} INFO - Temporary script location: /tmp/airflowtmpkla8w_xd/switch2ALhomeelbcfbxb
    [2019-03-01 16:50:17,638] {bash_operator.py:110} INFO - Running command: cd /home/rxie/al2
    
task2 failed, for the reason shown in its log:

    [2019-03-01 16:51:19,896] {bash_operator.py:100} INFO - Temporary script location: /tmp/airflowtmp328cvywu/kickoff_al2710f17lm
    [2019-03-01 16:51:19,896] {bash_operator.py:110} INFO - Running command: ./bm32.py runjob -p client1 -j ingestion
    [2019-03-01 16:51:19,902] {bash_operator.py:119} INFO - Output:
    [2019-03-01 16:51:19,903] {bash_operator.py:123} INFO - /tmp/airflowtmp328cvywu/kickoff_al2710f17lm: line 1: ./bm3.py: No such file or directory
    
So it seems each task is executed from its own, seemingly unique, temporary folder, which is why the second task failed.

How can I run a bash command from a specific location?

Any thoughts you can share here would be much appreciated.

Thank you.
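For context on what the suggestion referenced in the update below looks like: since each BashOperator runs its bash_command from its own temporary directory, the usual workaround is to chain the cd and the script call in a single command. A minimal sketch of a complete DAG along those lines, where the dag_id, start_date and schedule are hypothetical placeholders and only the bash_command pattern comes from the question:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    # Hypothetical DAG skeleton; only the bash_command pattern matters here.
    dag = DAG(
        dag_id='bm3_ingestion',
        start_date=datetime(2019, 3, 1),
        schedule_interval=None)

    # cd and the kickoff run in the same shell process, so the relative
    # ./bm3.py path resolves against /home/pchoix/bm3 rather than the
    # operator's temporary working directory.
    ingest = BashOperator(
        task_id='kickoff_bm3',
        bash_command='cd /home/pchoix/bm3 && ./bm3.py runjob -p client1 -j ingestion',
        dag=dag)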

UPDATE: Thanks for the suggestion, it almost works.

    bash_command="cd /home/pchoix/bm3 && ./bm3.py runjob -p client1 -j ingestion"

works fine for the first part, but the runjob contains multiple tasks: the first one works, while the second one invokes impala-shell.py to run something. impala-shell.py specifies python2 as its interpreter, whereas everything outside of it uses python 3.

This is OK when I just run the bash_command in a shell, but in Airflow, for unknown reasons, the problem appears even though I set the proper PATH and made sure in the shell that:

    (base) (venv) [pchoix@hadoop02 ~]$ python
    Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
    
the task is still executed with python 3, as can be seen from the log:

    [2019-03-01 21:42:08,040] {bash_operator.py:123} INFO -   File "/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/bin/../lib/impala-shell/impala_shell.py", line 220
    [2019-03-01 21:42:08,040] {bash_operator.py:123} INFO -     print '\tNo options available.'
    [2019-03-01 21:42:08,040] {bash_operator.py:123} INFO -                                   ^
    [2019-03-01 21:42:08,040] {bash_operator.py:123} INFO - SyntaxError: Missing parentheses in call to 'print'
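If the job needs to see a Python 2 interpreter first on PATH when it runs under Airflow, one hedged option is to make the interpreter location explicit inside the bash_command itself (or via the operator's env argument) instead of relying on whatever environment the scheduler or worker was started with. The /opt/py2env path below is only a placeholder for wherever the Python 2 environment actually lives:

    # Sketch only: /opt/py2env/bin stands in for the real Python 2 location.
    task2 = BashOperator(
        task_id='kickoff_bm3',
        bash_command=(
            'export PATH=/opt/py2env/bin:$PATH && '
            'cd /home/pchoix/bm3 && '
            './bm3.py runjob -p client1 -j ingestion'),
        dag=dag)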
    
Note: this issue does not exist when the job is run in the shell environment:

    ./bm3.py runjob -p client1 -j ingestion
    
So, combining the two steps into a single BashOperator:

    task = BashOperator(
        task_id='switch2BMhome',
        bash_command="cd /home/pchoix/bm3 && ./bm3.py runjob -p client1 -j ingestion",
        dag=dag)
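As a closing note, newer Airflow releases add a cwd argument to BashOperator, which makes the explicit cd unnecessary; this is hedged, since the Airflow version in the question clearly runs each command from a temporary directory by default:

    # Requires an Airflow version whose BashOperator supports the cwd argument.
    task = BashOperator(
        task_id='kickoff_bm3',
        bash_command='./bm3.py runjob -p client1 -j ingestion',
        cwd='/home/pchoix/bm3',
        dag=dag)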