Python ModuleNotFoundError: No module named 'airflow'


python google-cloud-platform airflow google-cloud-dataflow


I am using the Airflow PythonOperator to run a Python Beam job with the Dataflow runner. The Dataflow job fails with the error "ModuleNotFoundError: No module named 'airflow'".

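In the Dataflow UI, the SDK version shown is 2.15.0 when the job is invoked through the PythonOperator. When the job is run from Cloud Shell, the SDK version is 2.23.0. The job works when launched from Cloud Shell.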
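Because the two launch paths report different SDK versions, it can help to log which apache-beam version each environment actually resolves. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (an older interpreter would need a backport or `pkg_resources`); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this inside the PythonOperator callable and again from Cloud Shell,
    # then compare the two values.
    print("apache-beam:", installed_version("apache-beam"))
```

If the two values differ, the Composer environment and Cloud Shell are not using the same Beam installation.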

The Composer environment details are as follows:

Image version = composer-1.10.3-airflow-1.10.3
Python version = 3

A previous post suggested using the PythonVirtualenvOperator. I tried it with the following settings:

requirements=['apache-beam[gcp]'],

python_version=3

Composer returns the error

"'install', 'apache-beam[gcp]']" returned non-zero exit status 2.

Any suggestions would be appreciated.
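The wording "returned non-zero exit status 2" is how Python's `subprocess` module reports a failed child command, and it hides what the command itself printed. A minimal sketch for surfacing that output, assuming the failing step can be reproduced by hand; `run_and_capture` is a hypothetical helper, and the child command below is only a stand-in for the real pip invocation.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run `cmd` and return (exit_code, combined stdout and stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in for the failing command. Replacing it with something like
    # [sys.executable, "-m", "pip", "install", "apache-beam[gcp]"] would show
    # pip's own message instead of just the exit status.
    code, output = run_and_capture(
        [sys.executable, "-c", "import sys; print('boom'); sys.exit(2)"]
    )
    print(code, output)
```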

This is the DAG that calls the Dataflow job. I have not shown all the functions used in the DAG, but I have kept the imports:

    import logging
    import pprint
    import json
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator
    from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator
    from airflow.models import DAG
    import google.cloud.logging
    from datetime import timedelta
    from airflow.utils.dates import days_ago
    from deps import utils
    from google.cloud import storage
    from airflow.exceptions import AirflowException
    from deps import logger_montr
    from deps import dataflow_clean_csv
    
    
    
    dag = DAG(dag_id='clean_data_file',
              default_args=args,
              description='Runs Dataflow to clean csv files',
              schedule_interval=None)
    
    def get_values_from_previous_dag(**context):
        var_dict = {}
        for key, val in context['dag_run'].conf.items():
            context['ti'].xcom_push(key, val)
            var_dict[key] = val
    
    populate_ti_xcom = PythonOperator(
        task_id='get_values_from_previous_dag',
        python_callable=get_values_from_previous_dag,
        provide_context=True,
        dag=dag,
    )
    
    
    dataflow_clean_csv = PythonOperator(
        task_id = "dataflow_clean_csv",
        python_callable = dataflow_clean_csv.clean_csv_dataflow,
        op_kwargs= {
         'project': 
         'zone': 
         'region': 
         'stagingLocation':
         'inputDirectory': 
         'filename': 
         'outputDirectory':     
        },
        provide_context=True,
        dag=dag,
    )

    populate_ti_xcom >> dataflow_clean_csv

I use ti.xcom_pull(task_ids='get_values_from_previous_dag') to assign the op_kwargs values.
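The question does not show how the pulled values reach `op_kwargs`. One way to read the pushed keys back inside a callable is sketched below, assuming each value was pushed under its own key by `get_values_from_previous_dag`. `collect_job_kwargs`, `JOB_KEYS`, the `FakeTaskInstance` stub, and the sample region value are hypothetical and exist only so the sketch runs without Airflow.

```python
# Keys taken from the op_kwargs dictionary in the DAG above.
JOB_KEYS = ("project", "zone", "region", "stagingLocation",
            "inputDirectory", "filename", "outputDirectory")


def collect_job_kwargs(ti, source_task="get_values_from_previous_dag"):
    """Pull each job parameter from XCom by the key it was pushed under."""
    return {key: ti.xcom_pull(task_ids=source_task, key=key) for key in JOB_KEYS}


class FakeTaskInstance:
    """Stand-in for Airflow's TaskInstance, for local testing only."""

    def __init__(self, store):
        self._store = store

    def xcom_pull(self, task_ids=None, key=None):
        return self._store.get((task_ids, key))


if __name__ == "__main__":
    ti = FakeTaskInstance(
        {("get_values_from_previous_dag", "region"): "example-region"}
    )
    print(collect_job_kwargs(ti)["region"])
```

In a real task, `ti` would be `context['ti']`, as in `get_values_from_previous_dag`.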

This is the Dataflow job being called:

    import apache_beam as beam
    import csv
    import logging
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.io import WriteToText


    def parse_file(element):
        for line in csv.reader([element], quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL):
            line = [s.replace('\"', '') for s in line]
            clean_line = '","'.join(line)
            final_line = '"' + clean_line + '"'
            return final_line


    def clean_csv_dataflow(**kwargs):
        argv = [
            # Dataflow pipeline options
            "--region={}".format(kwargs["region"]),
            "--project={}".format(kwargs["project"]),
            "--temp_location={}".format(kwargs["stagingLocation"]),
            # Setting Dataflow pipeline options
            '--save_main_session',
            '--max_num_workers=8',
            '--autoscaling_algorithm=THROUGHPUT_BASED',
            # Mandatory constants
            '--job_name=cleancsvdataflow',
            '--runner=DataflowRunner'
        ]
        options = PipelineOptions(flags=argv)

        pipeline = beam.Pipeline(options=options)

        inputDirectory = kwargs["inputDirectory"]
        filename = kwargs["filename"]
        outputDirectory = kwargs["outputDirectory"]

        outputfile_temp = filename
        outputfile_temp = outputfile_temp.split(".")
        outputfile = "_CLEANED.".join(outputfile_temp)

        in_path_and_filename = "{}{}".format(inputDirectory, filename)
        out_path_and_filename = "{}{}".format(outputDirectory, outputfile)

        pipeline = beam.Pipeline(options=options)

        clean_csv = (pipeline
            | "Read input file" >> beam.io.ReadFromText(in_path_and_filename)
            | "Parse file" >> beam.Map(parse_file)
            | "writecsv" >> beam.io.WriteToText(out_path_and_filename, num_shards=1)
        )

        pipeline.run()
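The per-line cleaning and the output-name logic in the job can be checked locally, without Dataflow. The sketch below reuses `parse_file` from the question and adds `cleaned_name`, a hypothetical alternative to the split/join naming that only touches the final extension. The sample inputs are made up.

```python
import csv
import os


def parse_file(element):
    """Re-quote every field of one CSV line, dropping embedded double quotes."""
    for line in csv.reader([element], quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL):
        line = [s.replace('"', '') for s in line]
        return '"' + '","'.join(line) + '"'


def cleaned_name(filename):
    """Insert _CLEANED before the last extension only."""
    stem, ext = os.path.splitext(filename)
    return stem + "_CLEANED" + ext


if __name__ == "__main__":
    print(parse_file('a,"b ""q"" c",d'))
    # The join-based naming repeats the suffix when the name has several dots.
    print("_CLEANED.".join("data.2020.csv".split(".")))
    print(cleaned_name("data.2020.csv"))
```

For a name with a single dot, such as `report.csv`, both naming approaches produce `report_CLEANED.csv`. They only diverge when the file name contains additional dots.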

This answer was provided by @BSpinoza in the comments section:

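What I did was move all the imports out of the global namespace and into the function definitions. Then, from the calling DAG, I used the [...] operator. It worked.

Also, one of the recommended approaches is to use [...].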
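The change described in the answer can be sketched as follows: the job module keeps no top-level imports, and each function imports what it needs when it is called. This is an illustration based on the question's code, not the answerer's actual file; `build_dataflow_args` is a hypothetical helper added so the flag list can be checked without Beam installed.

```python
def build_dataflow_args(region, project, temp_location):
    """Pipeline flags from the question, built without importing Beam."""
    return [
        "--region={}".format(region),
        "--project={}".format(project),
        "--temp_location={}".format(temp_location),
        "--save_main_session",
        "--max_num_workers=8",
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        "--job_name=cleancsvdataflow",
        "--runner=DataflowRunner",
    ]


def parse_file(element):
    # Imported here rather than at module level, as the answer describes.
    import csv

    for line in csv.reader([element], quotechar='"', delimiter=','):
        line = [s.replace('"', '') for s in line]
        return '"' + '","'.join(line) + '"'


def clean_csv_dataflow(**kwargs):
    # Beam is imported only when the job is actually launched.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(flags=build_dataflow_args(
        kwargs["region"], kwargs["project"], kwargs["stagingLocation"]))

    filename = kwargs["filename"]
    outputfile = "_CLEANED.".join(filename.split("."))
    in_path = "{}{}".format(kwargs["inputDirectory"], filename)
    out_path = "{}{}".format(kwargs["outputDirectory"], outputfile)

    pipeline = beam.Pipeline(options=options)
    (pipeline
     | "Read input file" >> beam.io.ReadFromText(in_path)
     | "Parse file" >> beam.Map(parse_file)
     | "writecsv" >> beam.io.WriteToText(out_path, num_shards=1))
    pipeline.run()
```

With this layout, importing the module has no side effects and needs no third-party packages; Beam is only required at the moment `clean_csv_dataflow` runs.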
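Can you show your DAG file? I suggest upgrading Composer to the latest version, i.e. composer-1.11.2-airflow-1.10.9. Also, please share your complete DAG file; it is needed to check the import statements.

Hi @muscat, here is the DAG file, I hope it helps. Thank you.

Thanks! Can you explain the import of DataflowTemplateOperator? I can't see any use of it in your code.

Initially I intended to use a template, but then switched to my own Beam job, so DataflowTemplateOperator is obsolete.

Thanks a lot.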