How to run a Cloud Composer task that loads data into a BigQuery table in another project

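I created a Cloud Composer environment under project-A, and I want to load data into a BigQuery table in another project, project-B. I know the task for this is GCSToBigQueryOperator, but when I tried it the task failed, and I would like to know how to make it work.

From project A, I want to run a task that loads data into a table in project B.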

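Based on my experience and some assumptions about your setup, I think you need to make sure that your service account (the one behind bigquery_conn_id and google_cloud_storage_conn_id) has sufficient permissions in both projects.

As shankshera mentioned, first check in GCP IAM that the service account used by your Cloud Composer environment has access to both projects (and to the dataset in BigQuery).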
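One quick way to test this from inside the Composer environment is to try reading the target table's metadata with the same credentials the task will use. The following is only a minimal sketch: `can_access_table` and `check_with_default_credentials` are hypothetical helper names, and the table ID you pass in is up to you. A successful metadata read only shows that the service account can see the table in project B. Loading data additionally requires write access to the dataset in project B, permission to create BigQuery jobs in the project where the job runs, and read access to the GCS bucket.

```python
def can_access_table(client, table_id):
    """Try to read table metadata and return (ok, reason).

    `client` is expected to be a google.cloud.bigquery.Client (or any object
    with a compatible get_table method); `table_id` has the form
    'project.dataset.table'.
    """
    try:
        client.get_table(table_id)
    except Exception as exc:  # e.g. Forbidden or NotFound from google.api_core
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


def check_with_default_credentials(table_id):
    """Run the check with the environment's default credentials.

    Assumes google-cloud-bigquery is installed and that the default
    credentials are those of the Composer service account.
    """
    from google.cloud import bigquery  # imported lazily on purpose

    return can_access_table(bigquery.Client(), table_id)
```

Calling `check_with_default_credentials("project-b.dataset_id.table_id")` from a PythonOperator (with your own table ID) would then surface a permission problem before any load job is started.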

To be honest, I couldn't get this operator to work properly for me either, so I wrote a custom Python function that does the same thing:

    from google.cloud import bigquery


    def load_into_table_from_csv(**kwargs):
        """
        Loads data into the specified BQ table from the specified CSV file in GCS.

        Receives table_path and file_path from the PythonOperator in Airflow.
        Parameters need to be explicitly specified in op_kwargs in the task definition.

        Example of op_kwargs for PythonOperator:
        {'table_path': 'project_id.dataset_id.table_id',
         'file_path': 'gs://bucket_name/file_name.csv',
         'delimiter': ',',
         'quote_character': '"'}
        """
        bigquery_client = bigquery.Client()
        # Fully qualified table ID; its project may differ from the Composer project.
        table_id = kwargs["table_path"]
        # GCS URI of the source file, passed to BigQuery as a plain string.
        file_uri = kwargs["file_path"]

        job_config = bigquery.LoadJobConfig()
        job_config.field_delimiter = kwargs["delimiter"]  # delimiter in the source file
        job_config.skip_leading_rows = 1  # rows to skip (set to 1 if you have a header row)
        job_config.quote_character = kwargs["quote_character"]
        # https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad.FIELDS.write_disposition
        job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

        load_job = bigquery_client.load_table_from_uri(
            file_uri,
            table_id,
            job_config=job_config,
        )

        load_job.result()  # Waits for the table load to complete; raises if the job failed.

Then, in the DAG, you just use this function and pass in the parameters, like this:

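    # Airflow 2.x import path; on Airflow 1.10 use airflow.operators.python_operator.
    from airflow.operators.python import PythonOperator

    # table_name, specs_current_table and dag are defined elsewhere in the DAG file.
    t8 = PythonOperator(
        task_id=f"load_{table_name}",
        python_callable=load_into_table_from_csv,  # function that's called by the task
        op_kwargs=specs_current_table,  # passing arguments into the function
        dag=dag,
    )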
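For the cross-project case, what matters is that `table_path` is fully qualified with project B's ID. As a rough sketch, `specs_current_table` could be built as shown here. `build_load_specs` and every ID in it are made-up placeholders, not names from the original DAG.

```python
def build_load_specs(project_id, dataset_id, table_id, gcs_uri,
                     delimiter=",", quote_character='"'):
    """Build the op_kwargs dict expected by load_into_table_from_csv."""
    for part in (project_id, dataset_id, table_id):
        if not part:
            raise ValueError("project, dataset and table IDs must be non-empty")
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got: {gcs_uri!r}")
    return {
        "table_path": f"{project_id}.{dataset_id}.{table_id}",
        "file_path": gcs_uri,
        "delimiter": delimiter,
        "quote_character": quote_character,
    }


# Placeholder values: the table lives in project B, the file in a bucket
# that the Composer service account can read.
specs_current_table = build_load_specs(
    "project-b", "dataset_id", "table_id", "gs://bucket_name/file_name.csv"
)
```

Building the dict in one place keeps the project B prefix explicit, so it is harder to load into a same-named dataset in project A by accident.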

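By the way, I personally agree with the author of this article: we should be careful about using lots of custom operators when the same thing can be done with plain code.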