Airflow: GoogleCloudStorageToBigQueryOperator error
I am trying to load data from Google Cloud Storage into BigQuery via the GoogleCloudStorageToBigQueryOperator and am getting the error below. I need advice on the following error.

Code:

Log:
[2021-06-10 09:56:54,522] {taskinstance.py:902} INFO - Executing <Task(GoogleCloudStorageToBigQueryOperator): flow_name_load_into_bq> on 2021-06-10T09:55:01.281248+00:00
[2021-06-10 09:56:54,599] {standard_task_runner.py:54} INFO - Started process 13009 to run task
[2021-06-10 09:56:54,854] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'mysql_to_gcs_data_dag', 'flow_name_load_into_bq', '2021-06-10T09:55:01.281248+00:00', '--job_id', '19338', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/mysql_gcs_bq_poc_sourav.py', '--cfg_path', '/tmp/tmpypa_dgaw']
[2021-06-10 09:56:54,860] {standard_task_runner.py:78} INFO - Job 19338: Subtask flow_name_load_into_bq
[2021-06-10 09:56:56,025] {logging_mixin.py:112} INFO - Running <TaskInstance: mysql_to_gcs_data_dag.flow_name_load_into_bq 2021-06-10T09:55:01.281248+00:00 [running]> on host airflow-worker-567675b8f5-t58ns
[2021-06-10 09:56:56,834] {gcp_api_base_hook.py:145} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
Your issue seems to be related to the GCP connection rather than the operator itself.

There are three ways to authenticate with GCP (see the connection options in the Google provider documentation).
GoogleCloudStorageToBigQueryOperator is deprecated. You should import GCSToBigQueryOperator instead:

from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
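Since the class was renamed and its import path moved between Airflow releases, a quick sanity check is to try both imports and see which one your environment provides (a sketch; the `status` variable is purely illustrative, the import paths are the real ones):

```python
# Check which Google provider import path this Airflow environment supports.
try:
    # Airflow 2.x path, provided by apache-airflow-providers-google
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
    status = "airflow-2-provider"
except ImportError:
    try:
        # Legacy Airflow 1.10.x contrib path
        from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
        status = "legacy-contrib-operator"
    except ImportError:
        # Neither path importable: the provider package is not installed
        status = "no-google-provider"

print(status)
```

If this prints "no-google-provider", install the package matching your Airflow version as shown below.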
For Airflow >= 2.0.0:

Install:

pip install apache-airflow-providers-google

After installing, you can set up the connection with any of the options listed above, following the instructions in the documentation.
For Airflow < 2.0.0:

Install:

pip install apache-airflow-backport-providers-google

After installing, you can set up the connection with any of the options listed above, following the instructions in the documentation. Your bigquery_conn_id and google_cloud_storage_conn_id appear to use custom connections. Have you assigned them the correct key file, scopes, project id, etc.?
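On the key file / scopes / project point: one common way to define such a connection is an AIRFLOW_CONN_<CONN_ID> environment variable holding a connection URI. A minimal sketch of building that URI with only the standard library (the key path, project id, and connection name below are hypothetical placeholders):

```python
import os
from urllib.parse import urlencode

# Hypothetical values -- replace with your own service-account key path,
# scope(s), and project id.
extras = {
    "extra__google_cloud_platform__key_path": "/opt/airflow/keys/sa.json",
    "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform",
    "extra__google_cloud_platform__project": "my-gcp-project",
}

# Airflow resolves AIRFLOW_CONN_<CONN_ID> environment variables as connection
# URIs; "google-cloud-platform" is the scheme used for Google connections.
uri = "google-cloud-platform://?" + urlencode(extras)
os.environ["AIRFLOW_CONN_MY_GCP_CONN"] = uri  # hypothetical connection id
print(uri)
```

You would then point bigquery_conn_id and google_cloud_storage_conn_id at that connection id (here, my_gcp_conn).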