
Python: How do I run a BigQuery query and then send the output CSV to Google Cloud Storage in Apache Airflow?

Tags: python, google-bigquery, airflow


I need to run a BigQuery script in Python, and the output needs to land in Google Cloud Storage as a CSV. At the moment my script triggers the BigQuery code and saves the file directly to my PC.

However, I need it to run in Airflow, so I can't have any local dependencies.

My current script saves the output to my local machine, and I then have to move it over to GCS. I've searched online but couldn't figure it out. P.S. I'm very new to Python, so I apologize if this has been asked before.

import datetime

import pandas as pd
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

def run_script():

    df = pd.read_gbq('SELECT * FROM `table/view` LIMIT 15000',
                 project_id='PROJECT',
                 dialect='standard'
                 )

    df.to_csv('XXX.csv', index=False)

def copy_to_gcs(filename, bucket, destination_filename):

    credentials = GoogleCredentials.get_application_default()
    service = discovery.build('storage', 'v1', credentials=credentials)

    body = {'name': destination_filename}
    req = service.objects().insert(bucket=bucket,body=body, media_body=filename)
    resp = req.execute()

current_date = datetime.date.today()
filename = (r"C:\Users\LOCALDRIVE\ETC\ETC\ETC.csv")
bucket = 'My GCS BUCKET'

str_prefix_datetime = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
destfile = 'XXX' + str_prefix_datetime + '.csv'

# Run the query, then copy the resulting local CSV up to GCS.
run_script()
copy_to_gcs(filename, bucket, destfile)

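For reference, a minimal sketch of how the intermediate local file could be avoided entirely: serialize the DataFrame in memory and upload the bytes with the google-cloud-storage client. The function name, bucket, and object name here are illustrative placeholders, not part of the original script:

import pandas as pd
from google.cloud import storage

def query_to_gcs(bucket_name, destination_filename):
    # Run the query exactly as before.
    df = pd.read_gbq('SELECT * FROM `table/view` LIMIT 15000',
                     project_id='PROJECT',
                     dialect='standard')

    # With no path argument, to_csv() returns the CSV as a string
    # instead of writing a file to disk.
    csv_data = df.to_csv(index=False)

    # Upload the string straight to the bucket; no local file involved.
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.blob(destination_filename).upload_from_string(
        csv_data, content_type='text/csv')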

Airflow provides several operators for working with BigQuery:

BigQueryOperator executes a query against BigQuery. BigQueryToCloudStorageOperator exports a BigQuery table (such as the destination table of a query) to GCS. You can see an example that runs a query and then exports the result below.


Comments:

I'd suggest you just use those operators. Also, note that exports are supported.

Great answer, but the Airflow operator will throw an error if the table is larger than 1 GB. How would you handle a table larger than 1 GB? Thanks in advance.

@sethu That would be a good candidate for a separate question. Please include the error that Airflow throws.

Hi Tim Swast, thanks for the quick reply. I'm using the bq_to_gcs operator to pull BigQuery table data into GCS. The table is over 1 GB and it fails with an error like: BigQuery job failed. Final error was: {'reason': 'invalid', 'message': 'Table dataset_reference { project_reference { project_id: ******** gaia_id: *********** } dataset_id: *********** dataset_uuid: *** } table_id: *** table_uuid: ******* too large to be exported to a single file. Specify a uri including a * to shard export.'}

You're right that BigQuery limits CSV output to 1 GB per file, but you can give the extract job a filename template by including a * character in the filename. BigQuery expands the * into a page indicator such as 000000000001 (see the sharded-export sketch after the example code below).

Thanks again. But does that work with the Airflow bq_to_gcs operator?
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# In Airflow 1.x these operators live under airflow.contrib:
from airflow.contrib.operators import bigquery_operator
from airflow.contrib.operators import bigquery_to_gcs

# Query recent StackOverflow questions. (max_query_date, min_query_date,
# bq_recent_questions_table_id, and output_file are defined elsewhere in
# the sample DAG.)

bq_recent_questions_query = bigquery_operator.BigQueryOperator(
    task_id='bq_recent_questions_query',
    sql="""
    SELECT owner_display_name, title, view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE creation_date < CAST('{max_date}' AS TIMESTAMP)
        AND creation_date >= CAST('{min_date}' AS TIMESTAMP)
    ORDER BY view_count DESC
    LIMIT 100
    """.format(max_date=max_query_date, min_date=min_query_date),
    use_legacy_sql=False,
    destination_dataset_table=bq_recent_questions_table_id)

# Export query result to Cloud Storage.
export_questions_to_gcs = bigquery_to_gcs.BigQueryToCloudStorageOperator(
    task_id='export_recent_questions_to_gcs',
    source_project_dataset_table=bq_recent_questions_table_id,
    destination_cloud_storage_uris=[output_file],
    export_format='CSV')
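On the 1 GB limit raised in the comments: BigQuery refuses to export a table larger than 1 GB to a single file, but a wildcard in the destination URI shards the export. Below is a minimal sketch using the same operator, assuming the wildcard is passed through destination_cloud_storage_uris to the underlying extract job as the error message suggests; the bucket and object names are placeholders:

# BigQuery expands the * into a zero-padded shard number
# (questions-000000000000.csv, questions-000000000001.csv, ...),
# which works around the 1 GB single-file export limit.
export_large_results_to_gcs = bigquery_to_gcs.BigQueryToCloudStorageOperator(
    task_id='export_large_results_to_gcs',
    source_project_dataset_table=bq_recent_questions_table_id,
    destination_cloud_storage_uris=[
        'gs://my-bucket/exports/questions-*.csv'],
    export_format='CSV')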