Python 2.7 and GCP Google BigQuery: extract - compression not working


I am using Python 2.7 (can't change that right now) and v0.28 of the google.cloud.bigquery Google Python client library, and the compression="GZIP" or "NONE" parameter/setting does not seem to work for me. Could someone else give it a try and let me know whether it works for them?

In the code below you can see I have been playing around with this, but every time, no matter which compression setting I use, the file that lands on GCS appears to be uncompressed.

Note: my imports are for a larger code set, not all of them are needed for this snippet.
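
For what it is worth, the only import this particular snippet should actually need is the BigQuery client itself:

from google.cloud import bigquery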



Related links:

I am sure I am doing something silly, thanks for your help... Rich
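
As a sanity check that an exported object really is gzip-compressed (rather than trusting the ".gz" extension), one option is to read the object back and look at the gzip magic bytes. This is only an illustrative sketch using the google-cloud-storage client, which is not part of the original post; the bucket, object and project names are placeholders.

from google.cloud import storage

def looks_gzipped(bucket_name, blob_name, project=None):
    # rough check: does the GCS object start with the gzip magic bytes 0x1f 0x8b?
    storage_client = storage.Client(project=project)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    # note: if the object were stored with Content-Encoding: gzip the client could
    # transparently decompress it; BigQuery extracts normally do not set that header
    data = blob.download_as_string()
    return data[:2] == b"\x1f\x8b"

# placeholder names, not values from the original post
print(looks_gzipped("my-bucket", "my_table_export.csv.gz", project="my-project"))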



EDIT below


In the interest of sharing, here is what I believe our final code will be... Rich

# export a table from bq into a file on gcs,
# the destination should look like the following, with no brackets {}
# gs://{bucket-name-here}/{file-name-here}
def export_data_to_gcs(dataset_name, table_name, destination,
                       field_delimiter=",", print_header=None,
                       destination_format="CSV", compression="GZIP", project=None):
    try:
        bigquery_client = bigquery.Client(project=project)
        dataset_ref = bigquery_client.dataset(dataset_name)
        table_ref = dataset_ref.table(table_name)

        job_id_prefix = "bqTools_export_job"

        job_config = bigquery.ExtractJobConfig()

        # default is ","
        if field_delimiter:
            job_config.field_delimiter = field_delimiter

        # default is true
        if print_header:
            job_config.print_header = print_header

        # CSV, NEWLINE_DELIMITED_JSON, or AVRO
        if destination_format:
            job_config.destination_format = destination_format

        # GZIP or NONE
        if compression:
            job_config.compression = compression

        # if it should be compressed, make sure there is a .gz on the filename, add if needed
        if compression == "GZIP":
            if destination.lower()[-3:] != ".gz":
                destination = str(destination) + ".gz"

        job = bigquery_client.extract_table(table_ref, destination, job_config=job_config, job_id_prefix=job_id_prefix)

        # job.begin()
        job.result()  # Wait for job to complete

        returnMsg = 'Exported {}:{} to {}'.format(dataset_name, table_name, destination)

        return returnMsg

    except Exception as e:
        errorStr = 'ERROR (export_data_to_gcs): ' + str(e)
        print(errorStr)
        raise
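
For context, a call to the function above might look like the following; the project, dataset, table and bucket names are placeholders rather than values from the original post.

# placeholder identifiers, for illustration only
msg = export_data_to_gcs(
    dataset_name="my_dataset",
    table_name="my_table",
    destination="gs://my-bucket/my_table_export.csv",  # ".gz" is appended because compression is GZIP
    field_delimiter=",",
    print_header=True,
    destination_format="CSV",
    compression="GZIP",
    project="my-project")
print(msg)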

For a table extract, you should use ExtractJobConfig.

Shouldn't it be bigquery.ExtractJobConfig rather than bigquery.LoadJobConfig?

That's great, thank you very much, I think I did not have enough coffee yesterday. Thanks again Daria & Graham
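
To make the fix from the answer and the comment thread concrete, here is a minimal sketch of the extract itself, assuming the same v0.28 client library as above. The point is simply that compression is a property of ExtractJobConfig; a config built with LoadJobConfig would not, as far as I can tell, carry that setting into an extract job. All identifiers are placeholders.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")          # placeholder project
table_ref = client.dataset("my_dataset").table("my_table")

job_config = bigquery.ExtractJobConfig()                # not bigquery.LoadJobConfig()
job_config.destination_format = "CSV"
job_config.compression = "GZIP"                         # honored for extract jobs

job = client.extract_table(
    table_ref,
    "gs://my-bucket/my_table_export.csv.gz",            # placeholder destination
    job_config=job_config)
job.result()  # wait for the extract job to finish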