Python 2.7 and GCP Google BigQuery: extract compression not working
I am using Python 2.7 (can't change that right now) and Google's Python client library google.cloud.bigquery v0.28, and the compression="GZIP" or "NONE" argument/setting does not seem to work for me. Could someone else try it and let me know whether it works for them?

In the code below you can see I have been playing with this, but every time my file on GCS comes out uncompressed, no matter which compression value I use.

Note: my imports are for a larger code set, not all of them are needed for this snippet.

Related links:

I'm sure I'm doing something silly, thanks for your help... Rich
EDIT below

For the benefit of sharing, here is what I believe our final code ended up being... Rich
from google.cloud import bigquery


# export a table from bq into a file on gcs,
# the destination should look like the following, with no brackets {}
# gs://{bucket-name-here}/{file-name-here}
def export_data_to_gcs(dataset_name, table_name, destination,
                       field_delimiter=",", print_header=None,
                       destination_format="CSV", compression="GZIP", project=None):
    try:
        bigquery_client = bigquery.Client(project=project)
        dataset_ref = bigquery_client.dataset(dataset_name)
        table_ref = dataset_ref.table(table_name)
        job_id_prefix = "bqTools_export_job"
        job_config = bigquery.ExtractJobConfig()

        # default is ","
        if field_delimiter:
            job_config.field_delimiter = field_delimiter

        # default is true
        if print_header:
            job_config.print_header = print_header

        # CSV, NEWLINE_DELIMITED_JSON, or AVRO
        if destination_format:
            job_config.destination_format = destination_format

        # GZIP or NONE
        if compression:
            job_config.compression = compression

        # if it should be compressed, make sure there is a .gz on the filename, add if needed
        if compression == "GZIP":
            if destination.lower()[-3:] != ".gz":
                destination = str(destination) + ".gz"

        job = bigquery_client.extract_table(table_ref, destination,
                                            job_config=job_config,
                                            job_id_prefix=job_id_prefix)
        # job.begin() is not needed in v0.28; extract_table starts the job
        job.result()  # wait for the job to complete

        returnMsg = 'Exported {}:{} to {}'.format(dataset_name, table_name, destination)
        return returnMsg

    except Exception as e:
        errorStr = 'ERROR (export_data_to_gcs): ' + str(e)
        print(errorStr)
        raise
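The ".gz" suffix handling above can be pulled out into a small pure-Python helper, which makes it easy to unit-test without touching GCP at all. The function name ensure_gz_suffix is my own, not part of the original code; BigQuery writes to exactly the URI you give it, so the extension has to be set by the caller:

```python
def ensure_gz_suffix(destination, compression):
    """Append '.gz' to the destination URI when GZIP compression is requested.

    Mirrors the inline suffix logic in export_data_to_gcs: the extract job
    does not rename the output object, so the caller adds the extension.
    """
    if compression == "GZIP" and not destination.lower().endswith(".gz"):
        return destination + ".gz"
    return destination
```

For example, ensure_gz_suffix("gs://my-bucket/out.csv", "GZIP") yields "gs://my-bucket/out.csv.gz", while a destination that already ends in ".gz" (in any case) is left alone.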
For a table extract you should use ExtractJobConfig. Shouldn't it be

bigquery.ExtractJobConfig

rather than bigquery.LoadJobConfig?

That was it, thank you so much - I guess I didn't have enough coffee yesterday. Thanks again Daria & Graham
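For context on why the original mistake was silent: Python objects generally accept arbitrary attribute assignment, so setting compression on a config class that doesn't define it raises no error; the value simply never makes it into the job request sent to the API. A minimal stand-in illustrating the failure mode (these classes are illustrative only, not the real library types):

```python
class LoadJobConfigStandIn(object):
    """Stand-in for a load config: knows nothing about 'compression'."""

    def to_api_repr(self):
        # Only serializes fields it knows about, so a stray
        # 'compression' instance attribute is silently dropped.
        return {"load": {}}


class ExtractJobConfigStandIn(object):
    """Stand-in for an extract config: serializes 'compression'."""

    def __init__(self):
        self.compression = "NONE"

    def to_api_repr(self):
        return {"extract": {"compression": self.compression}}


wrong = LoadJobConfigStandIn()
wrong.compression = "GZIP"   # accepted without error, but ignored
right = ExtractJobConfigStandIn()
right.compression = "GZIP"   # actually carried into the request
```

This is why the question saw no exception and no compression: the setting was accepted on the wrong config object and then dropped when the job was serialized.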