How does BigQuery read the schema of Parquet files in Google Cloud Storage?

google-cloud-platform google-bigquery

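I'm asking because I get an error when loading a BigQuery table from Parquet files, which makes me think BigQuery is reading the schema of some fields incorrectly.

I'm trying to load Parquet files into BigQuery from Cloud Shell: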

    loc1=gs://our-data/thisTable/model=firstmodel
    bq --location=US load --noreplace --source_format=PARQUET our-data:theSchema.theTable $loc1/*.parquet ./ourSchema.json

The directory referenced by loc1 contains about 30 Parquet files. I get an error that points to one specific file:

    BigQuery error in load operation: Error processing job 'our-data:bqjob_re73397ea395b9fd_0000016ae66ab746_1': Error while reading data, error message:
    Provided schema is not compatible with the file 'part-00000-20b9e343-460b-44a8-b083-4437284d6771.c000.snappy.parquet'.
    Field 'dataend' is specified as NULLABLE in provided schema which does not match REQUIRED as specified in the file.
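
The error reports a mode mismatch between the supplied schema JSON and the file. The Python sketch below illustrates that kind of comparison. It assumes the schema is the usual BigQuery JSON list of objects with `name`, `type` and `mode` keys, and that the per-field modes of the file are already known. The `mode_mismatches` helper and both sample inputs are hypothetical, and BigQuery's own compatibility check may well differ.

```python
import json


def mode_mismatches(schema_json, file_modes):
    """Return (field, provided_mode, file_mode) for fields whose modes differ."""
    mismatches = []
    for field in json.loads(schema_json):
        # A BigQuery schema entry without "mode" is treated as NULLABLE.
        provided = field.get("mode", "NULLABLE").upper()
        in_file = file_modes.get(field["name"])
        if in_file is not None and in_file != provided:
            mismatches.append((field["name"], provided, in_file))
    return mismatches


# Hypothetical inputs modelled on the error message above.
schema_json = (
    '[{"name": "row_id", "type": "INTEGER", "mode": "NULLABLE"},'
    ' {"name": "dataend", "type": "STRING", "mode": "NULLABLE"}]'
)
file_modes = {"row_id": "NULLABLE", "dataend": "REQUIRED"}

print(mode_mismatches(schema_json, file_modes))
# → [('dataend', 'NULLABLE', 'REQUIRED')]
```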

However, when I read the Parquet files with Spark and run printSchema, the field shows as nullable:

    root
     |-- row_id: long (nullable = true)
     |-- row_name: string (nullable = true)
     |-- dataend: string (nullable = true)

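The field is nullable in the schema of the BigQuery table, and the corresponding part of the schema JSON is nullable as well.

I'd appreciate any help in figuring out where to look next.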

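When Spark SQL reads Parquet files, all columns are automatically converted to nullable for compatibility reasons, so printSchema can report nullable = true even for a field that the file itself marks as REQUIRED.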

To check whether REQUIRED is set in the original file, inspect the Parquet file itself rather than relying on Spark's view of it.
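
One way to inspect the file directly is sketched below in Python. It assumes that pyarrow is installed and that a copy of the file is available locally. Neither is mentioned in the question, and the helper names are hypothetical. `pyarrow.parquet.read_schema` reads only the file footer, and a field that Parquet marks as REQUIRED comes back with `nullable` set to `False`.

```python
def required_fields(schema):
    """Return the names of top-level fields that are not nullable.

    `schema` is any iterable of objects with `.name` and `.nullable`
    attributes, such as a pyarrow.Schema.
    """
    return [field.name for field in schema if not field.nullable]


def read_parquet_schema(path):
    """Read only the footer schema of a local Parquet file (no row data)."""
    # Assumption: pyarrow is installed. It is imported here so that
    # required_fields() stays usable without it.
    import pyarrow.parquet as pq

    return pq.read_schema(path)


# Hypothetical usage, after copying one of the files locally:
#   schema = read_parquet_schema("part-00000-20b9e343-460b-44a8-b083-4437284d6771.c000.snappy.parquet")
#   print(required_fields(schema))
```

If `dataend` appears in the result, the file really does declare it as REQUIRED, which would agree with the BigQuery error rather than with the printSchema output.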