Google Cloud Platform: Querying BigQuery repeated fields

google-cloud-platform google-bigquery

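Below is the schema of my BigQuery table. I am selecting sentence_id, store, and BU_Model and inserting the data into another table in BigQuery. The columns of the resulting table have the types INTEGER, REPEATED, and REPEATED, respectively. I want to flatten/unnest the repeated fields so that they are created as STRING fields in the second table. How can I achieve this with standard SQL?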

+- sentences: record (repeated)
|  |- sentence_id: integer                                                                                                                             
|  |- autodetected_language: string                                                                                                                    
|  |- processed_language: string 
|  +- attributes: record
|  |  |- agent_rating: integer
|  |  |- store: string (repeated)
|  +- classifications: record
|  |  |- BU_Model: string (repeated)

The query I use to create the second table is shown below. I want to query BU_Model as a string column.

SELECT sentence_id ,a.attributes.store,a.classifications.BU_Model
FROM staging_table , unnest(sentences) a

The expected output should look like this:

Staging table:

41783851    regions     Apparel
            district    Footwear
12864656    regions
            district

Final destination table:

41783851    regions     Apparel
41783851    regions     Footwear
41783851    district    Apparel
41783851    district    Footwear
12864656    regions
12864656    district

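I tried the query below and it seems to work as expected, but it means I have to unnest every repeated field individually. My table in BigQuery has more than 50 repeated columns. Is there an easier way?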

SELECT
sentence_id,
flattened_stores,
flattened_Model
FROM `staging`  
left join unnest(sentences) a
left join unnest(a.attributes.store) as flattened_stores
left join unnest(a.classifications.BU_Model) as flattened_Model

Assuming you still want three columns in the output, with the arrays flattened into strings:

SELECT sentence_id , 
  ARRAY_TO_STRING(a.attributes.store, ',') store,
  ARRAY_TO_STRING(a.classifications.BU_Model, ',') BU_Model
FROM staging_table , unnest(sentences) a
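
To make the shape of that result concrete, here is a minimal Python sketch that mimics what ARRAY_TO_STRING does to each row. It does not touch BigQuery: the sample rows are copied from the question, and the helper `array_to_string` is a hypothetical stand-in written only for this illustration.

```python
# Toy rows mirroring the sample data in the question (not read from BigQuery).
sentences = [
    {"sentence_id": 41783851,
     "store": ["regions", "district"],
     "BU_Model": ["Apparel", "Footwear"]},
    {"sentence_id": 12864656,
     "store": ["regions", "district"],
     "BU_Model": []},
]


def array_to_string(values, delimiter):
    """Hypothetical stand-in for ARRAY_TO_STRING(values, delimiter).

    Like the SQL function called without a null_text argument,
    it skips NULL (None) elements.
    """
    return delimiter.join(v for v in values if v is not None)


# One output row per sentence; each array collapses into a single string.
joined_rows = [
    (s["sentence_id"],
     array_to_string(s["store"], ","),
     array_to_string(s["BU_Model"], ","))
    for s in sentences
]

for row in joined_rows:
    print(row)
```

Note that this keeps one row per sentence, which is different from the flattened table in the question, where each combination of array elements becomes its own row.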

Update to address the recent change to the question

In BigQuery standard SQL, using LEFT JOIN UNNEST() (as you did in your last query) is the most reasonable way to get this result.
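
As a rough illustration of what the chained LEFT JOIN UNNEST() calls produce, the Python sketch below models the behavior on the sample data from the question. It is only a model of the semantics, not BigQuery code; the helper names `left_unnest` and `flatten` are made up for this example.

```python
from itertools import product

# Toy rows mirroring the sample data in the question (not read from BigQuery).
sentences = [
    {"sentence_id": 41783851,
     "store": ["regions", "district"],
     "BU_Model": ["Apparel", "Footwear"]},
    {"sentence_id": 12864656,
     "store": ["regions", "district"],
     "BU_Model": []},
]


def left_unnest(values):
    """Model LEFT JOIN UNNEST: an empty array still yields one row (None)."""
    return values if values else [None]


def flatten(rows):
    """Emit one tuple per combination of the unnested array elements."""
    out = []
    for row in rows:
        combos = product(left_unnest(row["store"]),
                         left_unnest(row["BU_Model"]))
        for store, model in combos:
            out.append((row["sentence_id"], store, model))
    return out


flat_rows = flatten(sentences)
for r in flat_rows:
    print(r)
```

The second sentence survives with an empty BU_Model because of the LEFT JOIN; with a plain comma (cross) join against an empty array it would drop out of the result. Also keep in mind that every additional unnested array multiplies the number of rows per sentence.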

In BigQuery Legacy SQL, you can use the FLATTEN syntax, but it has the same drawback: you need to repeat the same syntax for all 50+ columns.

A very simplified example:

#legacySQL
SELECT sentence_id, store, BU_Model
FROM (FLATTEN([project:dataset.stage], BU_Model))

Conclusion: I would go with the LEFT JOIN UNNEST() approach.
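
If typing 50+ LEFT JOIN UNNEST() clauses by hand is the main pain point, one possible workaround is to generate the query text with a small script. The sketch below is only an illustration under assumptions: `build_flatten_query` is a hypothetical helper, the list of repeated field paths has to be supplied by you, and nothing here connects to BigQuery.

```python
def build_flatten_query(table, record, key, repeated_paths):
    """Build standard SQL text with one LEFT JOIN UNNEST per repeated field.

    All arguments are hypothetical inputs chosen by the caller:
      table          -- table name, e.g. "staging"
      record         -- the repeated record to unnest first, e.g. "sentences"
      key            -- a scalar field inside that record, e.g. "sentence_id"
      repeated_paths -- paths of repeated fields inside the record
    """
    select_cols = [f"a.{key}"]
    joins = [f"LEFT JOIN UNNEST({record}) AS a"]
    for path in repeated_paths:
        # Alias is derived from the last path component; two paths with the
        # same leaf name would collide and need distinct aliases.
        alias = "flattened_" + path.split(".")[-1]
        select_cols.append(alias)
        joins.append(f"LEFT JOIN UNNEST(a.{path}) AS {alias}")
    return (
        "SELECT\n  " + ",\n  ".join(select_cols)
        + f"\nFROM `{table}`\n" + "\n".join(joins)
    )


query = build_flatten_query(
    table="staging",
    record="sentences",
    key="sentence_id",
    repeated_paths=["attributes.store", "classifications.BU_Model"],
)
print(query)
```

For the two fields in the question this prints the same kind of query as the hand-written one above; extending `repeated_paths` adds one more join per entry. The generated SQL still has the same row-multiplying behavior, so it only saves typing, not work on the BigQuery side.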