SQL: How do I convert vertical data to horizontal data using BigQuery?

Tags: sql, google-bigquery

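I am currently working with employee benefits data. However, the spreadsheet data is a complete mess, and I would like to reshape it into a format that makes the information easy to read. The current format looks like this: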

Relationship  EmployeeName  BenefitCode  BenefitOption  Name
              Alice         DEN          EEC
CHL           Alice         DEN          EEC            John
SPS           Alice         MED                         Lee
              Lily          VIS
SPS           Lily          VIS                         Tom

I would like to transform it into this:

Relationship    Name     MED    DEN    VIS 
Employee        Alice           EEC
CHL             John            EEC
SPS             Lee      MED
Employee        Lily                   VIS
SPS             Tom                    VIS

I tried grouping the data by name and benefit code, but I got very confused.

My code is as follows:

SELECT   RelationshipCode, EmployeeName, 
         MAX(IF(BenefitCode = "DEN", BenefitOptionCode , NULL)) AS DEN,
         MAX(IF(BenefitCode = "MED", BenefitOptionCode , NULL)) AS MEDICAL,
         MAX(IF(BenefitCode = "VIS", BenefitOptionCode , NULL)) AS VISION
FROM `TableXXX` 
WHERE RelationshipCode = 'Employee'
GROUP BY EmployeeName, RelationshipCode
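
But losing the relationship between dependents and employees does not seem like a good idea.
Can anyone tell me how to convert vertical data to horizontal data? Or do you have a better idea for solving this problem?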

I would probably organize it into CTEs, so that each column or concept gets its own logical CTE:

with people as (
  select distinct EmployeeName as person from <dataset>.<table> union distinct
  select distinct Name as person from <dataset>.<table>
),
med as (
  -- select people with MED columns
),
den as (
  -- select people with DEN columns
),
-- ... (etc)
joined as (
  select * from people
  left join med using(person)
  left join den using(person)
)
select * from joined
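
My general advice for a case like this is to start the way I did here, with the simple items such as MED and DEN. Once those are done, move on to the items that are more complex or that require assumptions. Breaking the logic into CTE blocks helps encapsulate each idea.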
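
To make the CTE outline above concrete, here is a minimal sketch with the placeholder CTEs filled in. It is only an illustration under stated assumptions: `sample_rows` is a hypothetical inline copy of the question's sample data (swap in your real table), the `covered` and `vis` CTEs and the column names `MEDICAL`, `DENTAL`, and `VISION` are made up for this example, and an empty `BenefitOption` is assumed to fall back to the `BenefitCode` itself.

```sql
#standardSQL
-- Hypothetical inline copy of the question's sample rows, so the sketch runs on its own.
WITH sample_rows AS (
  SELECT CAST(NULL AS STRING) AS Relationship, 'Alice' AS EmployeeName,
         'DEN' AS BenefitCode, 'EEC' AS BenefitOption,
         CAST(NULL AS STRING) AS Name UNION ALL
  SELECT 'CHL', 'Alice', 'DEN', 'EEC', 'John' UNION ALL
  SELECT 'SPS', 'Alice', 'MED', NULL, 'Lee' UNION ALL
  SELECT NULL, 'Lily', 'VIS', NULL, NULL UNION ALL
  SELECT 'SPS', 'Lily', 'VIS', NULL, 'Tom'
),
-- Everyone who appears either as an employee or as a dependent.
people AS (
  SELECT EmployeeName AS person FROM sample_rows
  UNION DISTINCT
  SELECT Name AS person FROM sample_rows WHERE Name IS NOT NULL
),
-- Assumption: a row with no Name describes the employee's own coverage,
-- and a missing BenefitOption falls back to the BenefitCode.
covered AS (
  SELECT IFNULL(Name, EmployeeName) AS person,
         BenefitCode,
         IFNULL(BenefitOption, BenefitCode) AS benefit_value
  FROM sample_rows
),
med AS (
  SELECT person, MAX(benefit_value) AS MEDICAL
  FROM covered WHERE BenefitCode = 'MED' GROUP BY person
),
den AS (
  SELECT person, MAX(benefit_value) AS DENTAL
  FROM covered WHERE BenefitCode = 'DEN' GROUP BY person
),
vis AS (
  SELECT person, MAX(benefit_value) AS VISION
  FROM covered WHERE BenefitCode = 'VIS' GROUP BY person
),
joined AS (
  SELECT * FROM people
  LEFT JOIN med USING (person)
  LEFT JOIN den USING (person)
  LEFT JOIN vis USING (person)
)
SELECT * FROM joined
ORDER BY person
```

Because this sketch keys everything on the person's name alone, two different people with the same name would be merged into one row, so a real table would need a more reliable key.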


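Obviously, we also do not know your data, or whether this is a real-world task, but there may be caveats that require more detailed logic, such as people with the same name, multi-generation relationships, and so on.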

Below is a solution for BigQuery Standard SQL:

#standardSQL
SELECT 
  EmployeeName,
  IF(Relationship IS NULL, 'Self', Relationship) Relationship, 
  IFNULL(Name, EmployeeName) Name, 
  MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
  MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
  MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION  
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship 
-- ORDER BY Name, Relationship

Applied to the sample data from your question, the result is:

Row EmployeeName    Relationship    Name    DEN     MEDICAL VISION   
1   Alice           Self            Alice   EEC     null    null     
2   Alice           CHL             John    EEC     null    null     
3   Alice           SPS             Lee     null    MED     null     
4   Lily            Self            Lily    null    null    VIS  
5   Lily            SPS             Tom     null    null    VIS    
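
As a side note, the same conditional aggregation can probably also be written with BigQuery's `PIVOT` operator. The following is a rough sketch rather than part of the answer above: `sample_rows` is a hypothetical inline copy of the question's sample data, `normalized` and `BenefitValue` are names invented for this example, and the fallback from `BenefitOption` to `BenefitCode` mirrors the query above.

```sql
#standardSQL
-- Hypothetical inline copy of the question's sample rows, so the sketch runs on its own.
WITH sample_rows AS (
  SELECT CAST(NULL AS STRING) AS Relationship, 'Alice' AS EmployeeName,
         'DEN' AS BenefitCode, 'EEC' AS BenefitOption,
         CAST(NULL AS STRING) AS Name UNION ALL
  SELECT 'CHL', 'Alice', 'DEN', 'EEC', 'John' UNION ALL
  SELECT 'SPS', 'Alice', 'MED', NULL, 'Lee' UNION ALL
  SELECT NULL, 'Lily', 'VIS', NULL, NULL UNION ALL
  SELECT 'SPS', 'Lily', 'VIS', NULL, 'Tom'
),
-- Fill in the blanks first, so the pivot only sees clean columns.
normalized AS (
  SELECT
    EmployeeName,
    IFNULL(Relationship, 'Self') AS Relationship,
    IFNULL(Name, EmployeeName) AS Name,
    BenefitCode,
    IFNULL(BenefitOption, BenefitCode) AS BenefitValue
  FROM sample_rows
)
SELECT *
FROM normalized
-- PIVOT groups by every column not named inside it
-- (EmployeeName, Relationship, Name) and spreads BenefitCode into columns.
PIVOT (
  MAX(BenefitValue)
  FOR BenefitCode IN ('DEN' AS DEN, 'MED' AS MEDICAL, 'VIS' AS VISION)
)
ORDER BY EmployeeName, Name
```

The `IN` list must still name every benefit code explicitly, so this form is mainly a matter of readability compared with the `MAX(IF(...))` version.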

Another option is to turn the flattened version into a hierarchical (nested) one:

#standardSQL
SELECT EmployeeName,
  ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits
FROM (
  SELECT 
    EmployeeName,
    IF(Relationship IS NULL, 'Self', Relationship) Relationship, 
    IFNULL(Name, EmployeeName) Name, 
    MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
    MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
    MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION  
  FROM `project.dataset.table`
  GROUP BY Name, EmployeeName, Relationship 
) 
GROUP BY EmployeeName
-- ORDER BY EmployeeName

In this case, the result will be:

Row EmployeeName    benefits.Name   benefits.Relationship   benefits.DEN    benefits.MEDICAL    benefits.VISION  
1   Alice           Alice           Self                    EEC             null                null     
                    John            CHL                     EEC             null                null     
                    Lee             SPS                     null            MED                 null       
2   Lily            Lily            Self                    null            null                VIS  
                    Tom             SPS                     null            null                VIS  
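
The nested form keeps each dependent attached to the employee, and it can be read back with `UNNEST` when needed. Below is a small sketch under the same assumptions as before: `sample_rows` is a hypothetical inline copy of the question's sample data, and the CTE names `normalized`, `flat`, and `nested` as well as the `Dependent` alias are invented for this example.

```sql
#standardSQL
-- Hypothetical inline copy of the question's sample rows, so the sketch runs on its own.
WITH sample_rows AS (
  SELECT CAST(NULL AS STRING) AS Relationship, 'Alice' AS EmployeeName,
         'DEN' AS BenefitCode, 'EEC' AS BenefitOption,
         CAST(NULL AS STRING) AS Name UNION ALL
  SELECT 'CHL', 'Alice', 'DEN', 'EEC', 'John' UNION ALL
  SELECT 'SPS', 'Alice', 'MED', NULL, 'Lee' UNION ALL
  SELECT NULL, 'Lily', 'VIS', NULL, NULL UNION ALL
  SELECT 'SPS', 'Lily', 'VIS', NULL, 'Tom'
),
-- Fill in the blanks in a separate step so later GROUP BY names are unambiguous.
normalized AS (
  SELECT
    EmployeeName,
    IFNULL(Relationship, 'Self') AS Relationship,
    IFNULL(Name, EmployeeName) AS Name,
    BenefitCode,
    IFNULL(BenefitOption, BenefitCode) AS BenefitValue
  FROM sample_rows
),
-- One row per covered person, one column per benefit.
flat AS (
  SELECT
    EmployeeName, Relationship, Name,
    MAX(IF(BenefitCode = 'DEN', BenefitValue, NULL)) AS DEN,
    MAX(IF(BenefitCode = 'MED', BenefitValue, NULL)) AS MEDICAL,
    MAX(IF(BenefitCode = 'VIS', BenefitValue, NULL)) AS VISION
  FROM normalized
  GROUP BY EmployeeName, Relationship, Name
),
-- One row per employee, with all covered people in an array.
nested AS (
  SELECT EmployeeName,
    ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) AS benefits
  FROM flat
  GROUP BY EmployeeName
)
-- List every dependent together with the employee they belong to.
SELECT EmployeeName, b.Name AS Dependent, b.Relationship
FROM nested, UNNEST(benefits) AS b
WHERE b.Relationship != 'Self'
ORDER BY EmployeeName, Dependent
```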

Your conversion loses the information that John is Alice's child. That does not seem like a good idea.

Sounds reasonable. Do you have a good way to solve it?

You have to figure out what data structure you want to end up with.