SQL: How do I convert vertical data to horizontal data with BigQuery?
I'm currently working with employee benefits data. However, the spreadsheet data is a complete mess, and I want to reshape it into a format that makes the information easy to read. The current format looks like this:
Relationship  EmployeeName  BenefitCode  BenefitOption  Name
              Alice         DEN          EEC
CHL           Alice         DEN          EEC            John
SPS           Alice         MED                         Lee
              Lily          VIS
SPS           Lily          VIS                         Tom
I want to transform it like this:
Relationship  Name   MED  DEN  VIS
Employee      Alice       EEC
CHL           John        EEC
SPS           Lee    MED
Employee      Lily             VIS
SPS           Tom              VIS
I tried grouping the data by name and benefit code, but I'm quite confused about how to do it.
My code is as follows:
SELECT
  RelationshipCode,
  EmployeeName,
  MAX(IF(BenefitCode = "DEN", BenefitOptionCode, NULL)) AS DEN,
  MAX(IF(BenefitCode = "MED", BenefitOptionCode, NULL)) AS MEDICAL,
  MAX(IF(BenefitCode = "VIS", BenefitOptionCode, NULL)) AS VISION
FROM `TableXXX`
WHERE RelationshipCode = 'Employee'
GROUP BY EmployeeName, RelationshipCode
But losing the relationship between dependents and employees doesn't seem like a good idea.
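Can anyone tell me how to convert vertical data into horizontal data? Or do you have a better idea for solving this problem?

I would probably organize it into CTEs, making each column or concept its own logical CTE: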
with people as (
  select distinct EmployeeName as person from <dataset>.<table>
  union distinct
  -- rows for the employee themselves have no Name, so skip the NULLs
  select distinct Name as person from <dataset>.<table> where Name is not null
),
med as (
  -- select people with MED columns
),
den as (
  -- select people with DEN columns
),
... (etc)
joined as (
  select * from people
  left join med using(person)
  left join den using(person)
)
select * from joined
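My general advice for a situation like this is to start the way I did, with MED and DEN. Once those simple items are done, move on to the ones that are more complex or require assumptions. Breaking them into CTE blocks helps encapsulate each idea.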
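We obviously don't know your data, or whether this is a real-world task, but you will probably run into some caveats that need more detailed logic: people with the same name, multi-generation relationships, and so on.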
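To make the skeleton above concrete, here is one way the CTEs could be filled in. This is only a sketch under stated assumptions: the inline `sample` CTE stands in for the real table and copies the rows from the question; the `vis` CTE and the `(EmployeeName, person)` join key are additions not present in the original skeleton (the extra key keeps each dependent attached to their employee); and a missing BenefitOption is assumed to fall back to the benefit code, as the desired output in the question suggests.

```sql
#standardSQL
WITH sample AS (
  -- Inline copy of the question's sample rows (stand-in for the real table)
  SELECT CAST(NULL AS STRING) AS Relationship, 'Alice' AS EmployeeName,
         'DEN' AS BenefitCode, 'EEC' AS BenefitOption, CAST(NULL AS STRING) AS Name
  UNION ALL SELECT 'CHL', 'Alice', 'DEN', 'EEC', 'John'
  UNION ALL SELECT 'SPS', 'Alice', 'MED', NULL, 'Lee'
  UNION ALL SELECT NULL, 'Lily', 'VIS', NULL, NULL
  UNION ALL SELECT 'SPS', 'Lily', 'VIS', NULL, 'Tom'
),
people AS (
  -- One row per covered person, still tied to the employee
  SELECT DISTINCT
    EmployeeName,
    IFNULL(Relationship, 'Employee') AS Relationship,  -- assumed label for the employee's own rows
    IFNULL(Name, EmployeeName) AS person
  FROM sample
),
med AS (
  SELECT EmployeeName, IFNULL(Name, EmployeeName) AS person,
         IFNULL(BenefitOption, BenefitCode) AS MED
  FROM sample
  WHERE BenefitCode = 'MED'
),
den AS (
  SELECT EmployeeName, IFNULL(Name, EmployeeName) AS person,
         IFNULL(BenefitOption, BenefitCode) AS DEN
  FROM sample
  WHERE BenefitCode = 'DEN'
),
vis AS (
  SELECT EmployeeName, IFNULL(Name, EmployeeName) AS person,
         IFNULL(BenefitOption, BenefitCode) AS VIS
  FROM sample
  WHERE BenefitCode = 'VIS'
),
joined AS (
  SELECT *
  FROM people
  LEFT JOIN med USING (EmployeeName, person)
  LEFT JOIN den USING (EmployeeName, person)
  LEFT JOIN vis USING (EmployeeName, person)
)
SELECT *
FROM joined
ORDER BY EmployeeName, person
```

If a person can have more than one row for the same benefit code, each benefit CTE would need its own aggregation or de-duplication first; otherwise the left joins multiply rows.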
The following is for BigQuery Standard SQL:
#standardSQL
SELECT
  EmployeeName,
  -- rows with no Relationship describe the employee themselves
  IF(Relationship IS NULL, 'Self', Relationship) Relationship,
  IFNULL(Name, EmployeeName) Name,
  -- when no option is given, fall back to the benefit code itself
  MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
  MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
  MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship
-- ORDER BY Name, Relationship
If applied to the sample data from your question, the result is:
Row  EmployeeName  Relationship  Name   DEN   MEDICAL  VISION
1    Alice         Self          Alice  EEC   null     null
2    Alice         CHL           John   EEC   null     null
3    Alice         SPS           Lee    null  MED      null
4    Lily          Self          Lily   null  null     VIS
5    Lily          SPS           Tom    null  null     VIS
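For comparison, the same flat shape can probably also be produced with BigQuery's PIVOT operator instead of the MAX(IF(...)) pattern. This is a different technique from the one used in the answer, shown only as a sketch: the inline `sample` CTE is a stand-in for the real table, and `BenefitValue` is a name made up for this illustration.

```sql
#standardSQL
WITH sample AS (
  -- Inline copy of the question's sample rows (stand-in for the real table)
  SELECT CAST(NULL AS STRING) AS Relationship, 'Alice' AS EmployeeName,
         'DEN' AS BenefitCode, 'EEC' AS BenefitOption, CAST(NULL AS STRING) AS Name
  UNION ALL SELECT 'CHL', 'Alice', 'DEN', 'EEC', 'John'
  UNION ALL SELECT 'SPS', 'Alice', 'MED', NULL, 'Lee'
  UNION ALL SELECT NULL, 'Lily', 'VIS', NULL, NULL
  UNION ALL SELECT 'SPS', 'Lily', 'VIS', NULL, 'Tom'
)
SELECT *
FROM (
  -- Every column other than BenefitCode and BenefitValue becomes a grouping column
  SELECT
    EmployeeName,
    IFNULL(Relationship, 'Self') AS Relationship,
    IFNULL(Name, EmployeeName) AS Name,
    BenefitCode,
    IFNULL(BenefitOption, BenefitCode) AS BenefitValue  -- hypothetical helper column
  FROM sample
)
PIVOT (
  MAX(BenefitValue) FOR BenefitCode IN ('DEN', 'MED' AS MEDICAL, 'VIS' AS VISION)
)
ORDER BY EmployeeName, Name
```

PIVOT needs the list of benefit codes spelled out in the IN clause, just as the MAX(IF(...)) version needs one expression per code, so neither form adapts automatically when new codes appear.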
Another option is to extend the flattened version into a hierarchical one:
#standardSQL
SELECT
  EmployeeName,
  -- one array element per covered person (the employee and each dependent)
  ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits
FROM (
  SELECT
    EmployeeName,
    IF(Relationship IS NULL, 'Self', Relationship) Relationship,
    IFNULL(Name, EmployeeName) Name,
    MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
    MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
    MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
  FROM `project.dataset.table`
  GROUP BY Name, EmployeeName, Relationship
)
GROUP BY EmployeeName
-- ORDER BY EmployeeName
In this case the result will be:
Row  EmployeeName  benefits.Name  benefits.Relationship  benefits.DEN  benefits.MEDICAL  benefits.VISION
1    Alice         Alice          Self                   EEC           null              null
                   John           CHL                    EEC           null              null
                   Lee            SPS                    null          MED               null
2    Lily          Lily           Self                   null          null              VIS
                   Tom            SPS                    null          null              VIS
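Because the hierarchical form keeps every dependent inside the employee's own row, the employee-to-dependent link can be read back with UNNEST. The sketch below assumes a table shaped like the result above; the inline `nested` CTE and the `Dependent` alias are made up for illustration.

```sql
#standardSQL
WITH nested AS (
  -- Inline copy of the hierarchical result (stand-in for a saved table)
  SELECT 'Alice' AS EmployeeName, [
    STRUCT('Alice' AS Name, 'Self' AS Relationship, 'EEC' AS DEN,
           CAST(NULL AS STRING) AS MEDICAL, CAST(NULL AS STRING) AS VISION),
    STRUCT('John', 'CHL', 'EEC', CAST(NULL AS STRING), CAST(NULL AS STRING)),
    STRUCT('Lee', 'SPS', CAST(NULL AS STRING), 'MED', CAST(NULL AS STRING))
  ] AS benefits
  UNION ALL
  SELECT 'Lily', [
    STRUCT('Lily' AS Name, 'Self' AS Relationship, CAST(NULL AS STRING) AS DEN,
           CAST(NULL AS STRING) AS MEDICAL, 'VIS' AS VISION),
    STRUCT('Tom', 'SPS', CAST(NULL AS STRING), CAST(NULL AS STRING), 'VIS')
  ]
)
-- Flatten the array again and keep only the dependents of each employee
SELECT
  n.EmployeeName,
  b.Name AS Dependent,
  b.Relationship
FROM nested AS n, UNNEST(n.benefits) AS b
WHERE b.Relationship != 'Self'
ORDER BY n.EmployeeName, Dependent
```

On the sample data this should list John and Lee under Alice and Tom under Lily, so the information that John is Alice's child is preserved.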
Your transformation loses the information that John is Alice's child.
That doesn't seem like a good idea; sounds reasonable. Do you have a good way to solve it?
You have to figure out what kind of data structure you want.