SQL BigQuery: convert specific field values to columns
We have a table in BigQuery, as shown below.
Input table:
Name | Question | Answer
-----+-----------+-------
Bob | Interest | a
Sue | Interest | a
Sue | Interest | b
Joe | Interest | b
Joe | Gender | Male
Bob | Gender | Female
Sue | DOB | 2020-10-17
We want to convert the table above into the format below so that it is friendly for BI/visualization.
Target/required table:
+----------------------------------------+
| Name | a | b | c | Gender | DOB |
+----------------------------------------+
| Bob | 1 | 0 | 0 | Female | 2020-10-17 |
| Sue | 1 | 1 | 0 | - | - |
| Joe | 0 | 1 | 0 | Male | - |
+----------------------------------------+
Use conditional aggregation:
select name,
countif(question = 'Interest' and answer = 'a') as a,
countif(question = 'Interest' and answer = 'b') as b,
countif(question = 'Interest' and answer = 'c') as c,
max(case when question = 'Gender' then answer end) as gender, -- literal must match the stored 'Gender' (comparison is case-sensitive)
max(case when question = 'DOB' then answer end) as dob
from t
group by name;
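To try the conditional aggregation without creating a table first, the sample rows from the question can be inlined as a CTE. This is only a sketch: the CTE name `t` and the lowercase column names are taken from the query above, and the `'Gender'` literal is written to match the sample data exactly, since BigQuery string comparison is case-sensitive by default.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH t AS (
  SELECT 'Bob' AS name, 'Interest' AS question, 'a' AS answer UNION ALL
  SELECT 'Sue', 'Interest', 'a' UNION ALL
  SELECT 'Sue', 'Interest', 'b' UNION ALL
  SELECT 'Joe', 'Interest', 'b' UNION ALL
  SELECT 'Joe', 'Gender', 'Male' UNION ALL
  SELECT 'Bob', 'Gender', 'Female' UNION ALL
  SELECT 'Sue', 'DOB', '2020-10-17'
)
SELECT name,
  -- One counter column per Interest answer.
  COUNTIF(question = 'Interest' AND answer = 'a') AS a,
  COUNTIF(question = 'Interest' AND answer = 'b') AS b,
  COUNTIF(question = 'Interest' AND answer = 'c') AS c,
  -- MAX ignores NULLs, so it picks the single non-NULL answer per name, if any.
  MAX(CASE WHEN question = 'Gender' THEN answer END) AS gender,
  MAX(CASE WHEN question = 'DOB' THEN answer END) AS dob
FROM t
GROUP BY name;
-- Result for these rows (row order is not guaranteed):
--   Bob | 1 | 0 | 0 | Female | NULL
--   Sue | 1 | 1 | 0 | NULL   | 2020-10-17
--   Joe | 0 | 1 | 0 | Male   | NULL
```

`COUNTIF` counts matching rows, so a name that gave the same answer twice would show 2 rather than 1. If a strict 0/1 flag is wanted, `MAX(IF(..., 1, 0))` is one alternative.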
Note: if a value is missing, this returns NULL. To me, that makes more sense than '-', although the logic can be adjusted to return a hyphen.
The following is for BigQuery Standard SQL. It does not depend on knowing the specific questions in advance and is generic enough to handle any question and answer values:
EXECUTE IMMEDIATE (
SELECT """
SELECT name, """ || STRING_AGG("""MAX(IF(answer = '""" || value || """', 1, 0)) AS """ || value, ', ')
FROM (
SELECT DISTINCT answer value FROM `project.dataset.table`
WHERE question = 'Interest' ORDER BY value
)) || (
SELECT ", " || STRING_AGG("""MAX(IF(question = '""" || value || """', answer, '-')) AS """ || value, ', ')
FROM (
SELECT DISTINCT question value FROM `project.dataset.table`
WHERE question != 'Interest' ORDER BY value
)) || """
FROM `project.dataset.table`
GROUP BY name
""";
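To make the string building easier to follow, here is roughly the statement that the script above would generate and run for the sample rows in the question, worked out by hand rather than copied from a real run. The column order inside each group depends on the order in which `STRING_AGG` receives its inputs, which the inner `ORDER BY` does not strictly guarantee, so treat the ordering as an assumption.

```sql
-- Generated statement for the sample data (answers a, b; questions DOB, Gender).
-- There is no column c because no sample row has answer = 'c'.
SELECT name,
  MAX(IF(answer = 'a', 1, 0)) AS a,
  MAX(IF(answer = 'b', 1, 0)) AS b,
  MAX(IF(question = 'DOB', answer, '-')) AS DOB,
  MAX(IF(question = 'Gender', answer, '-')) AS Gender
FROM `project.dataset.table`
GROUP BY name
```

Two things are worth keeping in mind with this approach. The values are spliced into the SQL text as column aliases, so an answer or question containing spaces or punctuation would need extra quoting. The generated `answer = 'a'` test also does not check `question = 'Interest'`, so another question with the same answer value would set the flag as well.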
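Using coalesce() will supply the dash without the complexity (and weaknesses) of dynamic SQL and multiple selects. Note that this is just a variant of Gordon Linoff's answer; I only added the logic to return a dash instead of NULL.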
select name,
countif(question = 'Interest' and answer = 'a') as a,
countif(question = 'Interest' and answer = 'b') as b,
countif(question = 'Interest' and answer = 'c') as c,
coalesce(max(case when question = 'Gender' then answer end),'-') as gender, -- literal must match the stored 'Gender' (comparison is case-sensitive)
coalesce(max(case when question = 'DOB' then answer end),'-') as dob
from t
group by name;
Sorry, I can't see how the requirement is so complex that you would need dynamic SQL or multiple select distinct statements.

@Used_By_Already - I totally disagree with you! Think further and look further than just one particular example! When you learn to do that, come and judge! Or better, just provide your own answer. But down-voting a correct answer just because you can't see something is quite low! I recommend you re-read my answer, especially the first sentence, so that maybe you will get it, change your mind, and retract your vote.

Although it may be overkill for the simple case the OP is showing, it is technically correct. Let's see what the OP says.

From experience: the sample data usually contains just a few values to pivot, while the real case has tens or hundreds. In that case the solution in my answer is the only way to go! So I really don't understand why you would down-vote the right direction, or why you would characterize it as overkill! Another trick in this example is that it involves two different pivots, which makes it quite different from all/any previous pivot-related questions, at least for the google-bigquery tag.

@mikhail Since there are multiple pivots in the query, how can the final result be stored into a table in this case?
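One possible way to store the result, offered as a sketch and not as the answer author's own reply: since `EXECUTE IMMEDIATE` runs whatever statement the string contains, the generated text can begin with `CREATE OR REPLACE TABLE ... AS`. The destination name `project.dataset.pivoted` is a hypothetical placeholder; the source table name is the placeholder already used in the dynamic SQL answer.

```sql
-- Same dynamic pivot as in the answer above, wrapped in a CTAS statement.
-- `project.dataset.pivoted` is a hypothetical destination table.
EXECUTE IMMEDIATE (
SELECT """
CREATE OR REPLACE TABLE `project.dataset.pivoted` AS
SELECT name, """ || STRING_AGG("""MAX(IF(answer = '""" || value || """', 1, 0)) AS """ || value, ', ')
FROM (
  SELECT DISTINCT answer value FROM `project.dataset.table`
  WHERE question = 'Interest' ORDER BY value
)) || (
SELECT ", " || STRING_AGG("""MAX(IF(question = '""" || value || """', answer, '-')) AS """ || value, ', ')
FROM (
  SELECT DISTINCT question value FROM `project.dataset.table`
  WHERE question != 'Interest' ORDER BY value
)) || """
FROM `project.dataset.table`
GROUP BY name
""";
```

Because the column list is derived from the data at run time, the schema of the destination table can change between runs, which is something downstream BI tools may need to tolerate.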