SQL: How to group users as A, B, or both
Tags: sql, google-bigquery
If I have data like this:
user + tag
-----|-----
bob | A
bob | A
bob | B
tom | A
tom | A
amy | B
amy | B
jen | A
jen | A
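With millions of users, I want to know how many users have only tag A, only tag B, and both tags. It is the "both" case that I am stuck on.
For the sample data above, the answer is: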
Both: 1
A only: 2
B only: 1
I don't need the user IDs returned, only the counts. I am using BigQuery.
Here is a solution using the SOME and EVERY functions:
-- BigQuery legacy SQL. In a legacy FROM clause, a comma between
-- subselects means UNION ALL, so the nine rows below form the sample table.
SELECT
  SUM(category == 'both') AS both_count,
  SUM(category == 'A') AS a_count,
  SUM(category == 'B') AS b_count
FROM (
  SELECT
    name,
    -- SOME is true if any row in the group matches; EVERY if all rows match.
    CASE WHEN SOME(tag == 'A') AND SOME(tag == 'B') THEN 'both'
         WHEN EVERY(tag == 'A') THEN 'A'
         WHEN EVERY(tag == 'B') THEN 'B'
         ELSE 'none' END AS category
  FROM
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'B' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'jen' as name, 'A' as tag),
    (SELECT 'jen' as name, 'A' as tag)
  GROUP BY name)
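To sanity-check the per-user classification without a BigQuery project, the same idea can be sketched locally. This is only an illustration, not BigQuery syntax: it assumes Python's built-in sqlite3 module as a stand-in engine, a hypothetical table name `user_tags`, and a hypothetical helper `count_tag_groups`. SQLite has no SOME or EVERY, so `MAX(tag = 'A')` plays the role of "at least one row has tag A".

```python
import sqlite3

# Sample rows from the question (duplicates included on purpose).
ROWS = [
    ("bob", "A"), ("bob", "A"), ("bob", "B"),
    ("tom", "A"), ("tom", "A"),
    ("amy", "B"), ("amy", "B"),
    ("jen", "A"), ("jen", "A"),
]

# Inner query: one row per user with two 0/1 flags.
# Outer query: sum the flags to get the three counts.
QUERY = """
SELECT
  SUM(has_a AND has_b)     AS both_count,
  SUM(has_a AND NOT has_b) AS a_count,
  SUM(has_b AND NOT has_a) AS b_count
FROM (
  SELECT
    name,
    MAX(tag = 'A') AS has_a,  -- 1 if the user has at least one A row
    MAX(tag = 'B') AS has_b   -- 1 if the user has at least one B row
  FROM user_tags              -- hypothetical table name
  GROUP BY name
)
"""


def count_tag_groups(rows):
    """Load rows into an in-memory table and return (both, a_only, b_only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_tags (name TEXT, tag TEXT)")
    conn.executemany("INSERT INTO user_tags VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


print(count_tag_groups(ROWS))  # (1, 2, 1)
```

The result matches the expected answer in the question: one user with both tags, two with A only, one with B only. Duplicate rows do not matter here because each user collapses to a single row before counting.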
I don't know Google BigQuery's syntax, but here is a solution in generic SQL:
-- "table" is a placeholder for the real table name.
select a.tag_desc, count(distinct a.user) as total
from (
    select coalesce(tA.user, tB.user) as user
         , case
             when tA.user is not null and tB.user is not null then 'Both'
             when tA.user is not null and tB.user is null then 'A Only'
             when tA.user is null and tB.user is not null then 'B Only'
           end as tag_desc
    -- Filter each side before joining; a WHERE on tA after the join
    -- would discard the B-only rows.
    from (select user from table where tag = 'A') tA
    full outer join (select user from table where tag = 'B') tB
        on tA.user = tB.user
) a
group by a.tag_desc
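The subquery uses a full outer join to join the data set back to itself, which lets you evaluate both conditions (A and B) at the same time. A CASE expression defines the three outcomes. In the outer query, I count the users for each CASE result.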
I double-checked the BigQuery documentation, and it appears to support full outer joins. Re-reading the question, I noticed that the sample data contains duplicate records; my query does not account for that.
@zbinsd This query should work for millions of users. What problem are you running into? You only need to replace the static rows in the example with a query against your own table. It should handle millions of users, but at a much larger scale you may need to replace GROUP BY with GROUP EACH BY. And if each user also has millions of records, you may benefit from first grouping by name, tag to remove the duplicate records.
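The last suggestion, grouping by name, tag first to drop duplicates, can be sketched as follows. This is a hedged illustration rather than the commenter's actual query: it again assumes Python's sqlite3 as a local stand-in for BigQuery, and the table name `user_tags` and helper `count_after_dedup` are hypothetical. Once duplicates are gone, a user with two remaining rows must have both tags, and a user with one remaining row has exactly that tag.

```python
import sqlite3

# Sample rows from the question (duplicates included on purpose).
ROWS = [
    ("bob", "A"), ("bob", "A"), ("bob", "B"),
    ("tom", "A"), ("tom", "A"),
    ("amy", "B"), ("amy", "B"),
    ("jen", "A"), ("jen", "A"),
]

QUERY = """
SELECT category, COUNT(*) AS users
FROM (
  -- Step 2: one row per user; two distinct tags left means 'both'.
  SELECT
    name,
    CASE WHEN COUNT(*) = 2 THEN 'both' ELSE MIN(tag) END AS category
  FROM (
    -- Step 1: collapse duplicate (name, tag) rows.
    SELECT name, tag
    FROM user_tags            -- hypothetical table name
    WHERE tag IN ('A', 'B')
    GROUP BY name, tag
  )
  GROUP BY name
)
GROUP BY category
ORDER BY category
"""


def count_after_dedup(rows):
    """Return [(category, user_count), ...] after removing duplicate rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_tags (name TEXT, tag TEXT)")
    conn.executemany("INSERT INTO user_tags VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(count_after_dedup(ROWS))  # [('A', 2), ('B', 1), ('both', 1)]
```

The design choice is that the innermost query shrinks the data to at most two rows per user, so the later grouping by user works on far fewer rows when individual users have many repeated records.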