SQL: How to group users as A, B, or both

sql google-bigquery

If I have data like this:

user | tag
-----|-----
bob  |  A
bob  |  A
bob  |  B
tom  |  A
tom  |  A
amy  |  B
amy  |  B
jen  |  A
jen  |  A

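Across millions of users, I want to know how many users have only tag A, only tag B, or both. The "both" case is the one I'm stuck on.

For the data above, the answer is: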

Both: 1
A only: 2
B only: 1

I don't need the user IDs returned, only the counts. I'm using BigQuery.

Here is a solution using the SOME and EVERY functions:

SELECT
  SUM(category == 'both') AS both_count,
  SUM(category == 'A') AS a_count,
  SUM(category == 'B') AS b_count
FROM (
  SELECT
    name,
    CASE WHEN SOME(tag == 'A') AND SOME(tag == 'B') THEN 'both' 
         WHEN EVERY(tag == 'A') THEN 'A' 
         WHEN EVERY(tag == 'B') THEN 'B'
         ELSE 'none' END AS category
  FROM 
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'B' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'jen' as name, 'A' as tag),
    (SELECT 'jen' as name, 'A' as tag)
  GROUP BY name)
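
The query above relies on BigQuery legacy SQL (`==`, `SOME`, `EVERY`, and comma-separated subqueries in `FROM`). For readers on GoogleSQL (standard SQL), a rough equivalent might look like the sketch below. The CTE names `user_tags` and `per_user` are made up for illustration, and the sample rows stand in for a real table. Note one difference: this sketch treats a user with tag A plus some third tag as "A only", whereas the `EVERY`-based version would put that user in 'none'.

```sql
-- Sketch in GoogleSQL; user_tags and per_user are hypothetical names.
WITH user_tags AS (
  SELECT 'bob' AS name, 'A' AS tag UNION ALL
  SELECT 'bob', 'A' UNION ALL
  SELECT 'bob', 'B' UNION ALL
  SELECT 'tom', 'A' UNION ALL
  SELECT 'tom', 'A' UNION ALL
  SELECT 'amy', 'B' UNION ALL
  SELECT 'amy', 'B' UNION ALL
  SELECT 'jen', 'A' UNION ALL
  SELECT 'jen', 'A'
),
per_user AS (
  -- One row per user, with a flag for each tag the user has at least once.
  SELECT
    name,
    LOGICAL_OR(tag = 'A') AS has_a,
    LOGICAL_OR(tag = 'B') AS has_b
  FROM user_tags
  GROUP BY name
)
-- Count users in each category; COUNTIF counts rows where the condition is TRUE.
SELECT
  COUNTIF(has_a AND has_b) AS both_count,
  COUNTIF(has_a AND NOT has_b) AS a_only_count,
  COUNTIF(has_b AND NOT has_a) AS b_only_count
FROM per_user
```

To run it against real data, drop the `user_tags` CTE and point `per_user` at your own table.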

I don't know Google BigQuery's syntax, but here is a generic SQL solution:

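    -- user_tags is a placeholder table name; substitute your own
    select a.tag_desc, count(distinct a.user) as total
    from (
    select coalesce(tA.user, tB.user) as user
      , tA.tag as tag_a
      , tB.tag as tag_b
      , case 
          when tA.tag is not null and tB.tag is not null then 'Both'
          when tA.tag is not null and tB.tag is null then 'A Only'
          when tA.tag is null and tB.tag is not null then 'B Only'
        end as tag_desc
    -- filter each side before joining so that B-only users survive the join
    from (select user, tag from user_tags where tag = 'A') tA
      full outer join (select user, tag from user_tags where tag = 'B') tB
        on tA.user = tB.user
    ) a
    group by a.tag_desc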

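The subquery uses a full outer join to join the dataset back to itself, which lets you evaluate both conditions (A and B) at once. A case statement defines the three outcomes. In the outer query, I count the number of users for each case outcome.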

I double-checked the BigQuery documentation, and it appears to support full outer joins. Rereading the question, I noticed that the sample data contains duplicate records. My query does not take that into account.

@zbinsd This query works for millions of users; what problem are you having? You just need to replace the static sample rows in the example with a query against your data. It should work for millions of users, but at that scale you may need to replace GROUP BY with GROUP EACH BY. And if each user also has millions of records, you may benefit from first grouping by name, tag to remove duplicate records.
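
The last suggestion, collapsing duplicates by grouping on name and tag before classifying, could be sketched in GoogleSQL roughly as follows. The CTE names `user_tags`, `distinct_pairs`, and `per_user` are hypothetical, and the sample rows are the ones from the question. `GROUP EACH BY` is a legacy SQL hint, so this sketch uses a plain `GROUP BY`.

```sql
-- Sketch in GoogleSQL; all CTE names are made up for illustration.
WITH user_tags AS (
  SELECT 'bob' AS name, 'A' AS tag UNION ALL
  SELECT 'bob', 'A' UNION ALL
  SELECT 'bob', 'B' UNION ALL
  SELECT 'tom', 'A' UNION ALL
  SELECT 'tom', 'A' UNION ALL
  SELECT 'amy', 'B' UNION ALL
  SELECT 'amy', 'B' UNION ALL
  SELECT 'jen', 'A' UNION ALL
  SELECT 'jen', 'A'
),
distinct_pairs AS (
  -- Step 1: drop duplicate (name, tag) rows, keeping only tags A and B.
  SELECT name, tag
  FROM user_tags
  WHERE tag IN ('A', 'B')
  GROUP BY name, tag
),
per_user AS (
  -- Step 2: after deduplication each user has one or two rows.
  SELECT name, COUNT(*) AS n_tags, MIN(tag) AS only_tag
  FROM distinct_pairs
  GROUP BY name
)
-- Step 3: two distinct tags means both; otherwise only_tag says which one.
SELECT
  COUNTIF(n_tags = 2) AS both_count,
  COUNTIF(n_tags = 1 AND only_tag = 'A') AS a_only_count,
  COUNTIF(n_tags = 1 AND only_tag = 'B') AS b_only_count
FROM per_user
```

Deduplicating first keeps the per-user aggregation small when individual users have very many rows, at the cost of one extra grouping pass.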