Google BigQuery: comparing an entry to an aggregated version of itself


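I am trying to find out whether an entry's value is the maximum of its group's values. The check is meant to sit inside a larger IF expression.

I would like it to look like this: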

SELECT
    t.id as t_id, 
    sum(if(t.value = max(t.value), 1, 0)) AS is_max_value

FROM dataset.table AS t
GROUP BY t_id

The response is:

Error: Expression 't.value' is not present in the GROUP BY list

How should I write my query to achieve this?

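MAX(t.value) is an aggregate, so it cannot be compared with the ungrouped t.value of each row inside the same GROUP BY query. You first need to compute the maximum in a subquery and then join that value back to the table.

Here is an example using a public dataset: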

SELECT
  t.word,
  t.word_count,
  t.corpus_date
FROM
  [publicdata:samples.shakespeare] t
JOIN (
  SELECT
    corpus_date,
    MAX(word_count) word_count,
  FROM
    [publicdata:samples.shakespeare]
  GROUP BY
    1 ) d
ON
  d.corpus_date=t.corpus_date
  AND t.word_count=d.word_count
LIMIT
  25

Result:

+-----+--------+--------------+---------------+
| Row | t_word | t_word_count | t_corpus_date |
+-----+--------+--------------+---------------+
|   1 | the    |          762 |          1597 |
|   2 | the    |          894 |          1598 |
|   3 | the    |          841 |          1590 |
|   4 | the    |          680 |          1606 |
|   5 | the    |          942 |          1607 |
|   6 | the    |          779 |          1609 |
|   7 | the    |          995 |          1600 |
|   8 | the    |          937 |          1599 |
|   9 | the    |          738 |          1612 |
|  10 | the    |          612 |          1595 |
|  11 | the    |          848 |          1592 |
|  12 | the    |          753 |          1594 |
|  13 | the    |          740 |          1596 |
|  14 | I      |          828 |          1603 |
|  15 | the    |          525 |          1608 |
|  16 | the    |          363 |             0 |
|  17 | I      |          629 |          1593 |
|  18 | I      |          447 |          1611 |
|  19 | the    |          715 |          1602 |
|  20 | the    |          717 |          1610 |
+-----+--------+--------------+---------------+

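As you can see, the word with the maximum word_count is kept within each partition defined by corpus_date.

Alternatively, use a window function to "spread" the maximum value across all related records. This avoids the join: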
SELECT
  *
FROM (
  SELECT
    corpus,
    corpus_date,
    word,
    word_count,
    MAX(word_count) OVER (PARTITION BY corpus) AS Max_Word_Count
  FROM
    [publicdata:samples.shakespeare] )
WHERE
  word_count=Max_Word_Count
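
The bracketed table reference above is BigQuery legacy SQL syntax. As a rough sketch of the same window-function idea in Standard SQL, assuming the sample table is also reachable as bigquery-public-data.samples.shakespeare (that path and the lower-case max_word_count alias are choices made for this illustration, not part of the original answer):

```sql
#standardSQL
-- Keep only the rows whose word_count equals the maximum within their corpus.
SELECT
  corpus,
  corpus_date,
  word,
  word_count
FROM (
  SELECT
    corpus,
    corpus_date,
    word,
    word_count,
    -- The window function attaches the per-corpus maximum to every row.
    MAX(word_count) OVER (PARTITION BY corpus) AS max_word_count
  FROM
    `bigquery-public-data.samples.shakespeare`
)
WHERE
  word_count = max_word_count;
```

Because the maximum is computed per row rather than per group, no GROUP BY is involved and the comparison is allowed.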

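Explanation:
Inner select: for each row, computes the maximum value among all rows with the same id.

Outer select: for each row, compares the row's value with the maximum of its group, then converts true or false to 1 or 0 respectively (as the question expects).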
select 
  id, 
  value, 
  integer(value = max_value) as is_max_value
from (
  select id, value, max(value) over(partition by id) as max_value
  from dataset.table
)
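
To tie this back to the per-id sum in the question, here is a minimal self-contained sketch in BigQuery Standard SQL. The inline sample rows and the CTE names t and flagged are invented for illustration and stand in for dataset.table; IF(..., 1, 0) replaces the legacy integer() cast.

```sql
#standardSQL
WITH t AS (
  -- Hypothetical sample rows standing in for dataset.table.
  SELECT * FROM UNNEST([
    STRUCT(1 AS id, 10 AS value),
    STRUCT(1, 20),
    STRUCT(1, 20),
    STRUCT(2, 5)
  ])
),
flagged AS (
  -- Flag each row whose value equals the maximum value for its id.
  SELECT
    id,
    value,
    IF(value = MAX(value) OVER (PARTITION BY id), 1, 0) AS is_max_value
  FROM t
)
-- Count, per id, how many rows carry the maximum value.
SELECT
  id AS t_id,
  SUM(is_max_value) AS is_max_value
FROM flagged
GROUP BY id
ORDER BY id;
-- With the sample rows above: t_id 1 -> 2, t_id 2 -> 1
```

Splitting the work into two steps keeps the window function out of the aggregate, which is what the original single-query attempt could not express.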