BigQuery SQL: combine counts when rows fall on consecutive days

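I'm looking for a way to adjust a SQL query that runs in BigQuery so that it returns a single combined count for the "Sent" event type when it occurs on two or even three consecutive days.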

SELECT DATE(EventDate) AS EventDate, EventType, COUNT(*) AS count
FROM `Database.Table`
WHERE DATE(EventDate) > DATE_SUB(CURRENT_DATE(), INTERVAL 100 DAY)
GROUP BY 1, 2
ORDER BY 1, 2

Result of the query above:

| Row    | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1      | 2019-02-06|  Sent     |    4  |
| 2      | 2019-02-07|  Sent     |    5  |
| 3      | 2019-02-12|  NotSent  |    7  |
| 4      | 2019-02-13|  Bounces  |    22 |
| 5      | 2019-02-14|  Bounces  |    22 |
| 6      | 2019-03-06|  Sent     |    2  |
| 7      | 2019-03-07|  Sent     |    4  |
| 8      | 2019-03-07|  NotSent  |    5  |
| 9      | 2019-03-12|  Bounces  |    7  |
| 10     | 2019-03-13|  Sent     |    22 |
| 11     | 2019-04-05|  Sent     |    2  |

The result I would like to get is:

| Row    | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1      | 2019-02-06|  Sent     |    9  |
| 2      | 2019-02-12|  NotSent  |    7  |
| 3      | 2019-02-13|  Bounces  |    22 |
| 4      | 2019-02-14|  Bounces  |    22 |
| 5      | 2019-03-06|  Sent     |    6  |
| 6      | 2019-03-07|  NotSent  |    5  |
| 7      | 2019-03-12|  Bounces  |    7  |
| 8      | 2019-03-13|  Sent     |    22 |
| 9      | 2019-04-05|  Sent     |    2  |

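That way, the counts for EventType "Sent" on consecutive days are combined into one row, while the other event types, such as Bounces and NotSent, are shown without being combined.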

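This is a gaps-and-islands problem. The simplest approach is to use row_number() and subtraction to identify the islands, and then aggregate: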

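-- t stands for the per-day result of the query in the question
-- (columns eventDate, eventType, count).
select eventType, min(eventDate) as eventDate, sum(count) as count
from (select t.*,
             row_number() over (partition by eventType order by eventDate) as seqnum
      from t
     ) t
-- Within one eventType, consecutive dates minus their row number give the same date,
-- so that difference identifies each island. BigQuery spells this date_sub, not dateadd.
group by eventType, date_sub(eventDate, interval seqnum day)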
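
As a hedged illustration of the row-number technique above, the sketch below runs it against an inline copy of the question's sample rows and merges only the "Sent" islands, which is what the desired output asks for. The CTE names `daily` and `numbered` and the column name `cnt` are assumptions made for this example and do not come from the original thread.

```sql
WITH daily AS (
  -- Hypothetical inline copy of the per-day counts shown in the question.
  SELECT DATE '2019-02-06' AS EventDate, 'Sent' AS EventType, 4 AS cnt UNION ALL
  SELECT DATE '2019-02-07', 'Sent', 5 UNION ALL
  SELECT DATE '2019-02-12', 'NotSent', 7 UNION ALL
  SELECT DATE '2019-02-13', 'Bounces', 22 UNION ALL
  SELECT DATE '2019-02-14', 'Bounces', 22 UNION ALL
  SELECT DATE '2019-03-06', 'Sent', 2 UNION ALL
  SELECT DATE '2019-03-07', 'Sent', 4 UNION ALL
  SELECT DATE '2019-03-07', 'NotSent', 5 UNION ALL
  SELECT DATE '2019-03-12', 'Bounces', 7 UNION ALL
  SELECT DATE '2019-03-13', 'Sent', 22 UNION ALL
  SELECT DATE '2019-04-05', 'Sent', 2
),
numbered AS (
  SELECT
    *,
    -- Row number per event type, ordered by date.
    ROW_NUMBER() OVER (PARTITION BY EventType ORDER BY EventDate) AS seqnum
  FROM daily
)
SELECT
  MIN(EventDate) AS EventDate,
  EventType,
  SUM(cnt) AS cnt
FROM numbered
GROUP BY
  EventType,
  -- 'Sent' rows on consecutive days share the same (date - seqnum) value.
  -- Other event types are grouped by their own date, so they are never merged.
  IF(EventType = 'Sent', DATE_SUB(EventDate, INTERVAL seqnum DAY), EventDate)
ORDER BY EventDate, EventType
```

For the "Sent" rows, the dates 2019-02-06 and 2019-02-07 have row numbers 1 and 2, so both map to 2019-02-05 and collapse into one row with a count of 9. Dropping the `IF` and always grouping by the `DATE_SUB` expression would merge consecutive days for every event type, including the two Bounces rows.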

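I wrote a query that merges all rows in the table that fall on consecutive days. It produces exactly the output you want.

I assume you meant "2019-03-06" in the fifth row, so I corrected it in my dummy data.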

WITH
data AS (
  SELECT CAST('2019-02-06' as date) as EventDate, 4 as count union all
  SELECT CAST('2019-02-07' as date) as EventDate, 5 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 7 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 22 as count union all
  SELECT CAST('2019-03-06' as date) as EventDate, 2 as count
),
data_with_steps AS (
  SELECT *, 
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
  FROM data
),
data_grouped AS (
  SELECT *, 
    SUM(new_step) OVER (ORDER BY EventDate) as step_group
  FROM data_with_steps
)
SELECT MIN(EventDate) as EventDate, sum(count) as count
FROM data_grouped
GROUP BY step_group
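
So how does it work? First, I compute the difference in days between each row's date and the previous row's date. If it is greater than 2 days, I set the new column new_step to 1, otherwise to 0. Then I compute the cumulative sum of the new_step column and call it step_group. The output of the first two steps is:

| EventDate  | count | new_step | step_group |
| ---------- | ----- | -------- | ---------- |
| 2019-02-06 |    4  |    0     |     0      |
| 2019-02-07 |    5  |    0     |     0      |
| 2019-02-12 |    7  |    1     |     1      |
| 2019-02-13 |    22 |    0     |     1      |
| 2019-03-06 |    2  |    1     |     2      |

In the last step, I group the table by step_group, take the minimum date as the event date, and sum the counts to get the count for each group.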
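
To inspect the intermediate columns directly, a query along the following lines stops before the final aggregation. It is a sketch that reuses the answer's dummy data and its names (data, data_with_steps, new_step, step_group); only the final SELECT differs from the answer's query.

```sql
WITH
data AS (
  SELECT CAST('2019-02-06' AS DATE) AS EventDate, 4 AS count UNION ALL
  SELECT CAST('2019-02-07' AS DATE) AS EventDate, 5 AS count UNION ALL
  SELECT CAST('2019-02-12' AS DATE) AS EventDate, 7 AS count UNION ALL
  SELECT CAST('2019-02-13' AS DATE) AS EventDate, 22 AS count UNION ALL
  SELECT CAST('2019-03-06' AS DATE) AS EventDate, 2 AS count
),
data_with_steps AS (
  SELECT *,
    -- 1 when the gap to the previous row exceeds 2 days, else 0.
    -- The first row has no previous row: LAG returns NULL, so IF falls through to 0.
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), DAY) > 2, 1, 0) AS new_step
  FROM data
)
SELECT
  EventDate,
  count,
  new_step,
  -- Running total of new_step: rows in the same island share one value.
  SUM(new_step) OVER (ORDER BY EventDate) AS step_group
FROM data_with_steps
ORDER BY EventDate
```

Note that a threshold of `> 2` also treats rows that are two days apart as belonging to the same group; a threshold of `> 1` would restrict merging to strictly consecutive days.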

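Edit: To include the other events without grouping them, I added a new version. I think the most intuitive and simplest way to solve this is with UNION ALL, so you can use the updated query below to include the other events without grouping them.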

WITH
data AS (
  SELECT CAST('2019-02-06' as date) as EventDate, 'Sent' as EventType, 4 as count union all
  SELECT CAST('2019-02-07' as date) as EventDate, 'Sent' as EventType, 5 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 'Sent' as EventType, 7 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 'Sent' as EventType, 22 as count union all
  SELECT CAST('2019-03-06' as date) as EventDate, 'Sent' as EventType, 2 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 'NotSent' as EventType, 7 as count union all
  SELECT CAST('2019-03-07' as date) as EventDate, 'NotSent' as EventType, 5 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
  SELECT CAST('2019-02-14' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
  SELECT CAST('2019-03-12' as date) as EventDate, 'Bounces' as EventType, 7 as count
),
data_with_steps AS (
  SELECT *, 
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
  FROM data
  WHERE EventType = 'Sent'
),
data_grouped AS (
  SELECT *, 
    SUM(new_step) OVER (ORDER BY EventDate) as step_group
  FROM data_with_steps
)
SELECT EventType, MIN(EventDate) as EventDate, sum(count) as count
FROM data_grouped
GROUP BY EventType, step_group

UNION ALL

SELECT EventType, EventDate, count
FROM data
WHERE EventType != 'Sent'
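
A possible single-pass variant of the same LAG idea, without UNION ALL, is sketched below. It is an assumption-laden illustration and not the answerer's query: it partitions the window by EventType, merges only "Sent" rows that are exactly one day apart, and uses a reduced set of hypothetical sample rows with a column named `cnt`.

```sql
WITH data AS (
  -- Hypothetical sample rows, a subset of the question's data.
  SELECT DATE '2019-02-06' AS EventDate, 'Sent' AS EventType, 4 AS cnt UNION ALL
  SELECT DATE '2019-02-07', 'Sent', 5 UNION ALL
  SELECT DATE '2019-02-13', 'Bounces', 22 UNION ALL
  SELECT DATE '2019-02-14', 'Bounces', 22 UNION ALL
  SELECT DATE '2019-03-13', 'Sent', 22
),
data_with_steps AS (
  SELECT *,
    -- 0 only for a 'Sent' row exactly one day after the previous 'Sent' row.
    -- Every other row, including the first row of each type, starts a new group.
    IF(EventType = 'Sent'
       AND DATE_DIFF(EventDate,
                     LAG(EventDate) OVER (PARTITION BY EventType ORDER BY EventDate),
                     DAY) = 1,
       0, 1) AS new_step
  FROM data
),
data_grouped AS (
  SELECT *,
    -- Running total per event type gives each island its own group number.
    SUM(new_step) OVER (PARTITION BY EventType ORDER BY EventDate) AS step_group
  FROM data_with_steps
)
SELECT EventType, MIN(EventDate) AS EventDate, SUM(cnt) AS cnt
FROM data_grouped
GROUP BY EventType, step_group
ORDER BY EventDate, EventType
```

With these rows, the two "Sent" rows of 2019-02-06 and 2019-02-07 collapse into one row with a count of 9, while the two Bounces rows stay separate. The sketch assumes at most one row per EventType and date, as in the question's aggregated data.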

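How are there two rows for 2019-02-06 in the first result table? Am I missing something, or are they just the same day?

@Sab Thanks, good spot. I will try your example later today. Is there a way to add EventType to the result?

If so, my last step is to have the other event types in the result but apply the lag only to "Sent".

Can you provide sample input data with the other event types and the expected output?

I have updated the question. I also tested moving the WHERE on EventType "Sent" into the subquery, although that still shows only the Sent type and ignores Bounces and NotSent: data_with_steps AS (SELECT *, IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 1, 1, 0) AS new_step FROM data WHERE EventType LIKE 'Sent')

I added a new version, so you can check it.

Looks good. The only thing I added was an ORDER BY EventDate in the final SELECT. I will double-check the data, but at first glance it looks good. Thanks a lot.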