SQL 7-day user count: BigQuery self-join to get date range and count?

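My Google Firebase event data is integrated into BigQuery, and I am trying to reproduce one of the figures Firebase gives me automatically: the 1-day, 7-day and 28-day user counts.

The 1-day count is quite simple: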

SELECT
  "1-day" as period,
  events.event_date,
  count(distinct events.user_pseudo_id) as uid
FROM
  `your_path.events_*` as events
WHERE events.event_name = "session_start"
group by events.event_date
The result looks like:

period   event_date  uid
1-day    20190609    5
1-day    20190610    7
1-day    20190611    5
1-day    20190612    7
1-day    20190613    37
1-day    20190614    73
1-day    20190615    52
1-day    20190616    36

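Things get complicated, though, when I try to count for each day how many unique users were active in the previous 7 days. Using the query above, I know that my target value for 20190616 is 142, which I get by filtering on the 7 days and removing the GROUP BY clause.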
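
The single-day check described above can be sketched as follows. This is a minimal illustration, assuming the window is the 7 days ending on 20190616 (20190610 through 20190616) and reusing the table path and filter from the 1-day query; the literal dates are placeholders for that one check. Note that summing the seven daily counts in the table (7 + 5 + 7 + 37 + 73 + 52 + 36 = 217) overcounts users who were active on more than one day, which is why the distinct count has to be taken over the whole window.

```sql
#standardSQL
-- Sketch: distinct users with a session_start in the 7 days ending 2019-06-16.
SELECT
  "7-day" AS period,
  "20190616" AS event_date,
  COUNT(DISTINCT events.user_pseudo_id) AS uid
FROM
  `your_path.events_*` AS events
WHERE events.event_name = "session_start"
  -- event_date is a YYYYMMDD string, so string order matches date order
  AND events.event_date BETWEEN "20190610" AND "20190616"
```

This only yields one row for one hard-coded day; the difficulty discussed below is producing the same figure for every day at once.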

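The solution I tried is a straightforward self-join (and variations that do not change the result).

Now, I know I have set almost no join condition, but if anything I expected that to produce a cross join and a huge result. Instead, the count obtained this way is 70, which is far lower than expected. Also, I can set the interval to 2 days and the result does not change.

Clearly I am doing something very wrong here, but I also think my approach is quite rudimentary and there must be a smarter way to achieve this.

I have checked a related answer, but the explicit CROSS JOIN there is with event_dim, and I am not sure how that is defined.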


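Following the suggestion in the comments, I checked the solution provided there. It seemed reasonable at first, but it has problems for the most recent days. Below is the query using COUNT(DISTINCT), adapted to my case: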

SELECT DATE_SUB(event_date, INTERVAL i DAY) date_grp
 , COUNT(DISTINCT user_pseudo_id) unique_90_day_users
 , COUNT(DISTINCT IF(i<29,user_pseudo_id,null)) unique_28_day_users
 , COUNT(DISTINCT IF(i<8,user_pseudo_id,null)) unique_7_day_users
 , COUNT(DISTINCT IF(i<2,user_pseudo_id,null)) unique_1_day_users
FROM (
  SELECT PARSE_DATE("%Y%m%d",event_date) as event_date, user_pseudo_id
  FROM `your_path_here.events_*`
  WHERE EXTRACT(YEAR FROM PARSE_DATE("%Y%m%d",event_date))=2019
  GROUP BY 1, 2
), UNNEST(GENERATE_ARRAY(1, 90)) i
GROUP BY 1
ORDER BY date_grp
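So for the last day, the 90-day, 28-day and 7-day counts only consider that same day instead of all the preceding days.
The 90-day count for June 17 cannot be 78 if the 1-day count for June 16 is higher.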
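
One possible reading of this behavior, offered as a sketch and not as a verified fix: the query above copies each (event_date, user) row to date_grp = event_date - i for i = 1..90, so each date_grp collects users from the days after it. The most recent date_grp then only sees a single day of events. Assuming the goal is a trailing window, shifting forward with DATE_ADD and offsets 0..89 makes date_grp D collect the users seen on D-89 through D. The thresholds i < 28, i < 7 and i < 1 are my adaptation to the 0-based offsets and are not taken from the linked solution.

```sql
#standardSQL
-- Sketch: trailing windows. A user seen on event_date counts toward
-- every date_grp from event_date up to event_date + 89 days.
SELECT DATE_ADD(event_date, INTERVAL i DAY) AS date_grp
 , COUNT(DISTINCT user_pseudo_id) AS unique_90_day_users
 , COUNT(DISTINCT IF(i < 28, user_pseudo_id, NULL)) AS unique_28_day_users
 , COUNT(DISTINCT IF(i < 7, user_pseudo_id, NULL)) AS unique_7_day_users
 , COUNT(DISTINCT IF(i < 1, user_pseudo_id, NULL)) AS unique_1_day_users
FROM (
  -- one row per user per active day
  SELECT DISTINCT PARSE_DATE("%Y%m%d", event_date) AS event_date, user_pseudo_id
  FROM `your_path_here.events_*`
  WHERE EXTRACT(YEAR FROM PARSE_DATE("%Y%m%d", event_date)) = 2019
), UNNEST(GENERATE_ARRAY(0, 89)) AS i
GROUP BY date_grp
ORDER BY date_grp
```

Two caveats under these assumptions: the output extends up to 89 days past the last day that has data, so those trailing rows would need to be filtered out, and the earliest dates have incomplete windows because no earlier events are loaded.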

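This is an answer to my own question. My approach is basic, since I am not very familiar with BQ shortcuts and some of its advanced features, but the results are correct nonetheless. I hope others can contribute better queries.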

#standardSQL
WITH dates AS (
  SELECT i as event_date
  FROM UNNEST(GENERATE_DATE_ARRAY('2019-05-24', CURRENT_DATE(), INTERVAL 1 DAY)) i
)
, ptd_dates as (
  SELECT DISTINCT "90-day" as day_category, FORMAT_DATE("%Y%m%d",event_date) AS event_date, FORMAT_DATE("%Y%m%d",DATE_SUB(event_date, INTERVAL i-1 DAY)) as ptd_date
  FROM dates,
    UNNEST(GENERATE_ARRAY(1, 90)) i
  UNION ALL
  SELECT distinct "28-day" as day_category, FORMAT_DATE("%Y%m%d",event_date) AS event_date, FORMAT_DATE("%Y%m%d",DATE_SUB(event_date, INTERVAL i-1 DAY)) as ptd_date
  FROM dates,
    UNNEST(GENERATE_ARRAY(1, 28)) i  -- offsets i-1 = 0..27, i.e. 28 days including event_date
  UNION ALL
  SELECT distinct "7-day" as day_category, FORMAT_DATE("%Y%m%d",event_date) AS event_date, FORMAT_DATE("%Y%m%d",DATE_SUB(event_date, INTERVAL i-1 DAY)) as ptd_date
  FROM dates,
    UNNEST(GENERATE_ARRAY(1, 7)) i
  UNION ALL
  SELECT distinct "1-day" as day_category, FORMAT_DATE("%Y%m%d",event_date) AS event_date, FORMAT_DATE("%Y%m%d",event_date) as ptd_date
  FROM dates
)


SELECT event_date,
  sum(IF(day_category="90-day",unique_ptd_users,null)) as count_90_day ,
  sum(IF(day_category="28-day",unique_ptd_users,null)) as count_28_day,
  sum(IF(day_category="7-day",unique_ptd_users,null)) as count_7_day,
  sum(IF(day_category="1-day",unique_ptd_users,null)) as count_1_day
from (
SELECT ptd_dates.day_category
  , ptd_dates.event_date
  , COUNT(DISTINCT user_pseudo_id) unique_ptd_users
FROM ptd_dates,
  `your_path_here.events_*` events,
  unnest(events.event_params) e_params
WHERE ptd_dates.ptd_date = events.event_date
GROUP BY ptd_dates.day_category
  , ptd_dates.event_date)
group by event_date
order by 1,2,3
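Following ECris's suggestion, I first defined a calendar table to use: it contains 4 categories of PTD (period to date). Each one is generated from basic elements. This should scale linearly, and because it does not query the events dataset, it has no gaps.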
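
To make the calendar table concrete, here is a small sketch that builds only the "7-day" rows for a single date, using the same expressions as the ptd_dates CTE. The literal date 2019-06-16 is just an illustrative input; the query touches no event tables.

```sql
#standardSQL
-- Sketch: the "7-day" calendar rows for one event_date.
SELECT "7-day" AS day_category,
  FORMAT_DATE("%Y%m%d", event_date) AS event_date,
  FORMAT_DATE("%Y%m%d", DATE_SUB(event_date, INTERVAL i - 1 DAY)) AS ptd_date
FROM UNNEST([DATE '2019-06-16']) AS event_date,
  UNNEST(GENERATE_ARRAY(1, 7)) AS i
ORDER BY ptd_date
-- Returns 7 rows, with ptd_date running from 20190610 to 20190616.
```

Each event_date is thus paired with every day in its window, which is what lets the later join gather all events from the window under a single event_date.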

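Then the calendar is joined with the events, where the join condition shows how, for each date, distinct users are counted across all the relevant days in the period.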


The results are correct.

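Do you have a calendar/date table you can use? If so, you can skip the cross join.

Have you checked the linked solution?

I tried the solution linked by Felipe. Unfortunately, I cannot meet some of its conditions: I cannot be sure there are no gaps between days, and HLL_COUNT shows no results. Will post an edit as soon as possible.
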
Result of the adapted COUNT(DISTINCT) query for the last two days:

row_num   date_grp     90-day  28-day  7-day   1-day
114       2019-06-16   273     273     273     210
115       2019-06-17   78      78      78      78