Google Analytics: conversion rate within time intervals


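I have Google Analytics running on a website and am now trying to determine the conversion rate within specific time intervals. To that end, I have a table containing

  • interval_id
  • i.interval_start_time_utc
  • i.interval_stop_time_utc

Unfortunately, the following BigQuery query, which is meant to assign each order to a time interval, does not work: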

SELECT
totals.transactions,
totals.visits,
i.interval_id
FROM [123456.ga_sessions_20160609]
INNER JOIN intervals i ON i.interval_start_time_utc < visitStartTime AND visitStartTime < i.interval_end_time_utc

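So I suspect that BigQuery simply does not do range joins. Is there a way other than doing a full cross join and then filtering it down? Is there an entirely different, better approach to this kind of problem?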

BigQuery Standard SQL does not have this limitation.

If you want to stay with BigQuery Legacy SQL, try the following:

SELECT
  totals.transactions,
  totals.visits,
  i.interval_id
FROM [123456.ga_sessions_20160609]
CROSS JOIN intervals i 
WHERE i.interval_start_time_utc < visitStartTime 
AND visitStartTime < i.interval_end_time_utc
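To illustrate the idea, let's simplify the example.
Keep in mind that the goal here is to implement this in BigQuery Legacy SQL; in Standard SQL it is straightforward.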

Challenge

Suppose we have a visits table:

SELECT visit_time FROM 
  (SELECT 2 AS visit_time),
  (SELECT 12 AS visit_time),
  (SELECT 22 AS visit_time),
  (SELECT 32 AS visit_time)

and an intervals table:

SELECT before, after, event FROM 
  (SELECT 1 AS before, 5 AS after, 3 AS event),
  (SELECT 6 AS before, 10 AS after, 8 AS event),
  (SELECT 21 AS before, 25 AS after, 23 AS event),
  (SELECT 33 AS before, 37 AS after, 35 AS event)

We want to extract all visits that fall between an event's before and after values.

This can be done trivially with a CROSS JOIN, as shown below:

SELECT
  visit_time, event, before, after
FROM (
  SELECT visit_time FROM 
    (SELECT 2 AS visit_time),
    (SELECT 12 AS visit_time),
    (SELECT 22 AS visit_time),
    (SELECT 32 AS visit_time)
) AS visits
CROSS JOIN (
  SELECT before, after, event FROM 
    (SELECT 1 AS before, 5 AS after, 3 AS event),
    (SELECT 6 AS before, 10 AS after, 8 AS event),
    (SELECT 21 AS before, 25 AS after, 23 AS event),
    (SELECT 33 AS before, 37 AS after, 35 AS event)
) AS intervals
WHERE visit_time BETWEEN before AND after
The result is:

visit_time  event   before  after    
2           3       1       5    
22          23      21      25  
Potential issue

When both tables are large enough, this CROSS JOIN becomes very expensive.
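
To make the cost concrete, the following Python sketch mimics what the CROSS JOIN plus WHERE filter does on the toy data: every visit is compared with every interval. It is not BigQuery code, and the function name `range_join_naive` and the pair counter are illustrative only.

```python
# Toy data copied from the example above.
visits = [2, 12, 22, 32]
intervals = [  # (before, after, event)
    (1, 5, 3),
    (6, 10, 8),
    (21, 25, 23),
    (33, 37, 35),
]


def range_join_naive(visits, intervals):
    """Compare every visit with every interval, like CROSS JOIN + WHERE."""
    matches = []
    pairs_examined = 0
    for visit_time in visits:
        for before, after, event in intervals:
            pairs_examined += 1
            # Same predicate as: WHERE visit_time BETWEEN before AND after
            if before <= visit_time <= after:
                matches.append((visit_time, event, before, after))
    return matches, pairs_examined


matches, pairs_examined = range_join_naive(visits, intervals)
print(matches)         # [(2, 3, 1, 5), (22, 23, 21, 25)]
print(pairs_examined)  # 16, i.e. len(visits) * len(intervals)
```

With roughly 5000 intervals, as mentioned in the comments, the number of pairs examined grows to 5000 times the number of sessions.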

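Hint

As it happens (per the asker's comment), each interval always extends x units to the left and right of an event.

Solution

Below is a suggested solution that uses this fact and performs a JOIN between the two big tables instead of a CROSS JOIN.

The key is to generate, on the fly, a new table that holds every possible point within an interval, derived from the event and x: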

SELECT event, event + delta AS point 
FROM (
  SELECT event FROM
    (SELECT 1 AS before, 5 AS after, 3 AS event),
    (SELECT 6 AS before, 10 AS after, 8 AS event),
    (SELECT 21 AS before, 25 AS after, 23 AS event),
    (SELECT 33 AS before, 37 AS after, 35 AS event)
) AS events
CROSS JOIN (
  SELECT pos - 1 - 2 AS delta FROM (
       SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
       SELECT SPLIT(RPAD('', 1 + 2 * 2, '.'),'') AS h FROM (SELECT NULL)),h
  )))   
) AS deltas
In the code above x = 2, but you can change it in two places; for example, for x = 5 you would have:

SELECT pos - 1 - 5 AS delta FROM (
     SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + 2 * 5, '.'),'') AS h FROM (SELECT NULL)),h
)))   
The CROSS JOIN in the code above is cheap, because the deltas table is very small.
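
As a cross-check of the delta arithmetic, here is a small Python sketch (not BigQuery code) that reproduces it: `pos` runs from 1 to 2x + 1, so `pos - 1 - x` runs from -x to x. The helper names `deltas` and `points` are made up for illustration.

```python
def deltas(x):
    """Offsets -x..x, mirroring `pos - 1 - x` for pos = 1 .. 2*x + 1."""
    return [pos - 1 - x for pos in range(1, 2 * x + 2)]


def points(events, x):
    """All (event, point) pairs, mirroring `event + delta AS point`."""
    return [(event, event + delta) for event in events for delta in deltas(x)]


print(deltas(2))       # [-2, -1, 0, 1, 2]
print(points([3], 2))  # [(3, 1), (3, 2), (3, 3), (3, 4), (3, 5)]
```

The points table therefore has 2x + 1 rows per event, which stays small as long as x is small.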

So, finally, you can get your result with the following query:

SELECT
  visit_time, event 
FROM (
  SELECT visit_time FROM 
    (SELECT 2 AS visit_time),
    (SELECT 12 AS visit_time),
    (SELECT 22 AS visit_time),
    (SELECT 32 AS visit_time)
) AS visits
JOIN (
  SELECT event, event + delta AS point 
  FROM (
    SELECT event FROM
      (SELECT 1 AS before, 5 AS after, 3 AS event),
      (SELECT 6 AS before, 10 AS after, 8 AS event),
      (SELECT 21 AS before, 25 AS after, 23 AS event),
      (SELECT 33 AS before, 37 AS after, 35 AS event)
  ) AS events
  CROSS JOIN (
    SELECT pos - 1 - 2 AS delta FROM (
         SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
         SELECT SPLIT(RPAD('', 1 + 2 * 2, '.'),'') AS h FROM (SELECT NULL)),h
    )))   
  ) AS deltas
) AS points
ON points.point = visits.visit_time
Expected result:

visit_time  event    
2           3    
22          23  
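
The same equi-join idea can be sketched in Python with a dictionary lookup, which plays the role of the equality join on `points.point = visits.visit_time`. This only illustrates the logic on the toy data with x = 2; `build_point_index` is a hypothetical helper, not something BigQuery provides.

```python
from collections import defaultdict

visits = [2, 12, 22, 32]
events = [3, 8, 23, 35]
x = 2  # half-width of every interval, as in the queries above


def build_point_index(events, x):
    """Map every integer point in [event - x, event + x] to its event(s)."""
    index = defaultdict(list)
    for event in events:
        for delta in range(-x, x + 1):
            index[event + delta].append(event)
    return index


index = build_point_index(events, x)

# One equality lookup per visit instead of a comparison with every interval.
result = [(v, event) for v in visits for event in index.get(v, [])]
print(result)  # [(2, 3), (22, 23)]
```

Keeping a list of events per point means overlapping intervals would simply yield one output row per matching event.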
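I think the approach above will work for you, but you will need to adapt it to your specific case.
I think this can be done relatively easily if you round all the times involved to their respective minutes.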
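
One way to read the rounding suggestion: if `visitStartTime` and the event times are POSIX timestamps in seconds, integer division by 60 yields a minute number that can serve as the equality join key. The Python sketch below assumes exactly that. The sample timestamps, the layout of `sessions`, and the function name are invented for illustration, and rounding to minutes makes the interval boundaries approximate.

```python
from collections import defaultdict


def conversion_rate_by_event(sessions, event_times, x_minutes):
    """sessions: iterable of (visit_start_time_s, visits, transactions).

    Returns {event_time: transactions / visits} over the sessions whose
    start minute lies within x_minutes of the event's minute.
    """
    # Expand each event into the minute buckets it covers (the "points" table).
    minute_to_events = defaultdict(list)
    for event_time in event_times:
        event_minute = event_time // 60
        for delta in range(-x_minutes, x_minutes + 1):
            minute_to_events[event_minute + delta].append(event_time)

    totals = defaultdict(lambda: [0, 0])  # event -> [visits, transactions]
    for start_s, visits, transactions in sessions:
        for event_time in minute_to_events.get(start_s // 60, []):
            totals[event_time][0] += visits
            totals[event_time][1] += transactions

    return {e: t / v for e, (v, t) in totals.items() if v > 0}


# Invented sample: one event at t = 6000 s (minute 100), x = 2 minutes.
sessions = [
    (5890, 1, 0),  # minute 98, inside
    (6050, 1, 1),  # minute 100, inside
    (6170, 1, 0),  # minute 102, inside
    (6400, 1, 1),  # minute 106, outside
]
rates = conversion_rate_by_event(sessions, [6000], 2)
print(rates)  # {6000: 0.3333333333333333}
```

In BigQuery the same shape would be a JOIN on the minute key followed by a GROUP BY on the event, with the sums of `totals.transactions` and `totals.visits` taking the place of the two counters.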

Hope this helps.

If you get this working, please share the results with us :o)


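I have about 5000 such intervals, so a cross join is a pretty expensive operation. Is there a different way I can approach this?

If you can use BigQuery Standard SQL for this particular query, see the first statement in my answer! If you are stuck with BigQuery Legacy SQL, there are not many options, but there is still a chance if your intervals follow some specific deterministic pattern that allows the non-equi join to be turned into an equi join. Tell us a bit more about the logic behind your intervals so we can help further.

The intervals are always x minutes to the left and right of some event. Essentially, I am interested in how the conversion rate changes with a change, and I want to study this in the time neighborhood of the change. Does this help?

I think this extra information can help turn the cross join into an inner join in BigQuery Legacy SQL. When I have time I can spend a bit of it on this; please confirm that you are still looking for a solution in this direction, so I don't waste my time if you are not.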