Optimizing a dynamic 7-day cohort query in Firebase BigQuery

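I wrote the query below against our mobile app's data. Because of our large user base, adding the ORDER BY at the bottom makes the query fail with a 400 error: "Resources exceeded during query execution: The query could not be executed in the allotted memory."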

Q: What can I do to optimize the query while still keeping the ORDER BY at the bottom?

I've used Firebase's demo dataset here, but I suspect it is too small to reproduce the problem compared with my dataset of 5 to 10 million records.

SELECT 
  f.user_pseudo_id,
  f.event_timestamp, 
  DATE(TIMESTAMP_MICROS(f.event_timestamp)) as event_timestamp_date,
  f.event_name,
  f.user_first_touch_timestamp,
  DATE(TIMESTAMP_MICROS(f.user_first_touch_timestamp)) as user_first_touch_date,
  CASE WHEN r.has_appRemove >= 1 THEN "removed" ELSE "not-removed" END AS status_after_first7days
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*` f
LEFT JOIN (
    SELECT user_pseudo_id, 1 has_appRemove
    FROM `firebase-analytics-sample-data.ios_dataset.app_events_*`
    WHERE DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
      AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
      AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
      AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
      AND platform = "ANDROID"
      AND event_name = "app_remove"
    GROUP BY user_pseudo_id
    ) r on f.user_pseudo_id = r.user_pseudo_id
WHERE
  DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
  AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
  AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
  AND platform = "ANDROID" 
ORDER BY 1,2 ASC

Instead of a join, you can use a window (analytic) function, as in the untested example below:

#standardSQL
SELECT 
  user_pseudo_id,
  event_timestamp, 
  DATE(TIMESTAMP_MICROS(event_timestamp)) AS event_timestamp_date,
  event_name,
  user_first_touch_timestamp,
  DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_date,
  COUNTIF(event_name = "app_remove") OVER(PARTITION BY user_pseudo_id) > 0 isRemoved
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*` 
WHERE
  DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
  AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
  AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
  AND platform = "ANDROID" 
ORDER BY 1,2 ASC
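
To see what the window function computes without querying the real export, here is a small sketch on hypothetical inline data (the user IDs `u1`/`u2` and their events are made up, and small integers stand in for `event_timestamp`). `COUNTIF(...) OVER (PARTITION BY user_pseudo_id)` counts the `app_remove` events across all of a user's rows and attaches that count to every row, so the `> 0` test flags every event of a user who removed the app. That is what the LEFT JOIN in the question achieved by reading the same wildcard tables a second time.

```sql
#standardSQL
-- Hypothetical inline data standing in for the Firebase export tables.
WITH events AS (
  SELECT "u1" AS user_pseudo_id, 1 AS event_timestamp, "first_open" AS event_name UNION ALL
  SELECT "u1", 2, "screen_view" UNION ALL
  SELECT "u1", 3, "app_remove" UNION ALL
  SELECT "u2", 1, "first_open" UNION ALL
  SELECT "u2", 2, "screen_view"
)
SELECT
  user_pseudo_id,
  event_timestamp,
  event_name,
  -- Count app_remove events over all rows of the same user, then compare to 0:
  -- every row of a user who removed the app gets TRUE, with no self-join.
  COUNTIF(event_name = "app_remove") OVER (PARTITION BY user_pseudo_id) > 0 AS isRemoved
FROM events
ORDER BY user_pseudo_id, event_timestamp
```

Run as is, the three `u1` rows should come back with `isRemoved = true` and the two `u2` rows with `false`.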

How many rows does this produce? Could you use a LIMIT?

Hi Elliot, unfortunately I can't apply a LIMIT, because I run this query every day through the BigQuery query scheduler to write the data into a long-term table. I used Mikhail's PARTITION BY answer and it did the trick :)

Great! I'm glad Mikhail's answer worked for you. Note that if you are writing the results to another table, the ORDER BY is pointless, because tables don't preserve row order; only query results do.
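
Following the last comment, a minimal sketch, assuming the scheduled query is saved without its final ORDER BY and writes to a destination table named `my_project.my_dataset.cohort_7d` (a hypothetical name): apply the ordering only when reading the table back, and bound it with LIMIT, since BigQuery's best-practice guidance is to pair a large ORDER BY with a LIMIT so the final sort stays small.

```sql
#standardSQL
-- Read back from the hypothetical destination table written by the scheduled
-- query (same SELECT as the answer above, minus the final ORDER BY).
SELECT
  user_pseudo_id,
  event_timestamp,
  event_name,
  isRemoved
FROM `my_project.my_dataset.cohort_7d`  -- hypothetical project/dataset/table
ORDER BY user_pseudo_id, event_timestamp
LIMIT 1000  -- bounded output keeps the final sort small
```

The design choice here is simply to move the sort from the write path, where it has no lasting effect, to the read path, where the order actually reaches the consumer.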