Google BigQuery: find the event each user triggered most often over a period of time

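I am pulling a list of user IDs from a website for users who triggered a specific Google Analytics event over the past few weeks. I am currently working with the query below, which returns one row per user ID per event action.

I would like to enhance this query by working out which event each user ID clicked most often during this period and returning only that event, rather than every event they triggered. Can anyone suggest a good way to achieve this?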

SELECT
  customDimension.value AS UserID,
  hits.eventinfo.eventAction AS Size
FROM `*.ga_sessions_*` AS t
  CROSS JOIN UNNEST(hits) AS hits
  CROSS JOIN UNNEST(t.customdimensions) AS customDimension
WHERE _TABLE_SUFFIX BETWEEN '20170601' AND '20170628'
  AND (hits.page.pagePath LIKE "%/shorts%" OR hits.page.pagePath LIKE "%/t-shirts%")
  AND hits.eventinfo.eventCategory = "SIZE Filter Click"
  AND hits.eventinfo.eventAction IN ("S", "M", "L", "XL")
  AND customDimension.index = 2
GROUP BY UserID, Size

I think this might work for you:

WITH data AS (
  SELECT
    ARRAY<STRUCT<index INT64, value STRING>>[
      STRUCT(NULL AS index, '' AS value),
      STRUCT(0 AS index, 'test' AS value),
      STRUCT(2 AS index, 'user_1' AS value)
    ] AS customDimensions,
    ARRAY<STRUCT<page STRUCT<pagePath STRING>, eventinfo STRUCT<eventcategory STRING, eventaction STRING> >>[
      STRUCT(STRUCT('/home' AS pagePath) AS page, STRUCT("cat1" AS eventcategory, "act1" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abcshortsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "S" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abcshortsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "S" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abcshortsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "M" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abc/t-shirtsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "L" AS eventaction) AS eventinfo)
    ] AS hits
  UNION ALL
  SELECT
    ARRAY<STRUCT<index INT64, value STRING>>[
      STRUCT(2 AS index, 'user_2' AS value)
    ] AS customDimensions,
    ARRAY<STRUCT<page STRUCT<pagePath STRING>, eventinfo STRUCT<eventcategory STRING, eventaction STRING> >>[
      STRUCT(STRUCT('shorts' AS pagePath) AS page, STRUCT("cat1" AS eventcategory, "act1" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abcshortsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "M" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abcshortsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "M" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abcshortsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "M" AS eventaction) AS eventinfo),
      STRUCT(STRUCT('/abc/t-shirtsabc' AS pagePath) AS page, STRUCT("SIZE Filter Click" AS eventcategory, "L" AS eventaction) AS eventinfo)
    ] AS hits
)

SELECT
  (SELECT value
   FROM UNNEST(customDimensions)
   WHERE index = 2
   GROUP BY value) AS UserID,
  (SELECT eventinfo.eventaction AS size
   FROM UNNEST(hits)
   WHERE (REGEXP_CONTAINS(page.pagepath, r'shorts') OR REGEXP_CONTAINS(page.pagepath, r'/t-shirts'))
     AND eventinfo.eventcategory = 'SIZE Filter Click'
   GROUP BY eventinfo.eventaction
   ORDER BY COUNT(eventinfo.eventaction) DESC
   LIMIT 1) AS most_clicked_size
FROM data

Here, data is a mock-up of the actual ga_sessions data.

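Thanks Will, this looks great. How can I adapt it to show all user IDs within the date range?

I'm not quite sure what you mean by retrieving the user IDs within a date range, but in ga_sessions you can use the date field in the SELECT statement as well as in the WHERE filter clause.

-- Variant that also returns the click count (freq) alongside the most-clicked size
SELECT
  (SELECT value
   FROM UNNEST(customDimensions)
   WHERE index = 2
   GROUP BY value) AS UserID,
  (SELECT AS STRUCT eventinfo.eventaction AS size, COUNT(1) AS freq
   FROM UNNEST(hits)
   WHERE (REGEXP_CONTAINS(page.pagepath, r'shorts') OR REGEXP_CONTAINS(page.pagepath, r'/t-shirts'))
     AND eventinfo.eventcategory = 'SIZE Filter Click'
   GROUP BY eventinfo.eventaction
   ORDER BY COUNT(eventinfo.eventaction) DESC
   LIMIT 1) AS most_clicked_size
FROM data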
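
On the date-range follow-up: the answer's subquery picks the top size within each ga_sessions row, that is, per session, so a user with several sessions in the window would show up once per session. Below is a minimal sketch of one possible way to rank per user across all sessions instead, using ROW_NUMBER() over per-user click counts. The clicks CTE is made-up sample data standing in for the flattened (UserID, Size) rows that the question's query would yield without its GROUP BY (with the _TABLE_SUFFIX filter supplying the date range). The names clicks, counts, ranked, freq and rn are illustrative assumptions, not part of the original answer.

```sql
-- Hypothetical sample rows: one row per qualifying click, assumed to be already
-- filtered by date range, page path and event category as in the question.
WITH clicks AS (
  SELECT 'user_1' AS UserID, 'S' AS Size UNION ALL
  SELECT 'user_1', 'S' UNION ALL
  SELECT 'user_1', 'M' UNION ALL
  SELECT 'user_1', 'L' UNION ALL
  SELECT 'user_2', 'M' UNION ALL
  SELECT 'user_2', 'M' UNION ALL
  SELECT 'user_2', 'M' UNION ALL
  SELECT 'user_2', 'L'
),
counts AS (
  -- Number of clicks per user and size, across all sessions
  SELECT UserID, Size, COUNT(*) AS freq
  FROM clicks
  GROUP BY UserID, Size
),
ranked AS (
  -- Rank sizes within each user by click count; ties are broken alphabetically by Size
  SELECT
    UserID,
    Size,
    freq,
    ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY freq DESC, Size) AS rn
  FROM counts
)
-- Keep only the top-ranked size for each user
SELECT UserID, Size AS most_clicked_size, freq
FROM ranked
WHERE rn = 1
ORDER BY UserID
```

With this sample data the query returns user_1 / S / 2 and user_2 / M / 3. Against real data, clicks would be replaced by the flattened rows selected from ga_sessions_*; the tie-break on Size is an arbitrary choice to keep the result deterministic.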