SQL: Finding a user's journey in Google BigQuery
I want to find users' journeys on a particular website. My dataset has the same schema as the Google Merchandise Store dataset. Starting from the Google BigQuery Cookbook, I adapted the provided SQL code to get the sequence of hits for each customer:
SELECT
  fullVisitorId AS id,
  visitId AS visitid,
  visitNumber AS visitnumber,
  h.hitNumber AS hitNumber,
  CASE
    WHEN h.eventInfo.eventAction = "Lead" THEN "Lead"
    WHEN h.eventInfo.eventAction = "Homepage" THEN "Homepage"
    WHEN h.eventInfo.eventAction = "Search" THEN "Search"
    WHEN h.eventInfo.eventAction = "High Intent Use" THEN "High Intent Use"
    WHEN h.eventInfo.eventAction = "Listing Page" THEN "Listing Page"
  END AS journey
FROM
  `dataset`,
  UNNEST(hits) AS h
WHERE
  h.type = "PAGE"
  OR h.type = "EVENT"
ORDER BY
  fullVisitorId,
  visitId,
  visitNumber,
  hitNumber
A snippet of the result I get looks like this:
fullVisitorId visitId visitNumber hitnumber journey
001 1001 1 1 Homepage
001 1001 1 2 Search
001 1001 1 3 null
001 1001 1 4 Search
001 1001 1 5 Listing Page
001 1001 1 6 Lead
001 1001 1 2 Search
001 1001 1 7 Lead
002 1002 1 1 Search
...
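What I need is an additional column showing each visitor's journey up to the first Lead, ignoring repeated steps. For example, if a visitor views five search pages back to back, the journey should show Search only once.
That is, for visitor 001 in visit 1001, the column would show: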
Homepage -> Search -> Listing Page -> Lead
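I hope the question is clear. Thanks for your help!

I suggest using STRING_AGG to build the sequence of journey steps. Adding DISTINCT to the aggregation shows each individual journey step only once per user.
For example:
STRING_AGG(DISTINCT journey, ' -> ')
You could then trim everything after the first "Lead" with a regular expression, unless someone can suggest a better way to do this inside the original string aggregation?

Below is for BigQuery Standard SQL, and it applies the extra logic on top of your existing query: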
#standardSQL
SELECT
  fullVisitorId, visitId,
  STRING_AGG(journey, ' -> ' ORDER BY visitNumber, hitnumber) journey_path
FROM (
  SELECT
    fullVisitorId, visitId,
    MIN(visitNumber) visitNumber, MIN(hitnumber) hitnumber, journey
  FROM (
    SELECT *, COUNTIF(journey = 'Lead') OVER(win) grp
    FROM `your_current_query`
    WINDOW win AS (
      PARTITION BY fullVisitorId, visitId
      ORDER BY visitNumber, hitnumber
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    )
  )
  WHERE grp = 0
  GROUP BY fullVisitorId, visitId, journey
)
GROUP BY fullVisitorId, visitId
So you can use it with your existing query as follows:
#standardSQL
WITH `your_current_query` AS (
  SELECT
    -- Keep the original column name: the outer query refers to
    -- fullVisitorId, so aliasing it to "id" here would break it.
    fullVisitorId,
    visitId,
    visitNumber,
    h.hitNumber AS hitNumber,
    CASE
      WHEN h.eventInfo.eventAction = "Lead" THEN "Lead"
      WHEN h.eventInfo.eventAction = "Homepage" THEN "Homepage"
      WHEN h.eventInfo.eventAction = "Search" THEN "Search"
      WHEN h.eventInfo.eventAction = "High Intent Use" THEN "High Intent Use"
      WHEN h.eventInfo.eventAction = "Listing Page" THEN "Listing Page"
    END AS journey
  FROM
    `dataset`,
    UNNEST(hits) AS h
  WHERE
    h.type = "PAGE"
    OR h.type = "EVENT"
)
SELECT
  fullVisitorId, visitId,
  STRING_AGG(journey, ' -> ' ORDER BY visitNumber, hitnumber) journey_path
FROM (
  SELECT
    fullVisitorId, visitId,
    MIN(visitNumber) visitNumber, MIN(hitnumber) hitnumber, journey
  FROM (
    -- grp = number of 'Lead' rows strictly before the current hit in the visit
    SELECT *, COUNTIF(journey = 'Lead') OVER(win) grp
    FROM `your_current_query`
    WINDOW win AS (
      PARTITION BY fullVisitorId, visitId
      ORDER BY visitNumber, hitnumber
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    )
  )
  WHERE grp = 0  -- keep hits up to and including the first Lead
  GROUP BY fullVisitorId, visitId, journey
)
GROUP BY fullVisitorId, visitId
-- ORDER BY fullVisitorId, visitId
Applied to your sample results, the above should produce the result below:
Row fullVisitorId visitId journey_path
1 001 1001 Homepage -> Search -> Listing Page -> Lead
2 002 1002 Search
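
Note that grouping by journey keeps each step once per visit, wherever it repeats. If only back-to-back repeats should be collapsed, as the question describes, one possible variant compares each step with the previous one using LAG. This is a minimal sketch, not part of the answer above: the inline `sample` rows, the `flagged` CTE, and the column names `prev_journey` and `leads_before` are made up for illustration, and NULL steps are dropped first so that they do not separate two identical steps.

```sql
#standardSQL
-- Hypothetical inline data mirroring visit 1001 from the question.
WITH sample AS (
  SELECT '001' AS fullVisitorId, 1001 AS visitId, 1 AS hitNumber, 'Homepage' AS journey UNION ALL
  SELECT '001', 1001, 2, 'Search' UNION ALL
  SELECT '001', 1001, 3, CAST(NULL AS STRING) UNION ALL
  SELECT '001', 1001, 4, 'Search' UNION ALL
  SELECT '001', 1001, 5, 'Listing Page' UNION ALL
  SELECT '001', 1001, 6, 'Lead' UNION ALL
  SELECT '001', 1001, 7, 'Lead'
),
flagged AS (
  SELECT
    *,
    -- Step immediately before this one in the visit (NULL for the first step).
    LAG(journey) OVER (
      PARTITION BY fullVisitorId, visitId
      ORDER BY hitNumber
    ) AS prev_journey,
    -- Number of 'Lead' rows strictly before this one.
    COUNTIF(journey = 'Lead') OVER (
      PARTITION BY fullVisitorId, visitId
      ORDER BY hitNumber
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS leads_before
  FROM sample
  WHERE journey IS NOT NULL
)
SELECT
  fullVisitorId,
  visitId,
  STRING_AGG(journey, ' -> ' ORDER BY hitNumber) AS journey_path
FROM flagged
WHERE leads_before = 0                                 -- up to and including the first Lead
  AND (prev_journey IS NULL OR journey != prev_journey) -- drop back-to-back repeats
GROUP BY fullVisitorId, visitId
-- journey_path: Homepage -> Search -> Listing Page -> Lead
```

With this variant, a visit such as Homepage, Search, Homepage would keep the second Homepage, whereas the GROUP BY version would show it only once.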
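
I adapted Mikhail's excellent approach into a more scalable version for those with large amounts of data. The idea is the same, but it is applied in a subquery over the hits array: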
SELECT
  fullVisitorId AS id,
  visitId AS visitid,
  visitNumber AS visitnumber,
  ARRAY(
    (SELECT AS STRUCT *
     FROM
       (SELECT AS STRUCT
          hitNumber,
          page.pagePath, -- pagePath instead of CASE-WHEN with events
          COUNT(page.pagePath) OVER (win) elNumber
        FROM t.hits
        WHERE type IN ('PAGE', 'EVENT')
        WINDOW win AS (
          PARTITION BY page.pagePath
          ORDER BY hitNumber
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        )
        ORDER BY hitNumber)
     WHERE elNumber = 0
       -- instead of 'Lead' I used '/signin.html'
       AND hitNumber < (SELECT MIN(hitNumber) FROM t.hits WHERE page.pagePath = '/signin.html')
    )
  ) AS journey
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t
LIMIT 1000
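I used the actual sample data, but I couldn't find the events from your example in it, so I used page paths instead. It should be easy to adapt, though.
In addition, this query returns nested data rather than a flat table, which saves space when the result is stored as a table and makes later queries on it faster.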
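If a single path string is still wanted from such a nested result, the array can be aggregated per row without flattening the whole table. The following is a minimal sketch under stated assumptions: the `nested` CTE and its page paths are invented sample data that only imitate the shape of the `journey` column (an array of structs with `hitNumber` and `pagePath`).

```sql
#standardSQL
-- Hypothetical row shaped like the output of the nested query.
WITH nested AS (
  SELECT
    '001' AS id,
    1001 AS visitid,
    ARRAY<STRUCT<hitNumber INT64, pagePath STRING>>[
      (1, '/home'), (2, '/search'), (5, '/listing')
    ] AS journey
)
SELECT
  id,
  visitid,
  -- The correlated subquery runs per row, on the array only.
  (SELECT STRING_AGG(j.pagePath, ' -> ' ORDER BY j.hitNumber)
   FROM UNNEST(journey) AS j) AS journey_path
FROM nested
-- journey_path: /home -> /search -> /listing
```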
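There is also no grouping involved: everything in the subquery happens on the array only, which allows very fast processing thanks to parallelization.

Ah, thanks @Ben, this helped me get to the next step! I am thinking of combining STRPOS with other functions (maybe SUBSTR) to do the trimming. I will update here again if I get a result.
STRING_AGG with DISTINCT does not guarantee the correct order: with DISTINCT you can only order by journey, not by hitNumber, and without an ORDER BY no order is guaranteed at all.
Hi @Mikhail, thanks for your answer! Could you explain what `COUNTIF(journey = 'Lead') OVER(win) grp` does? And why filter on `grp = 0`?
Ah, OK, I looked at the query more closely and saw that `grp = 0` captures all hits up to the first Lead. Always learning something new from you, @Mikhail! Thank you.
Sure. Correct, this groups all events preceding each Lead. Since you only want the events before the first Lead, `grp = 0` does exactly that. Glad you worked it out yourself.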
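
To make the `grp` discussion above concrete, here is a small self-contained sketch. The six sample rows are invented for illustration and represent one visit, so the PARTITION BY is left out. Because the window frame ends at `1 PRECEDING`, each row counts only the Leads strictly before it, so the first Lead itself still gets `grp = 0`.

```sql
#standardSQL
-- Hypothetical hits of a single visit.
WITH sample AS (
  SELECT 1 AS hitNumber, 'Homepage' AS journey UNION ALL
  SELECT 2, 'Search' UNION ALL
  SELECT 3, 'Lead' UNION ALL
  SELECT 4, 'Search' UNION ALL
  SELECT 5, 'Lead' UNION ALL
  SELECT 6, 'Listing Page'
)
SELECT
  hitNumber,
  journey,
  -- Leads seen in all rows before the current one (0 for an empty frame).
  COUNTIF(journey = 'Lead') OVER (
    ORDER BY hitNumber
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) AS grp
FROM sample
ORDER BY hitNumber
-- grp per row, in hitNumber order: 0, 0, 0, 1, 1, 2
```

Filtering on `grp = 0` therefore keeps hits 1 to 3 here, that is, everything up to and including the first Lead.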