Google Cloud Platform: replacing the traffic source in raw Google Analytics session data in BigQuery?


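We recently observed that when users try to complete a transaction on our website from an iOS device, Apple ends the current session and starts a new one. The difficulty is that if the user arrived via a paid source or email, that session ends and a new session begins with apple.com as the traffic source.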

For instance:
google -> appleid.apple.com
(direct) -> appleid.apple.com
email -> appleid.apple.com
ios -> appleid.apple.com -> appleid.apple.com -> appleid.apple.com
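Since this raw data flows into BQ, we are considering replacing appleid.apple.com with the actual traffic source of each visit, i.e. google, direct, email, or ios. Any help with the logic or function to solve this would be appreciated.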

Here is the code I have tried:

WITH DATA AS (
  SELECT
    PARSE_DATE("%Y%m%d", date) AS Date,
    clientId AS ClientId,
    fullVisitorId AS fullvisitorid,
    visitNumber AS visitnumber,
    trafficSource.medium AS medium,
    CONCAT(fullvisitorid, "-", CAST(visitStartTime AS STRING)) AS Session_ID,
    trafficSource.source AS Traffic_Source,
    MAX((CASE WHEN (hits.eventInfo.eventLabel = "complete") THEN 1 ELSE 0 END)) AS ConversionComplete
  FROM `project.dataset.ga_sessions_20*`,
    UNNEST(hits) AS hits
  WHERE totals.visits = 1
  GROUP BY
    1, 2, 3, 4, 5, 6, 7
),
Source_Replace AS (
  SELECT
    Date AS Date,
    IF(
      Traffic_Source LIKE "%apple.com",
      (CASE
         WHEN Traffic_Source NOT LIKE "%apple.com%"
         THEN LAG(Traffic_Source, 1) OVER (PARTITION BY ClientId ORDER BY visitnumber ASC)
       END),
      Traffic_Source
    ) AS traffic_source_1,
    medium AS Medium,
    fullvisitorid AS User_ID,
    Session_ID AS SessionID,
    ConversionComplete AS ConversionComplete
  FROM
    DATA
)
SELECT
  Date AS Date,
  traffic_source_1 AS TrafficSource,
  Medium AS TrafficMedium,
  COUNT(DISTINCT User_ID) AS Users,
  COUNT(DISTINCT SessionID) AS Sessions,
  SUM(ConversionComplete) AS ConversionComplete
FROM
  Source_Replace
GROUP BY
  1, 2, 3

Thanks

Assuming visitStartTime is the key to identifying where a session starts, something like this might help:

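-- Note: visitStartTime is not selected in the data CTE from the question,
-- so it would need to be added to that CTE's SELECT and GROUP BY first.
source_replaced as (
   select *,
      min(Traffic_Source) over (
         partition by date, clientid, fullvisitorid, visitnumber order by visitStartTime
      ) as originating_source
   from data
)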
You can then aggregate on originating_source. It's a bit hard to say without seeing some sample data showing what is happening.
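
One caveat with the window above: min() returns the alphabetically smallest source rather than the earliest one, and a partition that includes visitnumber holds only a single session. An alternative worth considering is a carry-forward: blank out the apple.com sources and take the last non-null source seen so far for the same client. The sketch below is only an illustration under assumptions. The inline rows are hypothetical stand-ins for the DATA CTE, built from the example paths in the question, and it assumes ClientId is populated and that visitnumber orders a client's sessions correctly.

```sql
-- Hypothetical sample rows standing in for the DATA CTE from the question.
WITH data AS (
  SELECT * FROM UNNEST([
    STRUCT('A' AS ClientId, 1 AS visitnumber, 'google' AS Traffic_Source),
    STRUCT('A', 2, 'appleid.apple.com'),
    STRUCT('B', 1, 'email'),
    STRUCT('B', 2, 'appleid.apple.com'),
    STRUCT('C', 1, 'ios'),
    STRUCT('C', 2, 'appleid.apple.com'),
    STRUCT('C', 3, 'appleid.apple.com'),
    STRUCT('C', 4, 'appleid.apple.com')
  ])
),
source_replaced AS (
  SELECT
    *,
    COALESCE(
      -- Treat apple.com sources as NULL, then carry the most recent
      -- non-NULL source forward across the client's sessions.
      LAST_VALUE(
        IF(Traffic_Source LIKE '%apple.com', NULL, Traffic_Source) IGNORE NULLS
      ) OVER (
        PARTITION BY ClientId
        ORDER BY visitnumber
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      ),
      -- Fallback (an assumption): keep the original source when the client
      -- has no earlier non-apple session to inherit from.
      Traffic_Source
    ) AS originating_source
  FROM data
)
SELECT *
FROM source_replaced
ORDER BY ClientId, visitnumber;
```

With these sample rows, each appleid.apple.com session should inherit google, email, or ios from the session before it, including the chain of three consecutive apple.com sessions. LAG(…, 1) alone would not cover that chain, since the previous row is itself apple.com. If the medium should follow the source, the same window can be applied to medium using the same apple.com condition.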

Hope that helps.