Getting complete unsampled data from GA using R/rga
I use skardhamar's rga ga$getData to query GA and want to retrieve all the data unsampled. The data is based on more than 500k sessions per day.

The paragraph "Extracting more than 10,000 observations" mentions that this is possible with batch = TRUE. The paragraph "Getting unsampled data" also mentions that by walking over days you can obtain unsampled data. I am trying to combine the two, but I cannot get it to work. For example:
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
batch = TRUE, walk = TRUE
)
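This does retrieve unsampled data, but not all of it. I get a data frame with only 20k rows (10k per day). The result is limited to chunks of 10k per day, which is the opposite of what I expected given the batch = TRUE setting. So for March 30 and 31 I end up with a data frame of 20k rows after seeing this output: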
Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
When I omit the walk = TRUE setting, I do get all observations (771k rows, about 335k per day), but only as sampled data:
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
batch = TRUE
)
Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...
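Is my data too large to retrieve all observations unsampled?

You could try querying by device with filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop") and then merging the resulting data frames.

I am assuming your users access your site from different devices. The underlying logic is that when you filter data, the Google Analytics server applies the filter before you receive the data, so you can "split" your query and get unsampled data. I think the "walk" function works on the same principle.

The first call below covers desktop only; the second covers mobile and tablet: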
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "ga:deviceCategory==desktop",
segment = "",
batch = TRUE, walk = TRUE
)
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "ga:deviceCategory!=desktop",
segment = "",
batch = TRUE, walk = TRUE
)
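The splitting idea above, combined with walking over days by hand, could be sketched roughly as follows. This is only an illustration under stated assumptions: fetch_split and fake_fetch are hypothetical names, not part of rga, and the stub stands in for a real ga$getData call so the sketch runs without API access. It also assumes the returned data frame has a metric column named totalEvents. Whether each smaller query actually comes back unsampled depends on the volume behind it, so that still needs to be checked.

```r
# Hypothetical helper, not part of rga: run one query per (day, filter)
# pair and combine the pieces into a single data frame.
# `fetch` is any function(day, filter) that returns a data frame. In real
# use it would wrap ga$getData(...) with start.date = end.date = day.
fetch_split <- function(fetch, start, end, filters) {
  days <- format(seq(as.Date(start), as.Date(end), by = "day"))
  pieces <- list()
  for (day in days) {
    for (f in filters) {
      pieces[[length(pieces) + 1]] <- fetch(day, f)
    }
  }
  combined <- do.call(rbind, pieces)
  # The same dimension combination can show up in several pieces
  # (once for desktop, once for non-desktop), so sum the metric
  # over all remaining columns.
  aggregate(totalEvents ~ ., data = combined, FUN = sum)
}

# Stub standing in for the real API call, for illustration only.
fake_fetch <- function(day, filter) {
  is_desktop <- grepl("==desktop", filter, fixed = TRUE)
  data.frame(
    date = gsub("-", "", day),
    eventCategory = c("video", "download"),
    totalEvents = if (is_desktop) c(10, 5) else c(3, 2),
    stringsAsFactors = FALSE
  )
}

result <- fetch_split(
  fake_fetch,
  start = "2015-03-30",
  end = "2015-03-31",
  filters = c("ga:deviceCategory==desktop", "ga:deviceCategory!=desktop")
)
print(result)
```

Summing after rbind matters because ga:deviceCategory is not among the requested dimensions, so a row for the same date, category, action and label can appear in both the desktop and the non-desktop result. Note that aggregate with a formula drops rows containing NA by default.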