Getting complete unsampled data from GA using R/rga


I use skardhamar's rga (ga$getData) to query GA, and I want to retrieve all of the data unsampled. The data is based on more than 500k sessions per day.

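The paragraph "Extracting more than 10,000 observations" says this is possible with batch = TRUE. The paragraph "Getting unsampled data" says that you can obtain unsampled data by walking through the days one at a time. I am trying to combine the two, but I can't get it to work. For example: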

ga$getData(xxx,
    start.date = "2015-03-30", 
    end.date = "2015-03-31",
    metrics = "ga:totalEvents", 
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
    sort = "", 
    filters = "", 
    segment = "",
    batch = TRUE, walk = TRUE
    )
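This does return unsampled data, but not all of it. The data frame I get back has only 20k rows (10k per day). The result is capped at chunks of 10k per day, which is the opposite of what I expected given the batch = TRUE setting. So for the March 30 query, after seeing the following output, I end up with a 20k-row data frame: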

Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
When I leave out the walk = TRUE setting, I do get all observations (771k rows, about 335k per day), but only as sampled data:

ga$getData(xxx,
   start.date = "2015-03-30", 
   end.date = "2015-03-31",
   metrics = "ga:totalEvents", 
   dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
   sort = "", 
   filters = "", 
   segment = "",
   batch = TRUE
   )

Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...

Is my data simply too large to retrieve all observations unsampled?

You can try querying by device, using filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop"), and then merging the resulting data frames.

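I assume your users visit your site from different devices. The basic logic is that when you filter the data, the Google Analytics server applies the filter before it returns anything to you, so you can "split" your query and get unsampled data. I think the walk option works on the same principle.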

Desktop only:

ga$getData(xxx,
    start.date = "2015-03-30",
    end.date = "2015-03-31",
    metrics = "ga:totalEvents",
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
    sort = "",
    filters = "ga:deviceCategory==desktop",
    segment = "",
    batch = TRUE, walk = TRUE
    )

Mobile and tablet:

ga$getData(xxx,
    start.date = "2015-03-30",
    end.date = "2015-03-31",
    metrics = "ga:totalEvents",
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
    sort = "",
    filters = "ga:deviceCategory!=desktop",
    segment = "",
    batch = TRUE, walk = TRUE
    )
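
The split-and-merge idea above can be sketched as a small loop that runs one query per day and per device filter, then stacks the pieces with rbind. This is only a rough sketch under stated assumptions: fetch_split and fake_fetch are hypothetical names, not part of rga, and fake_fetch returns made-up rows so the example runs without a GA connection. In real use, the fetch function would presumably wrap a single-day ga$getData call with batch = TRUE and the given filter.

```r
# Hypothetical stand-in for a real query (NOT part of rga).
# In practice this would wrap a single-day, single-filter ga$getData call.
fake_fetch <- function(day, filter) {
  # Made-up row counts, only so the two slices differ in size
  n <- if (filter == "ga:deviceCategory==desktop") 3 else 2
  data.frame(
    date = rep(format(day, "%Y-%m-%d"), n),
    filter = rep(filter, n),
    totalEvents = seq_len(n),
    stringsAsFactors = FALSE
  )
}

# Run one query per (day, filter) pair and stack the results into one data frame.
fetch_split <- function(fetch, start.date, end.date, filters) {
  days <- seq(as.Date(start.date), as.Date(end.date), by = "day")
  pieces <- list()
  # Index into days so each element keeps its Date class
  for (i in seq_along(days)) {
    for (f in filters) {
      pieces[[length(pieces) + 1]] <- fetch(days[i], f)
    }
  }
  do.call(rbind, pieces)
}

device_filters <- c("ga:deviceCategory==desktop", "ga:deviceCategory!=desktop")
result <- fetch_split(fake_fetch, "2015-03-30", "2015-03-31", device_filters)
nrow(result)
```

Whether each slice actually comes back unsampled depends on how much traffic falls into it, so it is worth watching for the "Data set contains sampled data" notice on every call and splitting further (for example by another dimension) if it still appears.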