Python: downloading a batch report from the Google Analytics API v4
I am trying to pull a report covering three months. To do that I need to issue several requests and append the results to a list, because the API returns only 100,000 rows per request. Each response includes a field called nextPageToken, which I need to pass into the next query to get the next 100,000 rows of the report. I am having trouble getting this to work.
Here is my code:
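from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd

# KEY_FILE_LOCATION, SCOPES and VIEW_ID are assumed to be defined elsewhere.

def initialize_analyticsreporting():
    '''Initializes an Analytics Reporting API V4 service object.

    Returns:
        An authorized Analytics Reporting API V4 service object.
    '''
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION, SCOPES)
    # Build the service object.
    analytics = build('analyticsreporting', 'v4', credentials=credentials)
    return analytics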
list = []

def get_report(analytics, pageTokenVariable):
    return analytics.reports().batchGet(
        body={
            'reportRequests': [
                {
                    'viewId': VIEW_ID,
                    'pageSize': 100000,
                    'dateRanges': [{'startDate': '90daysAgo', 'endDate': 'yesterday'}],
                    'metrics': [{'expression': 'ga:adClicks'}, {'expression': 'ga:impressions'}, {'expression': 'ga:adCost'}, {'expression': 'ga:CTR'}, {'expression': 'ga:CPC'}, {'expression': 'ga:costPerTransaction'}, {'expression': 'ga:transactions'}, {'expression': 'ga:transactionsPerSession'}, {'expression': 'ga:pageviews'}, {'expression': 'ga:timeOnPage'}],
                    "pageToken": pageTokenVariable,
                    'dimensions': [{'name': 'ga:adMatchedQuery'}, {'name': 'ga:campaign'}, {'name': 'ga:adGroup'}, {'name': 'ga:adwordsCustomerID'}, {'name': 'ga:date'}],
                    'orderBys': [{'fieldName': 'ga:impressions', 'sortOrder': 'DESCENDING'}],
                    'dimensionFilterClauses': [{
                        'filters': [{
                            'dimension_name': 'ga:adwordsCustomerID',
                            'operator': 'EXACT',
                            'expressions': 'abc',
                            'not': 'True'
                        }]
                    }],
                    'dimensionFilterClauses': [{
                        'filters': [{
                            'dimension_name': 'ga:adMatchedQuery',
                            'operator': 'EXACT',
                            'expressions': '(not set)',
                            'not': 'True'
                        }]
                    }]
                }]
        }
    ).execute()
analytics = initialize_analyticsreporting()
response = get_report(analytics, "0")

for report in response.get('reports', []):
    pagetoken = report.get('nextPageToken', None)
    print(pagetoken)
    #------printing the pagetoken here returns `100,000` which is expected
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get(
        'metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])

    for row in rows:
        # create dict for each row
        dict = {}
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])
        # fill dict with dimension header (key) and dimension value (value)
        for header, dimension in zip(dimensionHeaders, dimensions):
            dict[header] = dimension
        # fill dict with metric header (key) and metric value (value)
        for i, values in enumerate(dateRangeValues):
            for metric, value in zip(metricHeaders, values.get('values')):
                # set int as int, float a float
                if ',' in value or ',' in value:
                    dict[metric.get('name')] = float(value)
                else:
                    dict[metric.get('name')] = float(value)
        list.append(dict)
        # Append that data to a list as a dictionary
# pagination function
while pagetoken:  # This says while there is info in the nextPageToken get the data, process it and add to the list
    response = get_report(analytics, pagetoken)
    pagetoken = response['reports'][0]['nextPageToken']
    print(pagetoken)
    #------printing the pagetoken here returns `200,000` as is expected but the data being pulled is the same as for the first batch and so on. While in the loop the pagetoken is being incremented but it does not retrieve new data
    for row in rows:
        # create dict for each row
        dict = {}
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])
        # fill dict with dimension header (key) and dimension value (value)
        for header, dimension in zip(dimensionHeaders, dimensions):
            dict[header] = dimension
        # fill dict with metric header (key) and metric value (value)
        for i, values in enumerate(dateRangeValues):
            for metric, value in zip(metricHeaders, values.get('values')):
                # set int as int, float a float
                if ',' in value or ',' in value:
                    dict[metric.get('name')] = float(value)
                else:
                    dict[metric.get('name')] = float(value)
        list.append(dict)
        # Append that data to a list as a dictionary

df = pd.DataFrame(list)
print(df)
df.to_csv('full_dataset.csv', encoding="utf-8", index=False)
Where is the mistake in the way I am passing the pagetoken?
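A side note on the request body, separate from the pagination problem: the dict literal in get_report lists the key 'dimensionFilterClauses' twice, and in Python the later entry silently replaces the earlier one, so only the ga:adMatchedQuery filter would be sent. The sketch below shows that behaviour and one possible way to send both filters. The helper name build_filter_clauses is hypothetical, and the field spellings (dimensionName, a list for expressions, a boolean for not) follow my reading of the Reporting API v4 reference, so treat them as assumptions to check against the official documentation.

```python
# A repeated key in a dict literal does not raise; the last value wins.
duplicated = {
    'dimensionFilterClauses': 'first clause',
    'dimensionFilterClauses': 'second clause',
}
print(duplicated)


def build_filter_clauses():
    """Hypothetical helper: return both exclusion filters in ONE list.

    Assumption: separate clauses in this list are combined with AND
    by the API, so both exclusions apply.
    """
    return [
        {'filters': [{
            'dimensionName': 'ga:adwordsCustomerID',
            'not': True,
            'operator': 'EXACT',
            'expressions': ['abc'],
        }]},
        {'filters': [{
            'dimensionName': 'ga:adMatchedQuery',
            'not': True,
            'operator': 'EXACT',
            'expressions': ['(not set)'],
        }]},
    ]


clauses = build_filter_clauses()
print(len(clauses))
```

In the request body this would be used once, as 'dimensionFilterClauses': build_filter_clauses().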
Well, you are updating pagetoken in pagetoken = response['reports'][0]['nextPageToken'], but shouldn't you also update rows inside the while loop with the new data?
Something like this:
while pagetoken:
    response = get_report(analytics, pagetoken)
    pagetoken = response['reports'][0].get('nextPageToken')
    for report in response.get('reports', []):
        rows = report.get('data', {}).get('rows', [])
        for row in rows:
            ...  # process each row as in the first loop
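To make the idea above concrete, here is a minimal, self-contained sketch of the whole pagination loop. It is not the asker's exact code: fetch_all_rows, fetch_page and FAKE_PAGES are hypothetical names, and the fake pages stand in for the real API so the loop can be run without credentials. The point it illustrates is that both the rows and the token must be re-read from each new response.

```python
def fetch_all_rows(fetch_page, first_token='0'):
    """Collect the rows of every page into one list.

    fetch_page is a hypothetical callable that takes a page token and
    returns a batchGet-style response dict.
    """
    all_rows = []
    page_token = first_token
    while page_token:
        response = fetch_page(page_token)
        report = response['reports'][0]
        # Re-read the rows from THIS response on every iteration.
        all_rows.extend(report.get('data', {}).get('rows', []))
        # .get() yields None on the last page, which ends the loop.
        page_token = report.get('nextPageToken')
    return all_rows


# Stand-in for the real API: three fake pages keyed by page token.
FAKE_PAGES = {
    '0': {'reports': [{'data': {'rows': [{'dimensions': ['a']}]}, 'nextPageToken': '1'}]},
    '1': {'reports': [{'data': {'rows': [{'dimensions': ['b']}]}, 'nextPageToken': '2'}]},
    '2': {'reports': [{'data': {'rows': [{'dimensions': ['c']}]}}]},
}

rows = fetch_all_rows(lambda token: FAKE_PAGES[token])
print(len(rows))  # 3
```

With the real client, the same function could presumably be called as fetch_all_rows(lambda token: get_report(analytics, token)), reusing the get_report function from the question.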
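Please see the article on this. You need to look up the nextPageToken in the report and then include it as the pageToken in the subsequent request. The API returns at most 100,000 rows per request, no matter how many you ask for.
I did not know that, which is why I went looking for an answer; the documentation does not explain it properly. It only says to update the pageToken.
Added a code example. I cannot test it, but as I see it rows needs to be updated, otherwise you keep processing the same data over and over.
Now I get KeyError: 'nextPageToken' at the line response = get_report(analytics, pagetoken). The data looks correct now, but the loop does not stop when nextPageToken no longer exists. Why is that? I do have the while pagetoken condition.
Thank you, it worked! So as I understand it: the problem was that even though I thought I had the right nextPageToken, I was still processing the same report? And the difference with using get for pagetoken is that it does not throw an error when the key is not found?
Correct. get returns None instead of raising an exception.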
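The difference between the two lookups can be checked with a plain dictionary. This is a small illustration, not API output: last_page is a made-up stand-in for the final report, which carries no 'nextPageToken' key.

```python
# Made-up final page: there is no 'nextPageToken' key.
last_page = {'data': {'rows': []}}

# dict.get returns None for a missing key, so `while pagetoken:` stops.
token = last_page.get('nextPageToken')
print(token)

# Subscripting the same missing key raises KeyError instead.
error = None
try:
    last_page['nextPageToken']
except KeyError as exc:
    error = exc
    print('KeyError:', exc)
```

This is why the while pagetoken condition alone was not enough: the KeyError was raised on the assignment inside the loop, before the condition could be evaluated again.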