Python Google BigQuery incomplete query replies on odd attempts


python google-bigquery


When querying BigQuery through the Python API using:

service.jobs().getQueryResults

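We find that the first attempt works fine: all of the expected results are included in the response. However, if the query is run again shortly after the first attempt (within roughly 5 minutes), only a small subset of the results (in powers of 2) is returned, almost immediately and without any error.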

See our full code:

Any ideas what could be causing this?

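The problem appears to be that we return a different default number of rows for query() and getQueryResults(). So depending on whether your query finished quickly (and therefore did not need getQueryResults()), you get more or fewer rows.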
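One way to notice this kind of silent truncation in your own code is to compare the number of rows actually present in a response with its totalRows field. The sketch below assumes the v2 REST response shape ('rows' may be absent, 'totalRows' is a decimal string once 'jobComplete' is true); the helper name rows_missing and the sample data are hypothetical.

```python
def rows_missing(response):
    """Return how many result rows are NOT included in this response.

    Sketch assuming a BigQuery v2 jobs.query / jobs.getQueryResults
    response dict: 'rows' may be absent, 'totalRows' is a decimal string.
    """
    if not response.get('jobComplete', False):
        # totalRows is only meaningful after the job has finished.
        raise ValueError('job has not completed yet')
    received = len(response.get('rows', []))
    return int(response.get('totalRows', '0')) - received


# A response cut short by a small default page size (hypothetical data).
partial = {'jobComplete': True, 'totalRows': '10',
           'rows': [{'f': [{'v': str(i)}]} for i in range(4)],
           'pageToken': 'next-page'}
print(rows_missing(partial))  # 6
```

A nonzero result together with a pageToken means the remaining rows still have to be fetched with further getQueryResults calls.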

I've filed a bug, and we should have a fix soon.

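The workaround (which is a good idea in general) is to set maxResults on both the query and the getQueryResults calls. If you want a lot of rows, you will probably need to page through the results using the returned page token.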
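A minimal sketch of that workaround, assuming `service` is a BigQuery v2 client from google-api-python-client (for example one built with googleapiclient.discovery.build('bigquery', 'v2', credentials=...)). The function name fetch_all_rows and its parameters are hypothetical. It passes the same explicit maxResults to jobs.query and to every jobs.getQueryResults call, then follows pageToken until no token is returned. It omits the optional location argument, which may be needed for jobs that do not run in the US or EU multi-region.

```python
def fetch_all_rows(service, project_id, sql, page_size=1000, timeout_ms=10000):
    """Run `sql` and return every result row, following page tokens.

    Sketch only: `service` is assumed to be a BigQuery v2 API client.
    """
    # Set maxResults explicitly so the first page size does not depend on
    # the server-side default for query().
    response = service.jobs().query(
        projectId=project_id,
        body={'query': sql, 'maxResults': page_size, 'timeoutMs': timeout_ms},
    ).execute()
    job_id = response['jobReference']['jobId']

    # If the job did not finish within timeoutMs, wait via getQueryResults,
    # again with an explicit maxResults.
    while not response.get('jobComplete', False):
        response = service.jobs().getQueryResults(
            projectId=project_id, jobId=job_id,
            maxResults=page_size, timeoutMs=timeout_ms,
        ).execute()

    rows = list(response.get('rows', []))
    page_token = response.get('pageToken')
    # Keep requesting pages until the server stops returning a token.
    while page_token:
        response = service.jobs().getQueryResults(
            projectId=project_id, jobId=job_id,
            maxResults=page_size, pageToken=page_token,
        ).execute()
        rows.extend(response.get('rows', []))
        page_token = response.get('pageToken')
    return rows
```

The polling loop relies on timeoutMs to make each getQueryResults call wait on the server instead of spinning; a production version would probably also cap the number of polls.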

Below is an example that reads data from a completed query job, one page per request. It will be included in the next release of bq.py:

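import sys

# _TableReader, BigqueryError and BigqueryInterfaceError are assumed to be
# defined elsewhere in bq.py; this class is an excerpt from that module.
class _JobTableReader(_TableReader):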
  """A TableReader that reads from a completed job."""

  def __init__(self, local_apiclient, project_id, job_id):
    self.job_id = job_id
    self.project_id = project_id
    self._apiclient = local_apiclient

  def ReadSchemaAndRows(self, max_rows=None):
    """Read at most max_rows rows from a table and the schema.

    Args:
      max_rows: maximum number of rows to return.

    Raises:
      BigqueryInterfaceError: when bigquery returns something unexpected.

    Returns:
      A tuple where the first item is the list of fields and the
      second item a list of rows.
    """
    page_token = None
    rows = []
    schema = []
    max_rows = max_rows if max_rows is not None else sys.maxsize
    while len(rows) < max_rows:
      (more_rows, page_token, total_rows, current_schema) = self._ReadOnePage(
          max_rows=max_rows - len(rows),
          page_token=page_token)
      if not schema and current_schema:
        schema = current_schema.get('fields', [])

      max_rows = min(max_rows, total_rows)
      for row in more_rows:
        rows.append([entry.get('v', '') for entry in row.get('f', [])])
      if not page_token and len(rows) != max_rows:
        raise BigqueryInterfaceError(
            'PageToken missing for %r' % (self,))
      if not more_rows and len(rows) != max_rows:
        raise BigqueryInterfaceError(
            'Not enough rows returned by server for %r' % (self,))
    return (schema, rows)

  def _ReadOnePage(self, max_rows, page_token=None):
    data = self._apiclient.jobs().getQueryResults(
        maxResults=max_rows,
        pageToken=page_token,
        # Sets the timeout to 0 because we assume the table is already ready.
        timeoutMs=0,
        projectId=self.project_id,
        jobId=self.job_id).execute()
    if not data['jobComplete']:
      raise BigqueryError('Job %s is not done' % (self,))
    page_token = data.get('pageToken', None)
    total_rows = int(data['totalRows'])
    schema = data.get('schema', None)
    rows = data.get('rows', [])
    return (rows, page_token, total_rows, schema)
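ReadSchemaAndRows returns the schema's field list together with rows as plain lists of cell values. A small, hypothetical helper can pair the two up by field name; it assumes flat (non-nested, non-repeated) fields and leaves values as the strings the REST API returns.

```python
def rows_to_dicts(schema, rows):
    """Turn (schema, rows) from ReadSchemaAndRows into a list of dicts.

    Assumes a flat schema: each field is a dict with at least a 'name' key.
    Cell values are left exactly as returned (typically strings).
    """
    names = [field['name'] for field in schema]
    return [dict(zip(names, row)) for row in rows]


schema = [{'name': 'word', 'type': 'STRING'},
          {'name': 'word_count', 'type': 'INTEGER'}]
rows = [['the', '42'], ['cat', '7']]
print(rows_to_dicts(schema, rows))
# [{'word': 'the', 'word_count': '42'}, {'word': 'cat', 'word_count': '7'}]
```

Converting the strings to typed values (for example int for INTEGER fields) would be a separate step driven by each field's 'type'.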


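Do you have an example that uses page tokens? We found the documentation page, but it doesn't include any code examples. This seems important for our code, since there is a hard cap on the maximum results per page. Also, is this an API bug or a backend bug? We still seem to be seeing it, so once it's fixed, will we need to pull a newer version?

The fix is entirely in the BigQuery backend. It should have gone live yesterday. If you are still seeing unexpected behavior, can you provide the job ID of the query?

It looks like we were still having the problem yesterday, although we aren't currently logging job IDs. We can make some changes and get back to you. In the meantime, we're still interested in any examples of pagination in Python.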