Python BadRequest: 400 Column name type is ambiguous at [16:3]


I am new to Google BigQuery and want to access the GitHub data. I have the following code:

query_job = client.query("""

SELECT
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count
FROM
`githubarchive:year.2017` as gb17, 
`githubarchive:year.2016` as gb16, 
`githubarchive:year.2015` as gb15, 
`githubarchive:year.2014` as gb14, 
`githubarchive:year.2013` as gb13,
`githubarchive:year.2012` as gb12,
`githubarchive:year.2011` as gb11 

WHERE
  type = 'CommitCommentEvent'
    OR type = 'PushEvent'
    OR type = 'IssueCommentEvent'
    OR type = 'PullRequestEvent'
    OR type = 'PullRequestReviewCommentEvent'
    OR type = 'IssuesEvent'
GROUP BY
  actor_login
ORDER BY
  events_actor_count DESC

  """)

results = query_job.result()
I get this error:

---------------------------------------------------------------------------
BadRequest                                Traceback (most recent call last)
<ipython-input-29-9c0a41bed3c6> in <module>()
     27   """)
     28 
---> 29 results = query_job.result()

/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout, retry)
   2735             not complete in the given timeout.
   2736         """
-> 2737         super(QueryJob, self).result(timeout=timeout)
   2738         # Return an iterator instead of returning the job.
   2739         if not self._query_results:

/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
    697             self._begin()
    698         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 699         return super(_AsyncJob, self).result(timeout=timeout)
    700 
    701     def cancelled(self):

/anaconda3/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    123             # pylint: disable=raising-bad-type
    124             # Pylint doesn't recognize that this is valid in this case.
--> 125             raise self._exception
    126 
    127         return self._result

BadRequest: 400 Column name type is ambiguous at [16:3]

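I think my mistake is in the SELECT statement and that I have to qualify the columns with a table name. But how do I do that when I have multiple tables? My suspicion may also be wrong, so I would appreciate any advice. Thanks.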

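It looks like you are separating the tables with commas. In BigQuery standard SQL a comma means CROSS JOIN, not UNION ALL as it does in legacy SQL. Every joined table has a `type` column, so the unqualified reference to `type` is ambiguous.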


So try using an explicit UNION ALL in your SELECT statement.
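A minimal sketch of what the explicit UNION ALL could look like, generated from Python so the per-year subqueries are not repeated by hand. The helper name `build_union_query` is hypothetical, and the table names assume the yearly tables from the question, written in standard SQL dot notation as `githubarchive.year.YYYY`.

```python
EVENT_TYPES = [
    "CommitCommentEvent",
    "PushEvent",
    "IssueCommentEvent",
    "PullRequestEvent",
    "PullRequestReviewCommentEvent",
    "IssuesEvent",
]


def build_union_query(years):
    """Return a standard SQL query that stacks the yearly tables with UNION ALL."""
    type_list = ", ".join("'{}'".format(t) for t in EVENT_TYPES)
    # One SELECT per year; UNION ALL appends rows instead of cross joining tables.
    selects = [
        "SELECT actor.login AS actor_login, type "
        "FROM `githubarchive.year.{}`".format(year)
        for year in years
    ]
    union = "\n  UNION ALL\n  ".join(selects)
    return (
        "SELECT actor_login, COUNT(1) AS events_actor_count\n"
        "FROM (\n  " + union + "\n)\n"
        "WHERE type IN (" + type_list + ")\n"
        "GROUP BY actor_login\n"
        "ORDER BY events_actor_count DESC"
    )


sql = build_union_query(range(2011, 2018))
print(sql)
```

The resulting string can be passed to `client.query(sql)` in place of the comma-separated query from the question.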

Try using a wildcard to select from all of the years you need:

SELECT
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count
FROM `githubarchive.year.20*` as gh
WHERE
   _TABLE_SUFFIX BETWEEN '11' AND '18' AND
   type IN (
     'CommitCommentEvent',
     'PushEvent',
     'IssueCommentEvent',
     'PullRequestEvent',
     'PullRequestReviewCommentEvent',
     'IssuesEvent'
   )
GROUP BY
  actor_login
ORDER BY
  events_actor_count DESC

I also used an IN list to simplify the filter.
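As a rough sketch, the wildcard query could be wrapped for use from Python like this. The names `wildcard_query` and `top_actors` are hypothetical, and the code assumes the table suffix after `20` is the two-digit year, as in the query above. The `client` argument only needs a `query(sql)` method whose return value has `result()`, which is how the question already uses `bigquery.Client`.

```python
from itertools import islice

QUERY_TEMPLATE = """
SELECT
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count
FROM `githubarchive.year.20*`
WHERE
  _TABLE_SUFFIX BETWEEN '{first}' AND '{last}'
  AND type IN ('CommitCommentEvent', 'PushEvent', 'IssueCommentEvent',
               'PullRequestEvent', 'PullRequestReviewCommentEvent',
               'IssuesEvent')
GROUP BY actor_login
ORDER BY events_actor_count DESC
"""


def wildcard_query(first_year, last_year):
    """Fill in the two-digit suffix bounds, e.g. 2011 becomes '11'."""
    return QUERY_TEMPLATE.format(
        first=str(first_year)[2:], last=str(last_year)[2:]
    )


def top_actors(client, first_year=2011, last_year=2017, limit=10):
    """Run the query and return (login, count) pairs for the first `limit` rows."""
    rows = client.query(wildcard_query(first_year, last_year)).result()
    return [
        (row.actor_login, row.events_actor_count)
        for row in islice(rows, limit)
    ]
```

With the real library this would be called as `top_actors(bigquery.Client())` after `from google.cloud import bigquery`.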

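You can use a wildcard together with the _TABLE_SUFFIX pseudo column to reduce the number of bytes the query scans, because a wildcard on its own scans every matching table. It also lets you filter on specific years.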

Like this:

SELECT
  actor.login AS actor_login,
  COUNT(1) AS events_actor_count
FROM `githubarchive.year.*`
WHERE
  -- Parenthesize the OR chain so the _TABLE_SUFFIX filter applies to every type.
  (type = 'CommitCommentEvent'
    OR type = 'PushEvent'
    OR type = 'IssueCommentEvent'
    OR type = 'PullRequestEvent'
    OR type = 'PullRequestReviewCommentEvent'
    OR type = 'IssuesEvent')
  AND _TABLE_SUFFIX IN ('2011', '2012', '2013', '2014', '2015', '2016', '2017')
GROUP BY actor.login
ORDER BY events_actor_count DESC
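
A note on mixing OR and AND in a filter like the one above: in SQL, AND binds more tightly than OR, so a trailing `AND (...)` attaches only to the last OR term unless the whole OR chain is wrapped in parentheses. Python's `and` and `or` follow the same precedence, so the effect can be sketched locally. The function names are hypothetical and only two event types are used for brevity.

```python
YEARS = {"2011", "2012", "2013", "2014", "2015", "2016", "2017"}


def unparenthesized(row_type, suffix):
    # Mirrors `type = A OR type = B AND suffix IN (...)`: the AND groups first,
    # so the year filter only guards the last type.
    return row_type == "PushEvent" or row_type == "IssuesEvent" and suffix in YEARS


def parenthesized(row_type, suffix):
    # Mirrors `(type = A OR type = B) AND suffix IN (...)`.
    return (row_type == "PushEvent" or row_type == "IssuesEvent") and suffix in YEARS


# A PushEvent row from a year outside the list slips through without parentheses.
print(unparenthesized("PushEvent", "2018"))  # True
print(parenthesized("PushEvent", "2018"))    # False
```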