Python BadRequest: 400 Column name type is ambiguous at [16:3]
I'm new to Google BigQuery and I want to access the GitHub API. I have the following code:
query_job = client.query("""
SELECT
actor.login AS actor_login,
COUNT(1) AS events_actor_count
FROM
`githubarchive:year.2017` as gb17,
`githubarchive:year.2016` as gb16,
`githubarchive:year.2015` as gb15,
`githubarchive:year.2014` as gb14,
`githubarchive:year.2013` as gb13,
`githubarchive:year.2012` as gb12,
`githubarchive:year.2011` as gb11
WHERE
type = 'CommitCommentEvent'
OR type = 'PushEvent'
OR type = 'IssueCommentEvent'
OR type = 'PullRequestEvent'
OR type = 'PullRequestReviewCommentEvent'
OR type = 'IssuesEvent'
GROUP BY
actor_login
ORDER BY
events_actor_count DESC
""")
results = query_job.result()
I get this error:
---------------------------------------------------------------------------
BadRequest Traceback (most recent call last)
<ipython-input-29-9c0a41bed3c6> in <module>()
27 """)
28
---> 29 results = query_job.result()
/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout, retry)
2735 not complete in the given timeout.
2736 """
-> 2737 super(QueryJob, self).result(timeout=timeout)
2738 # Return an iterator instead of returning the job.
2739 if not self._query_results:
/anaconda3/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
697 self._begin()
698 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 699 return super(_AsyncJob, self).result(timeout=timeout)
700
701 def cancelled(self):
/anaconda3/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
123 # pylint: disable=raising-bad-type
124 # Pylint doesn't recognize that this is valid in this case.
--> 125 raise self._exception
126
127 return self._result
BadRequest: 400 Column name type is ambiguous at [16:3]
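I think my mistake is in the SELECT statement and that I need to qualify the columns with a table name? But how do I do that when there are several tables? My suspicion may also be wrong, so I'd appreciate any advice. Thanks.

It looks like you are using the comma as a cross join. In BigQuery Standard SQL the comma means CROSS JOIN, not UNION ALL as it does in legacy SQL, so your reference to the column type is ambiguous: every table in the FROM clause has a column with that name.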
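The difference between a comma join and UNION ALL can be reproduced locally. The sketch below uses Python's built-in sqlite3 module as a stand-in, not BigQuery; the tables events_2016 and events_2017 and their rows are made up for illustration, and the exact error text differs between the two engines.

```python
import sqlite3

# In-memory stand-in for two yearly tables that share a column name.
# Table names and rows are hypothetical, not the GitHub Archive data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events_2016 (type TEXT);
    CREATE TABLE events_2017 (type TEXT);
    INSERT INTO events_2016 VALUES ('PushEvent'), ('IssuesEvent');
    INSERT INTO events_2017 VALUES ('PushEvent'), ('WatchEvent'), ('ForkEvent');
""")

# A comma between tables is a cross join. Both tables expose a column
# named `type`, so the unqualified reference cannot be resolved.
try:
    conn.execute("SELECT type FROM events_2016 AS a, events_2017 AS b")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualifying the column silences the error, but the result pairs every
# row of one table with every row of the other.
cross_rows = conn.execute(
    "SELECT a.type FROM events_2016 AS a, events_2017 AS b"
).fetchall()

# UNION ALL stacks the rows of both tables, which is what the question wants.
union_rows = conn.execute(
    "SELECT type FROM events_2016 UNION ALL SELECT type FROM events_2017"
).fetchall()

print(error_message)
print(len(cross_rows), len(union_rows))
```

Qualifying the column with a table alias therefore removes the error message but still computes a cross product, which is why the fix is to change how the tables are combined rather than how the column is referenced.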
So try using an explicit UNION ALL in your SELECT statement.

Try using a wildcard to select from all the desired years:
SELECT
actor.login AS actor_login,
COUNT(1) AS events_actor_count
FROM `githubarchive.year.20*` AS gh
WHERE
_TABLE_SUFFIX BETWEEN '11' AND '18' AND
type IN (
'CommitCommentEvent',
'PushEvent',
'IssueCommentEvent',
'PullRequestEvent',
'PullRequestReviewCommentEvent',
'IssuesEvent'
)
GROUP BY
actor_login
ORDER BY
events_actor_count DESC
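I also used an IN list to simplify the filter.

You can use a wildcard together with the _TABLE_SUFFIX pseudo-column to further reduce the number of bytes the query scans, because a bare wildcard scans every matching table. It also lets you filter on specific years. Like this: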
select
actor.login AS actor_login,
COUNT(1) AS events_actor_count
from `githubarchive.year.*`
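WHERE
-- Parenthesize the OR chain: AND binds tighter than OR, so without the
-- parentheses the year filter would apply only to 'IssuesEvent'.
(type = 'CommitCommentEvent'
OR type = 'PushEvent'
OR type = 'IssueCommentEvent'
OR type = 'PullRequestEvent'
OR type = 'PullRequestReviewCommentEvent'
OR type = 'IssuesEvent')
AND _TABLE_SUFFIX IN ('2011', '2012', '2013', '2014', '2015', '2016', '2017')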
group by actor.login
order by events_actor_count
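
For completeness, the explicit UNION ALL suggested in the first answer could be generated from Python roughly as follows. This is only a sketch: build_union_all_query and run_query are hypothetical helper names, the subquery alias events is arbitrary, and it assumes the yearly tables are named `githubarchive.year.<YYYY>` as in the wildcard queries above.

```python
EVENT_TYPES = [
    "CommitCommentEvent",
    "PushEvent",
    "IssueCommentEvent",
    "PullRequestEvent",
    "PullRequestReviewCommentEvent",
    "IssuesEvent",
]


def build_union_all_query(years, event_types=EVENT_TYPES):
    """Build a Standard SQL query that stacks the yearly tables with UNION ALL."""
    # One SELECT per year; each branch exposes the same two columns.
    branches = "\n  UNION ALL\n".join(
        f"  SELECT actor.login AS actor_login, type FROM `githubarchive.year.{year}`"
        for year in years
    )
    type_list = ", ".join(f"'{event_type}'" for event_type in event_types)
    return (
        "SELECT\n"
        "  actor_login,\n"
        "  COUNT(1) AS events_actor_count\n"
        f"FROM (\n{branches}\n) AS events\n"
        f"WHERE type IN ({type_list})\n"
        "GROUP BY actor_login\n"
        "ORDER BY events_actor_count DESC"
    )


def run_query(sql):
    """Send the query to BigQuery (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # imported lazily so the builder works offline

    client = bigquery.Client()
    return client.query(sql).result()


query = build_union_all_query(range(2011, 2018))
print(query)
```

Calling run_query(query) would submit it the same way as the code in the question. The wildcard form is shorter, but the generated UNION ALL makes it explicit which tables are read.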