Google BigQuery SQL statement (SQL, Google BigQuery, GitHub Archive)

sql google-bigquery

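I am trying to use Google BigQuery to pull some data from the GitHub Archive. The amount of data I am currently requesting is too large for BigQuery to handle (at least on the free tier), so I am trying to narrow the scope of my request.

I want to limit the data so that historical data is returned only for repositories that currently have more than 1000 stars. This is more complicated than simply saying repository_watchers > 1000, because that would exclude the historical data for the first 1000 stars a repository received.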

SELECT repository_name, repository_owner, created_at, type, repository_url, repository_watchers
FROM [githubarchive:github.timeline]
WHERE type="WatchEvent"
ORDER BY created_at DESC
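
To make the difficulty concrete, here is a minimal sketch that runs both filters against a toy table in SQLite rather than BigQuery. The table name `timeline` and all rows are invented for illustration, and the highest recorded repository_watchers value is assumed to stand in for the current star count.

```python
import sqlite3

# Toy stand-in for the GitHub Archive timeline; every row here is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline ("
    "repository_name TEXT, created_at TEXT, type TEXT, repository_watchers INTEGER)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?, ?)",
    [
        ("popular", "2012-01-01", "WatchEvent", 10),
        ("popular", "2012-06-01", "WatchEvent", 900),
        ("popular", "2013-01-01", "WatchEvent", 1500),
        ("small", "2012-03-01", "WatchEvent", 5),
        ("small", "2013-02-01", "WatchEvent", 40),
    ],
)

# Row-level filter: only events recorded after a repository passed 1000 stars survive.
naive = conn.execute(
    "SELECT repository_name, created_at, repository_watchers FROM timeline "
    "WHERE type = 'WatchEvent' AND repository_watchers > 1000 "
    "ORDER BY created_at"
).fetchall()

# Repository-level filter: choose repositories by their peak count, then keep all their events.
full_history = conn.execute(
    "SELECT repository_name, created_at, repository_watchers FROM timeline "
    "WHERE type = 'WatchEvent' AND repository_name IN ("
    "  SELECT repository_name FROM timeline WHERE type = 'WatchEvent' "
    "  GROUP BY repository_name HAVING MAX(repository_watchers) > 1000) "
    "ORDER BY created_at"
).fetchall()

print(naive)         # one row: the event after the 1000-star mark
print(full_history)  # three rows: the whole history of the popular repository
```

The row-level filter loses the two early events of the popular repository, which is exactly the history the question wants to keep.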

Edit: the solution I used (based on @Brian's answer)

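Try the single join query (the first of the queries listed at the end of this answer).

If that syntax is not supported, you can use a three-step solution instead:

Step 1: Find which repository_name values have at least one record with a repository_watchers value > 1000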

select repository_name, max(repository_watchers) as curr_watchers
  from [githubarchive:github.timeline]
 where type = 'WatchEvent'
 group by repository_name
having max(repository_watchers) > 1000

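Step 2: Store the result as a table and call it SUB

Step 3: Run the join query against SUB and the original table (the last query listed below)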
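
The three steps above can be sketched end to end on a toy table. This is only an illustration in SQLite, not BigQuery: the `timeline` table and its rows are hypothetical, and `CREATE TABLE ... AS SELECT` stands in for saving a query result as the table SUB (in BigQuery that would presumably be done by choosing a destination table for the step 1 query).

```python
import sqlite3

# Hypothetical miniature of the timeline table; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline ("
    "repository_name TEXT, created_at TEXT, type TEXT, repository_watchers INTEGER)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?, ?)",
    [
        ("popular", "2012-01-01", "WatchEvent", 10),
        ("popular", "2012-06-01", "PushEvent", 900),
        ("popular", "2013-01-01", "WatchEvent", 1500),
        ("small", "2012-03-01", "WatchEvent", 5),
    ],
)

# Steps 1 and 2: find repositories whose peak watcher count exceeds 1000
# and materialize that result as the table sub.
conn.execute(
    "CREATE TABLE sub AS "
    "SELECT repository_name, MAX(repository_watchers) AS curr_watchers "
    "FROM timeline WHERE type = 'WatchEvent' "
    "GROUP BY repository_name HAVING MAX(repository_watchers) > 1000"
)

# Step 3: join the original table against sub to get the full history
# (every event type) of the qualifying repositories, newest first.
rows = conn.execute(
    "SELECT y.repository_name, y.created_at, y.type, y.repository_watchers "
    "FROM timeline AS y JOIN sub AS x ON y.repository_name = x.repository_name "
    "ORDER BY y.created_at DESC"
).fetchall()

for row in rows:
    print(row)
```

Because step 3 no longer filters on type, it returns every event of the qualifying repositories, not only their WatchEvent rows. Add a type condition there if only star events are wanted.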

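So the repository_watchers column changes over time and reflects the number of stars as of a given created_at date?

Yes, that is exactly right. repository_watchers shows the number of watchers at the time the event was created.

Note that cost in BigQuery reflects the size of the columns involved in the query, so filtering out half of the rows still scans all of them. To reduce cost, run exactly the same filter once, export the result to a new table, and then run your queries against that new table.

Thanks for the information. I don't think this works with Google BigQuery, because it only allows one query at a time: "Query Failed Error: Error at: 3.16 - 8.11. Only one query can be executed at a time." Do you know how this could be split up?

@AnkushAgrawal Ah, I hadn't noticed that, and I had never even heard of "Google Big Query". I really don't think this can be done without a subquery or an inline view (a subquery in the from clause).

Do you know how to split it into two queries? I can save the data from one query into a table and then query that second table as well.

@AnkushAgrawal I will edit the answer so that you can run two queries, but please try the query in my edit first to see whether that syntax is supported.

@AnkushAgrawal No problem, I'm curious to see what works.
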
Single query joining the timeline against the qualifying repositories:

select y.*
  from [githubarchive:github.timeline] y
  join (select repository_name, max(repository_watchers) as curr_watchers
          from [githubarchive:github.timeline]
         where type = 'WatchEvent'
         group by repository_name
        having max(repository_watchers) > 1000) x
    on y.repository_name = x.repository_name
 order by y.created_at desc

Query that finds the qualifying repositories (its result is saved as SUB):

select repository_name, max(repository_watchers) as curr_watchers
  from [githubarchive:github.timeline]
 where type = 'WatchEvent'
 group by repository_name
having max(repository_watchers) > 1000

Query joining the timeline against SUB:

select y.*
  from [githubarchive:github.timeline] y
  join sub x
    on y.repository_name = x.repository_name
 order by y.created_at desc