Google bigquery Google BigQuery加入每个错误

Google bigquery Google BigQuery加入每个错误,google-bigquery,Google Bigquery,我试图运行一个简单的查询,它连接了太大的数据集,但我遇到了各种错误。这里复制了一个使用公共数据库的类似查询 SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_a

我试图运行一个简单的查询,它连接了太大的数据集,但我遇到了各种错误。这里复制了一个使用公共数据库的类似查询

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] as gn1 inner join (select actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name from [publicdata:samples.github_nested] group by actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) as gn2 on gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'
如果没有“内部联接,则表示数据库大小太大。当我使用“内部联接每个”运行查询时,大查询会给出一个错误,说明:

无法执行分区联接,因为gn2不可并行:(从[publicdata:samples.github\u嵌套]组中选择[actor\u attributes.blog]、[actor\u attributes.company]、[actor\u attributes.email]、[actor\u attributes.gravatar\u id]、[actor\u attributes.location]、[actor\u attributes.login]、[actor\u attributes.name],[actor\u attributes.company],[actor\u attributes.email],[actor\u attributes.gravatar\u id],[actor\u attributes.location],[actor\u attributes.login],[actor\u attributes.name])

任何帮助都将不胜感激


感谢您提供了一个[非]工作的公共示例。它使调试更加容易

重新格式化原始查询:

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] AS gn1 
INNER JOIN EACH (
  SELECT actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name
  FROM [publicdata:samples.github_nested]
  GROUP BY actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) 
AS gn2 ON gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'
该查询确实失败了,出现了上述错误消息。虽然错误消息可能更好,但解决方案很简单:只需将每个子查询添加到GROUP BY中,以使其可并行化:

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] AS gn1 
INNER JOIN EACH (
  SELECT actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name
  FROM [publicdata:samples.github_nested]
  GROUP EACH BY actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) 
AS gn2 ON gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'

[查询完成(12.7秒,237 MB已处理)]

感谢您提供了一个[非]工作的公共示例。这使调试更加容易

重新格式化原始查询:

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] AS gn1 
INNER JOIN EACH (
  SELECT actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name
  FROM [publicdata:samples.github_nested]
  GROUP BY actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) 
AS gn2 ON gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'
该查询确实失败了,出现了上述错误消息。虽然错误消息可能更好,但解决方案很简单:只需将每个子查询添加到GROUP BY中,以使其可并行化:

SELECT gn1.actor_attributes.blog, gn1.actor_attributes.company, gn1.actor_attributes.email, gn1.actor_attributes.gravatar_id, gn1.actor_attributes.location, gn1.actor_attributes.login, gn1.actor_attributes.name,gn2.actor_attributes.blog, gn2.actor_attributes.company, gn2.actor_attributes.email, gn2.actor_attributes.gravatar_id, gn2.actor_attributes.location, gn2.actor_attributes.login, gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] AS gn1 
INNER JOIN EACH (
  SELECT actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name
  FROM [publicdata:samples.github_nested]
  GROUP EACH BY actor_attributes.blog, actor_attributes.company,actor_attributes.email, actor_attributes.gravatar_id, actor_attributes.location, actor_attributes.login, actor_attributes.name) 
AS gn2 ON gn1.payload.target.login=gn2.actor_attributes.login
WHERE gn1.type='FollowEvent'
[查询完成(运行12.7秒,处理237 MB)]