Google BigQuery: "out of memory"


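BigQuery started giving me an "out of memory" error this morning when I run the query below. The two tables involved hold no more than 5 GB of data between them. I am also using table decorators; 1407249067530 corresponds to about 10:30 AM today (20140805). I have no idea what is wrong.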

Job ID: red-road-574:job_x8flLfo4QwA1gQ_FCrNWbKY-bZM
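As a quick check on the decorator value, the millisecond timestamp can be decoded in a few lines of Python. This is only a sketch: the helper name `decorator_to_utc` is made up for illustration, and the UTC-4 offset is an assumption (the question does not state a time zone; it is simply the offset at which the value lands near 10:30 AM).

```python
from datetime import datetime, timedelta, timezone

# Value taken from the table decorator [...@1407249067530-],
# expressed in milliseconds since the Unix epoch.
DECORATOR_MS = 1407249067530


def decorator_to_utc(ms: int) -> datetime:
    """Convert a decorator timestamp (epoch milliseconds) to an aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer milliseconds avoid any float rounding.
    return epoch + timedelta(milliseconds=ms)


utc_time = decorator_to_utc(DECORATOR_MS)
# Assumed local offset of UTC-4; not stated in the question.
local_time = utc_time.astimezone(timezone(timedelta(hours=-4)))

print(utc_time.isoformat())    # 2014-08-05T14:31:07.530000+00:00
print(local_time.isoformat())  # 2014-08-05T10:31:07.530000-04:00
```

So the decorator does point at 2014-08-05, 14:31 UTC, which matches the "about 10:30 AM" in the question under the assumed offset.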

  select * from
  (
    select
      t_connection.row_id AS debug_row_id,
      t_connection.hardware_id AS hardware_id,
      t_connection.debug_data AS debug_data,
      t_connection.connection_status AS connection_status,
      t_connection.date_time AS debug_date_time,
      t_gps.hardware_id AS hardware_id2,
      t_gps.latitude AS latitude,
      t_gps.longitude AS longitude,
      t_gps.date_time AS gps_date_time,
      t_gps.zip_code AS zip_code,
      ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,
    from (
      select
        *,
        ABS(t_gps.date_time-t_connection.date_time) AS time_diff
      from (
        select
          CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id,
          gg.hardware_id as hardware_id,
          gg.latitude as latitude,
          gg.longitude as longitude,
          gg.date_time as date_time,
          gg.zip_code as zip_code
        from [my data set.table1_20140805@1407249067530-] gg
      ) AS t_gps
      INNER JOIN EACH (
        select
          CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,
          dd.hardware_id as hardware_id,
          dd.date_time as date_time,
          dd.debug_data as debug_data,
          case
            when dd.debug_reason = 1 then 'Successful_Connection'
            when dd.debug_reason = 2 then 'Dropped_Connection'
            when dd.debug_reason = 3 then 'Failed_Connection'
          end AS connection_status
        from [my data set.table2_20140805@1407249067530-] dd
        where dd.debug_reason in (50013, 50017, 50018)
      ) as t_connection
      ON t_connection.hardware_id = t_gps.hardware_id
    )
  ) WHERE row_num=1
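For readers less familiar with the pattern, the query pairs every connection row with every GPS row for the same hardware_id, ranks the pairs by time difference, and keeps the closest one (row_num = 1). A minimal in-memory sketch of that logic in Python follows; the sample rows and the function name `nearest_gps` are invented for illustration, and only the field names come from the query.

```python
from collections import defaultdict

# Made-up sample rows; field names mirror the query, values are hypothetical.
gps_rows = [
    {"hardware_id": 1, "date_time": 100, "zip_code": "10001"},
    {"hardware_id": 1, "date_time": 250, "zip_code": "10002"},
    {"hardware_id": 2, "date_time": 300, "zip_code": "20001"},
]
connection_rows = [
    {"row_id": "a", "hardware_id": 1, "date_time": 240},
    {"row_id": "b", "hardware_id": 2, "date_time": 900},
]


def nearest_gps(connections, gps):
    """For each connection row, keep the GPS row closest in time (row_num = 1)."""
    gps_by_hw = defaultdict(list)
    for g in gps:
        gps_by_hw[g["hardware_id"]].append(g)

    result = {}
    for c in connections:
        # INNER JOIN on hardware_id: every GPS row for the device is a candidate.
        candidates = gps_by_hw.get(c["hardware_id"], [])
        if not candidates:
            continue  # an inner join drops rows without a match
        # ORDER BY time_diff, then keep only the first row.
        best = min(candidates, key=lambda g: abs(g["date_time"] - c["date_time"]))
        result[c["row_id"]] = best
    return result


matches = nearest_gps(connection_rows, gps_rows)
print(matches["a"]["zip_code"])  # 10002
print(matches["b"]["zip_code"])  # 20001
```

Note that the SQL version materializes every (connection, GPS) pair per hardware_id before the window function picks a winner, so the intermediate row count can be much larger than either input table.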

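You have hit an odd corner case. If you use allowLargeResults with results that are nested or repeated, and you do not set flattenResults=false, the query goes into a special mode. (When you use timestamps you are actually using a nested data structure, a design decision that has launched 1000 bugs and will hopefully change soon.) This special query mode has some restrictions, and those are what you are running into.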

In general we want this to be seamless, which is why it is not documented. However, since you have run into a problem here, I will explain how to avoid it.

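You have a few options to work around this:

If you are using nested or repeated results (it looks like you are not, which is good):

Rename your results so the names do not contain dots.
Set the flattenResults field on the query to false. This means nested and repeated fields will actually be nested and repeated in the results.

If you are using timestamps in your results:

Convert the timestamps to strings or numeric values. Sorry.

If you do not really need large results:

Unset the allowLargeResults flag.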

I realize all of these options are pretty unsatisfying. This is an area we are actively working to improve.
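To make the flag combinations concrete, here is a rough sketch of the request body for the BigQuery REST `jobs.insert` method, built as a plain Python dict. The field names (`allowLargeResults`, `flattenResults`, `destinationTable`, `useLegacySql`) are the ones in the current REST API; the helper `build_query_job` and the project, dataset and table names are hypothetical, and nothing here actually submits a job.

```python
import json


def build_query_job(sql, allow_large_results, flatten_results, destination=None):
    """Build a jobs.insert request body for a legacy SQL query (hypothetical helper)."""
    if not flatten_results and not allow_large_results:
        # The API requires allowLargeResults=true when flattenResults is false.
        raise ValueError("flattenResults=false requires allowLargeResults=true")

    query_config = {
        "query": sql,
        "useLegacySql": True,  # the queries in this thread use legacy SQL syntax
        "allowLargeResults": allow_large_results,
        "flattenResults": flatten_results,
    }
    if allow_large_results:
        # allowLargeResults requires a destination table to be set.
        if destination is None:
            raise ValueError("allowLargeResults requires a destination table")
        project_id, dataset_id, table_id = destination
        query_config["destinationTable"] = {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
    return {"configuration": {"query": query_config}}


SQL = "select 1"  # placeholder; substitute the real query text

# Large results kept, nested fields left unflattened.
# Project, dataset and table names below are made up.
unflattened = build_query_job(
    SQL, allow_large_results=True, flatten_results=False,
    destination=("my-project", "my_dataset", "nearest_gps"),
)

# Large results not needed: leave allowLargeResults unset (false).
small = build_query_job(SQL, allow_large_results=False, flatten_results=True)

print(json.dumps(unflattened, indent=2))
```

The timestamp option needs no configuration change: it is done in the SQL itself, for example with `TIMESTAMP_TO_MSEC(...)` on each timestamp column.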

Now I run it with allowLargeResults=true and flattenResults=false, and convert the timestamps to numeric values in the first step:

  select * from
  (
    select
      row_id AS debug_row_id,
      hardware_id AS hardware_id,
      debug_data AS debug_data,
      connection_status AS connection_status,
      date_time AS debug_date_time,
      hardware_id2 AS hardware_id2,
      latitude AS latitude,
      longitude AS longitude,
      date_time2 AS gps_date_time,
      zip_code AS zip_code,
      ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,
    from (
      select
        *,
        ABS(t_gps.date_time2-t_connection.date_time) AS time_diff
      from (
        select
          CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id_gps,
          gg.hardware_id as hardware_id2,
          gg.latitude as latitude,
          gg.longitude as longitude,
          TIMESTAMP_TO_MSEC(gg.date_time) as date_time2,
          gg.zip_code as zip_code
        from [test.gps32_20140805@1407249067530-] gg
      ) AS t_gps
      INNER JOIN EACH (
        select
          CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,
          dd.hardware_id as hardware_id,
          TIMESTAMP_TO_MSEC(dd.date_time) as date_time,
          dd.debug_data as debug_data,
          case
            when dd.debug_reason = 1 then 'Successful_Connection'
            when dd.debug_reason = 2 then 'Dropped_Connection'
            when dd.debug_reason = 3 then 'Failed_Connection'
          end AS connection_status
        from [test.debug_data_developer_20140805@1407249067530-] dd
        where dd.debug_reason in (50013, 50017, 50018)
      ) as t_connection
      ON t_connection.hardware_id = t_gps.hardware_id2
    )
  ) WHERE row_num=1

It gives me:

Query Failed
Error: Resources exceeded during query execution.
Job ID: red-road-574:job_ikWQvffmPEUP6DtTvJaYpXHFJ2M

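This is the SQL that did run, with allowLargeResults=true and flattenResults=true. I do not know what I did differently; maybe I just added a HAVING clause? In the join, though, I changed one side to a full table instead of the decorated one above, so the amount of data involved actually went up. I am not sure whether it will keep succeeding or whether that was just temporary luck.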

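Thanks for the answer, but this is running on Google BigQuery, so why would indexes matter? In fact, I don't own the memory...

OK, you have to simplify it. It doesn't matter who owns the memory. It looks like SQL to me; if you have joins, indexes are always relevant.

Do you have the ID of the failed job?

red-road-574:job_x8flLfo4QwA1gQ_FCrNWbKY-bZM is my latest attempt :)

Split it into smaller queries and compare the speed.

I will adjust my query and see what happens.

While waiting for you, I tried this: 1. Run the subquery (basically the same SQL, but without the final select step) and store the result in a table, say table1. 2. select * from table1 where row_num=1. It fails every time; one job ID is job_JXUiRlXP-jOxolrCQ-wUA9PK-Uw. However, when I try select * from table1 where row_num=10 (or a where clause on another field), ...

Does that include intermediate results in nested subqueries? I have timestamp data (date_time) both in the nested queries and in the final result; does that also fall under case 1?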