Google BigQuery: populating columns based on row values (BigQuery Standard SQL)


I have a table, let's say:

  Name   A    B    C    D    
------- ---  ---  ---  --- 
 alpha   0    1    0   0.6     
 beta   0.6   0    0   0.1
 gama    0    0    0   0.4
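
Now I want to populate two columns (Result and Class) based on the values in A, B, C, and D.

The condition is: if any of the fields (A, B, C, D) has a value greater than 0.5, the Result column should contain 'F'; otherwise it should contain 'P'. In addition, the names of the columns whose value is > 0.5 should appear in Class, for example "A,D".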

For better understanding, here is the result I want:

  Name   A    B    C    D    Result    Class
------- ---  ---  ---  ---  --------  -------
 alpha   0    1    0   0.6     F        B,D      
 beta   0.6   0    0   0.1     F         A
 gama    0    0    0   0.4     P        NULL 

I am new to BigQuery and need help. What would the solution be?

This is what I have done so far:

  SELECT *, CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5)
            THEN 'F'
            ELSE 'P' END AS Result AND Class....//here i am stuck
  
  FROM table1


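Actually, I don't know how to build this exact script. I was able to do the first part, that is, populate the Result column with 'F' and 'P', but I could not produce the Class column filled with the column names.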

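Since you are analyzing each column, I assume you don't have a large number of columns. So I created a simple function that checks the row's values and returns the names of the columns that meet the condition.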

I tested the query below using the sample data provided.

#javaScript UDF
CREATE TEMP FUNCTION class(A FLOAT64, B FLOAT64, C FLOAT64, D FLOAT64)
RETURNS String
LANGUAGE js AS """
var class_array=[];
if(A > 0.5){class_array.push("A");}
if(B > 0.5){class_array.push("B");}
if(C > 0.5){class_array.push("C");}
if(D > 0.5){class_array.push("D");} 

return class_array;
""";

#sample data
WITH data as (
 SELECT "alpha" as Name, 0 as A, 1 as B, 0 as C, 0.6 as D UNION ALL  
 SELECT "beta", 0.6, 0, 0, 0.1 UNION ALL
 SELECT "gama", 0, 0, 0, 0.4
)

Select name, A,B,C,D, 
        CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5) THEN "F" ELSE "P" END AS Result,
        IF(class(A,B,C,D) is null , null, class(A,B,C,D)) as Class from data

and the output:

Row name    A   B   C   D   Result  Class
1   alpha   0   1   0   0.6 F       B,D
2   beta    0.6 0   0   0.1 F       A
3   gama    0   0   0   0.4 P   
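
As shown in the UDF, each row's values are checked, and whenever a condition is met the column name is manually added to a string array. Also note that the JS UDF returns a string, not an array: the array that was built is automatically converted to a string.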
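
To make the array-to-string step concrete, here is a minimal standalone JavaScript sketch that runs outside BigQuery. The helper name `classOf` and its parameters are hypothetical and not part of the answer above. It assumes that BigQuery's JS runtime converts an array the same way plain JavaScript does, as the answer describes: elements joined by commas, and an empty array becoming an empty string, which would explain the blank Class for the third row. The helper returns null instead, to match the NULL requested in the question.

```javascript
// Hypothetical helper (not from the original answer): returns the names of
// the columns whose value exceeds the threshold, joined by commas,
// or null when no column qualifies.
function classOf(names, values, threshold) {
  const hits = [];
  for (let i = 0; i < values.length; i++) {
    if (values[i] > threshold) {
      hits.push(names[i]);
    }
  }
  // Joining explicitly makes the array-to-string conversion visible.
  return hits.length > 0 ? hits.join(",") : null;
}

// Implicit conversion in plain JavaScript: elements are joined with commas,
// and an empty array becomes an empty string.
const implicitTwo = String(["B", "D"]);
const implicitEmpty = String([]);

const names = ["A", "B", "C", "D"];
console.log(classOf(names, [0, 1, 0, 0.6], 0.5));   // alpha
console.log(classOf(names, [0.6, 0, 0, 0.1], 0.5)); // beta
console.log(classOf(names, [0, 0, 0, 0.4], 0.5));   // gama
```

Passing the names and values as arrays also avoids one `if` per column, which may be convenient if the number of columns grows.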


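Finally, I should point out that retrieving the column names within the query is not possible in this context, although in other cases it can be done.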

Below is for BigQuery Standard SQL

Using a JavaScript UDF helps in many cases, but it should be avoided when the problem can be solved in plain SQL, as in the example below:

#standardSQL
SELECT *,
  ( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P') 
    FROM UNNEST([A,B,C,D]) val
  ) AS Result,
  ( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)]) 
    FROM UNNEST([A,B,C,D]) val WITH OFFSET pos 
    WHERE val > 0.5
  ) AS Class
FROM `project.dataset.table`  

You can test and play with the above using the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'alpha' name, 0 A, 1 B, 0 C, 0.6 D UNION ALL
  SELECT 'beta', 0.6, 0, 0, 0.1 UNION ALL
  SELECT 'gamma', 0, 0, 0, 0.4 
)
SELECT *,
  ( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P') 
    FROM UNNEST([A,B,C,D]) val
  ) AS Result,
  ( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)]) 
    FROM UNNEST([A,B,C,D]) val WITH OFFSET pos 
    WHERE val > 0.5
  ) AS Class
FROM `project.dataset.table`    

with output

Row name    A       B   C   D       Result  Class    
1   alpha   0.0     1   0   0.6     F       B,D  
2   beta    0.6     0   0   0.1     F       A    
3   gamma   0.0     0   0   0.4     P       null       

I hope my question is clear. I have at most 5,500 rows, so hopefully it won't time out.