How do I apply a user-defined function to multiple columns in BigQuery SQL?


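In the database I am working with, several wage variables are recorded as strings, with entries such as 0000001155,00. I am using a combination of CAST and REPLACE to convert these variables to float. For one variable, I used: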

 CAST (REPLACE (wage_var, ",", ".") AS float64) as wage_formatted
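I would like to apply this procedure to all variables that have the same problem, without repeating the same line of code. My idea is to use a function and then iterate that function over the columns.

I looked into how to create a function that performs the normalization and, after some reading, wrote the following: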

CREATE TEMP FUNCTION wage2float(x STRING) AS (CAST(REPLACE(x, ",", ".") AS float64));
SELECT
  wage_var,
  wage2float(wage_var) AS wage_formatted
FROM
  `mydataset.mytable`
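However, it is not clear to me how to iterate this function over multiple columns. Is there a way to loop over the columns and apply the wage2float function to each of them?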

Edit:

Here is a sample of the input (csv):

Expected output:

vl_remun_media_nom,vl_remun_media_sm,vl_remun_dezembro_nom,vl_remun_dezembro_sm,vl_ultima_remuneracao_ano,vl_salario_contratual,vl_rem_janeiro_cc,vl_rem_fevereiro_cc,vl_rem_marco_cc,vl_rem_abril_cc,vl_rem_maio_cc,vl_rem_junho_cc,vl_rem_julho_cc,vl_rem_agosto_cc,vl_rem_setembro_cc,vl_rem_outubro_cc,vl_rem_novembro_cc
6025.55,6.42,5921.09,6.31,5921.09,5148.77,5866.27,5866.27,5866.27,5866.27,5866.27,5866.27,7169.88,6254.78,5921.09,5921.09,5921.09
1447.68,1.54,1726.67,1.84,1726.67,14.0,1645.55,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0
1304.35,1.39,1304.35,1.39,1304.35,1304.35,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1304.35,1304.35,1304.35,1304.35
1447.68,1.54,1726.67,1.84,1726.67,14.0,1645.55,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0
1447.68,1.54,1726.67,1.84,1726.67,14.0,1645.56,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0
1447.68,1.54,1726.67,1.84,1726.67,14.0,1645.55,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0
1427.95,1.52,1420.68,1.51,1420.68,1420.68,1379.3,1379.3,1379.3,1379.3,1379.3,1379.3,1839.07,1379.3,1379.3,1420.68,1420.68
5937.88,6.33,5900.0,6.29,5900.0,59.0,57.38,57.38,57.38,57.38,7650.67,57.38,57.38,57.38,57.38,59.0,59.0
1087.04,1.15,1076.2,1.14,1076.2,1076.2,10.0,10.0,10.0,1076.2,1076.2,1076.2,1076.2,1434.93,1076.2,1076.2,1076.2
2395.3,2.55,2448.79,2.61,2448.79,2448.79,2377.47,2377.47,2377.47,2377.47,2377.47,2377.47,2377.47,2377.47,2377.47,2448.79,2448.79
1870.56,1.99,1820.0,1.94,1820.0,18.0,1820.01,1820.01,1820.01,1820.01,1820.01,18.2,18.2,18.2,18.2,2426.67,18.2
2960.08,3.15,3068.59,3.27,3068.59,27.0,2724.53,2500.09,3454.64,2700.88,2943.15,2943.42,2943.69,3098.28,3098.24,2976.73,3068.79
3798.04,4.04,3852.69,4.11,3852.69,30.0,2500.45,2500.57,2500.79,5306.55,5079.02,3430.02,4239.21,4182.29,4913.02,3247.38,3824.52
4945.06,5.27,5286.81,5.64,5286.81,45.0,4000.1,4000.16,5392.43,4919.14,4500.98,4500.21,5936.1,6133.08,4795.43,4576.91,5299.44
5810.0,6.19,5540.0,5.91,5540.0,55.4,6933.33,55.4,55.4,55.4,55.4,55.4,55.4,7386.67,55.4,55.4,55.4
1103.62,1.17,1090.0,1.16,1090.0,10.9,10.31,10.31,10.31,1086.2,1086.2,1086.2,1086.2,1086.2,1086.2,1453.33,10.9
2600.34,2.77,2866.13,3.05,2866.13,10.91,0.0,0.0,0.0,0.0,2168.92,1999.7,2175.13,3036.83,2909.14,2887.45,2759.44
5174.66,5.51,4967.86,5.3,4967.86,16.15,5154.31,4621.59,5161.25,5080.73,5185.34,4981.24,6430.29,5584.57,5064.43,5029.16,4835.26
5693.03,6.07,5650.78,6.03,5650.78,5650.78,5433.44,5433.44,5433.44,5433.44,7244.59,5433.44,5433.44,5868.12,5650.78,5650.78,5650.78
2485.76,2.64,2810.52,2.99,2810.52,10.91,2193.56,1925.13,2352.46,2135.21,2440.66,2232.19,2951.81,2947.97,2588.45,2516.61,2734.59
3808.35,4.06,3893.4,4.15,3893.4,3893.4,37.8,37.8,37.8,37.8,37.8,37.8,37.8,37.8,37.8,37.8,4006.8
4648.0,4.95,4549.71,4.85,4549.71,4549.71,4212.7,4549.71,6066.28,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71
4521.62,4.82,4549.71,4.85,4549.71,4549.71,4212.7,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71,4549.71
3024.0,3.22,3024.0,3.22,3024.0,30.24,28.0,28.0,28.0,28.0,39.2,30.24,30.24,30.24,30.24,30.24,30.24
2946.43,3.14,2910.0,3.1,2910.0,1923.68,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2983.7,2945.59

If you just need a SELECT query, you can simply use:

SELECT CAST(REPLACE(wage_var, ',', '.') AS float64) as wage_formatted,
       CAST(REPLACE(taxes_var, ',', '.') AS float64) as taxes_formatted,
       . . . 
FROM t;
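The same query can be written with the temporary function from the question, which keeps the conversion logic in one place even though each column is still listed once. This is only a sketch: the table t is replaced by an inline stand-in, and the sample values for wage_var and taxes_var are made up for illustration.

```sql
-- wage2float is the helper defined in the question.
CREATE TEMP FUNCTION wage2float(x STRING) AS (
  CAST(REPLACE(x, ',', '.') AS FLOAT64)
);

-- Hypothetical stand-in for table t, with invented sample values.
WITH t AS (
  SELECT '0000001155,00' AS wage_var, '0000000210,50' AS taxes_var
)
SELECT
  wage2float(wage_var)  AS wage_formatted,
  wage2float(taxes_var) AS taxes_formatted
FROM t;
```

The function does not remove the need to name every column; it only shortens each line.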
If you want to make this "permanent", I would suggest a view:

CREATE VIEW v_t AS
    SELECT t.*,
           CAST(REPLACE(wage_var, ',', '.') AS float64) as wage_formatted,
           CAST(REPLACE(taxes_var, ',', '.') AS float64) as taxes_formatted,
           . . . 
    FROM t;
You can also add new columns to the table and populate them with the float values.
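Adding and filling a column could look roughly like the sketch below. The names mydataset.mytable, wage_var and wage_formatted are placeholders taken from the question, and the statements modify the table in place, so try them on a copy first.

```sql
-- Assumption: mydataset.mytable and wage_var are placeholder names.
-- Add the new FLOAT64 column if it is not already there.
ALTER TABLE mydataset.mytable
  ADD COLUMN IF NOT EXISTS wage_formatted FLOAT64;

-- Fill the new column from the string column.
UPDATE mydataset.mytable
SET wage_formatted = CAST(REPLACE(wage_var, ',', '.') AS FLOAT64)
WHERE TRUE;  -- BigQuery requires a WHERE clause on UPDATE
```

Unlike the view, this stores the converted values, so the column has to be updated again if new string data arrives.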

Only the columns starting with vl. There are several other variables that do not need this procedure.

Below is for BigQuery Standard SQL, using BQ scripting:

execute immediate (select 'select * replace(' || 
  string_agg('cast(replace(' || column || ', ",", ".") as float64) as ' || column, ', ') || 
  ') from YourTable'
from (
  select regexp_extract_all(to_json_string(t), r'"(vl_[^"]*)":') as columns
  from YourTable t
  limit 1
), unnest(columns) column);    
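To make the string building easier to follow, here is the kind of statement the script assembles and then runs. It is a sketch: the inline YourTable and the column names vl_x, y and vl_z are illustrative, and the real generated text is a single line.

```sql
-- Hypothetical stand-in for YourTable with two vl_ columns and one other column.
WITH YourTable AS (
  SELECT 1 AS id, "0000006025,55" AS vl_x, "000006,42" AS y, "0000005921,09" AS vl_z
)
-- Shape of the generated statement: every column whose name matches vl_* is
-- replaced by its converted value; all other columns pass through unchanged.
SELECT * REPLACE (
  CAST(REPLACE(vl_x, ",", ".") AS FLOAT64) AS vl_x,
  CAST(REPLACE(vl_z, ",", ".") AS FLOAT64) AS vl_z
)
FROM YourTable;
```

The regular expression picks the column names out of the JSON form of one row, and string_agg joins one such cast(...) entry per matching column.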
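If the script is applied to the following simplified example (which still fully represents the OP's use case):

select 1 id, "0000006025,55" vl_x, "000006,42" y, "0000005921,09" vl_z union all
select 2, "0000001447,68", "000001,54", "0000001726,67"

the output has vl_x and vl_z converted to FLOAT64, while id and y are left unchanged.

You should click View Results on the last row to see the final result.

Depending on what you want to do with the result, you can adjust the code to replace your table with this output, create a new table, and so on. See an example of such an adjustment (only the first line is shown; the rest is the same):

execute immediate (select 'create table NewTable as select * replace(' || 
. . .

OK. But do I need to repeat those lines? I mean, is there no way to create a list of columns and then iterate over that list?

Please provide an example of the input data and the corresponding expected output. Also, do you want all columns processed this way, or do you want to supply a list of the columns that need processing? Both cases are easy to do but slightly different, so please tell me which one is yours.

Only the columns starting with vl. There are several other variables that do not need this procedure.

OK. Should be simple :o) - will post an answer later in the day.

Please note the update I just made: I realized I had missed (a typo) a * in one place of the regexp.

Thanks. It worked. I am new to BQ, so it took me a while to figure out how to run the script. I created a file named format_wage.sql with the script contents and then ran bq query --use_legacy_sql=false --flagfile=format_wage.sql in the gcloud shell. Maybe this information will be useful to future readers of this question.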