Google BigQuery: regular expression needed to check for multiple fixed string values
I am looking for help with the following regex scenario in BigQuery. My input column can contain any one or more of the following items, separated by a delimiter (described as a semicolon, although the sample data below uses a colon ':'):
Anonymisation
Pseudonymisation
Hard Deletion
Enhanced Access Management
Cold Storage
Other
Managed through Risk Processes
Null
For this, I tried the following query:
SELECT INPUT_COL,
CASE WHEN REGEXP_CONTAINS(INPUT_COL,
r'((\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b)|(\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b))$'
) = TRUE THEN '' ELSE 'E' END AS Error_Ind
FROM (
SELECT 'ANONYMISATION:ABCD' AS INPUT_COL UNION ALL
SELECT 'ANONYMISATION:ENHANCED ACCESS MANAGEMENT:HARD DELETION' AS INPUT_COL UNION ALL
SELECT 'OTER' AS INPUT_COL UNION ALL
SELECT 'ABCD:ENHANCED ACCESS MANAGEMENT' AS INPUT_COL UNION ALL
SELECT 'ANONYMISATION:PSEUDONYMISATION' AS INPUT_COL UNION ALL
SELECT 'ANONYMISATION:PSEUDONYMISATION:HARD DELETION:ENHANCED ACCESS MANAGEMENT:COLD STORAGE:OTHER:MANAGED THROUGH RISK PROCESSES' AS INPUT_COL)
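The result is as follows.
In the screenshot, the row highlighted in yellow is an invalid string, because its first value, ABCD, is not defined in my list of values, yet the result shows it as valid.
At the same time, if you look at the 5th record, it is flagged as an error, which is correct.
Can anyone help?

Try the following (BigQuery Standard SQL):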
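The pattern in the question is anchored only at the end with '$', and REGEXP_CONTAINS succeeds on a partial match, so any string that merely ends in a valid item (such as 'ABCD:ENHANCED ACCESS MANAGEMENT') passes. A minimal sketch of a fully anchored pattern follows. It is an illustration under assumptions, not the original answer's code: the CTE name input_rows is hypothetical, the sample rows are taken from the question, and NULL is included in the list because the question's item list mentions it.

```sql
#standardSQL
-- Sketch only: require the WHOLE string to be item(:item)* by anchoring with ^ and $.
WITH input_rows AS (  -- hypothetical name; rows copied from the question's sample data
  SELECT 'ANONYMISATION:ABCD' AS INPUT_COL UNION ALL
  SELECT 'ANONYMISATION:ENHANCED ACCESS MANAGEMENT:HARD DELETION' UNION ALL
  SELECT 'OTER' UNION ALL
  SELECT 'ABCD:ENHANCED ACCESS MANAGEMENT' UNION ALL
  SELECT 'ANONYMISATION:PSEUDONYMISATION' UNION ALL
  SELECT 'ANONYMISATION:PSEUDONYMISATION:HARD DELETION:ENHANCED ACCESS MANAGEMENT:COLD STORAGE:OTHER:MANAGED THROUGH RISK PROCESSES'
)
SELECT
  INPUT_COL,
  IF(
    REGEXP_CONTAINS(
      UPPER(INPUT_COL),  -- compare case-insensitively against the upper-cased list
      r'^(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES|NULL)(:(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES|NULL))*$'
    ),
    '',   -- every colon-separated item is in the list
    'E'   -- at least one item is not in the list (or INPUT_COL is SQL NULL)
  ) AS Error_Ind
FROM input_rows
```

With these sample rows, the sketch should flag 'ANONYMISATION:ABCD', 'OTER' and 'ABCD:ENHANCED ACCESS MANAGEMENT' with 'E' and leave the other three blank. The repeated group has no upper bound, so unlike the fixed eight-slot pattern in the question it accepts any number of items.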
Another option is:
#standardSQL
SELECT
  (
    -- Split the input on ':' and count the items that are NOT in the allowed list;
    -- zero invalid items yields '', anything else yields 'E'.
    SELECT IF(COUNT(1) = 0, '', 'E')
    FROM UNNEST(SPLIT(INPUT_COL, ':')) value
    WHERE NOT UPPER(value) IN UNNEST(SPLIT(UPPER('Anonymisation|Pseudonymisation|Hard Deletion|Enhanced Access Management|Cold Storage|Other|Managed through Risk Processes|Null'), '|'))
  ) AS Error_Ind,
  INPUT_COL
FROM `project.dataset.table`
with the same result:
Row  Error_Ind  INPUT_COL
1    E          ANONYMISATION:ABCD
2               ANONYMISATION:ENHANCED ACCESS MANAGEMENT:HARD DELETION
3    E          OTER
4    E          ABCD:ENHANCED ACCESS MANAGEMENT
5               ANONYMISATION:PSEUDONYMISATION
6               ANONYMISATION:PSEUDONYMISATION:HARD DELETION:ENHANCED ACCESS MANAGEMENT:COLD STORAGE:OTHER:MANAGED THROUGH RISK PROCESSES
Hi Mikhail. If there is a space between two words, as below, how can this be validated with the approach above? ANONYMISATION:ABCD
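If the comment above is asking about stray spaces around the ':' delimiter (an assumption, since the example in the comment is ambiguous), one possible adjustment to the SPLIT/UNNEST query is to TRIM each item before comparing it. Spaces inside an item name, such as 'HARD DELETION', are already covered because each item is compared as a whole string. The CTE name input_rows and its two rows are hypothetical test data, not taken from the thread.

```sql
#standardSQL
-- Sketch only: same SPLIT/UNNEST check, with TRIM added to ignore spaces around ':'.
WITH input_rows AS (  -- hypothetical sample rows for illustration
  SELECT 'ANONYMISATION : ENHANCED ACCESS MANAGEMENT' AS INPUT_COL UNION ALL
  SELECT 'ANONYMISATION: ABCD'
)
SELECT
  (
    -- Count items that, after trimming and upper-casing, are not in the allowed list.
    SELECT IF(COUNT(1) = 0, '', 'E')
    FROM UNNEST(SPLIT(INPUT_COL, ':')) AS item
    WHERE UPPER(TRIM(item)) NOT IN UNNEST(SPLIT(UPPER('Anonymisation|Pseudonymisation|Hard Deletion|Enhanced Access Management|Cold Storage|Other|Managed through Risk Processes|Null'), '|'))
  ) AS Error_Ind,
  INPUT_COL
FROM input_rows
```

Under these assumptions the first row should come back with a blank Error_Ind and the second with 'E', since ABCD is still not in the list after trimming.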