Google BigQuery: regular expression needed to check multiple fixed string values in BigQuery

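I'm looking for help with the following regex scenario in BigQuery.

My input column can contain any one or more of the following items, separated by a delimiter (a colon in the sample data below):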

Anonymisation
Pseudonymisation
Hard Deletion
Enhanced Access Management
Cold Storage
Other
Managed through Risk Processes
Null
For this, I tried the following query:

SELECT INPUT_COL,
  CASE WHEN REGEXP_CONTAINS(INPUT_COL,
    r'((\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b)|(\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b:?\b(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)\b))$') = True THEN '' ELSE 'E' END AS Error_Ind
FROM (
  SELECT 'ANONYMISATION:ABCD' AS INPUT_COL UNION ALL
  SELECT 'ANONYMISATION:ENHANCED ACCESS MANAGEMENT:HARD DELETION' AS INPUT_COL UNION ALL
  SELECT 'OTER' AS INPUT_COL UNION ALL
  SELECT 'ABCD:ENHANCED ACCESS MANAGEMENT' AS INPUT_COL UNION ALL
  SELECT 'ANONYMISATION:PSEUDONYMISATION' AS INPUT_COL UNION ALL
  SELECT 'ANONYMISATION:PSEUDONYMISATION:HARD DELETION:ENHANCED ACCESS MANAGEMENT:COLD STORAGE:OTHER:MANAGED THROUGH RISK PROCESSES' AS INPUT_COL
)
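The result is as follows:

In the result screenshot, the row highlighted in yellow is an invalid string, because its first value, ABCD, is not in my list of allowed values, yet the result marks it as valid. By contrast, the 5th record is correctly flagged as an error.

Can anyone help?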

Try the following (BigQuery Standard SQL):
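The query in the question misfires because REGEXP_CONTAINS looks for a partial match and the pattern is anchored only with `$`, so a row such as 'ABCD:ENHANCED ACCESS MANAGEMENT' passes on its last item alone. As a minimal sketch (an illustration, not necessarily the snippet originally posted with this answer), the pattern can be anchored at both ends so that every colon-separated item has to come from the allowed list. The inline `sample_rows` data and the call to UPPER are assumptions made for the example; if the literal string NULL should also be accepted, it would need to be added to both alternations.

```sql
#standardSQL
-- Sketch only: sample_rows is hypothetical inline data taken from the question.
WITH sample_rows AS (
  SELECT 'ANONYMISATION:ABCD' AS INPUT_COL UNION ALL
  SELECT 'ABCD:ENHANCED ACCESS MANAGEMENT' UNION ALL
  SELECT 'ANONYMISATION:PSEUDONYMISATION'
)
SELECT
  INPUT_COL,
  -- ^ and $ force the whole string to match: one allowed value,
  -- followed by zero or more ':' + allowed value pairs.
  IF(
    REGEXP_CONTAINS(
      UPPER(INPUT_COL),
      r'^(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES)(:(PSEUDONYMISATION|ANONYMISATION|ENHANCED ACCESS MANAGEMENT|HARD DELETION|COLD STORAGE|OTHER|MANAGED THROUGH RISK PROCESSES))*$'
    ),
    '',
    'E'
  ) AS Error_Ind
FROM sample_rows
```

With the pattern anchored this way, the two rows containing ABCD should be flagged with 'E', while 'ANONYMISATION:PSEUDONYMISATION' should come back with an empty Error_Ind.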

Another option is:

#standardSQL
SELECT  
  (
    SELECT IF(COUNT(1) = 0, '', 'E')
    FROM UNNEST(SPLIT(INPUT_COL, ':')) value
    WHERE NOT UPPER(value) IN UNNEST(SPLIT(UPPER('Anonymisation|Pseudonymisation|Hard Deletion|Enhanced Access Management|Cold Storage|Other|Managed through Risk Processes|Null'), '|'))
  ) AS Error_Ind,
  INPUT_COL
FROM `project.dataset.table`   
This gives the same result:

Row Error_Ind   INPUT_COL    
1   E           ANONYMISATION:ABCD   
2               ANONYMISATION:ENHANCED ACCESS MANAGEMENT:HARD DELETION   
3   E           OTER     
4   E           ABCD:ENHANCED ACCESS MANAGEMENT  
5               ANONYMISATION:PSEUDONYMISATION   
6               ANONYMISATION:PSEUDONYMISATION:HARD DELETION:ENHANCED ACCESS MANAGEMENT:COLD STORAGE:OTHER:MANAGED THROUGH RISK PROCESSES   

 

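Hi Mikhail. With the combination above, how would I validate the value if there is a space between the two words, as in the example below? ANONYMISATION:ABCD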
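Assuming the comment refers to stray whitespace around the ':' delimiter, one possible adjustment to the split-based query is to trim each item before comparing it. The sketch below is only an illustration under that assumption; the `sample_rows` data and the alias `item` are hypothetical names introduced for the example.

```sql
#standardSQL
-- Sketch only: sample_rows is hypothetical data with spaces around the delimiter.
WITH sample_rows AS (
  SELECT 'ANONYMISATION : ABCD' AS INPUT_COL UNION ALL
  SELECT 'ANONYMISATION : HARD DELETION'
)
SELECT
  (
    -- Count the items that are NOT in the allowed list; any such item marks the row 'E'.
    SELECT IF(COUNT(1) = 0, '', 'E')
    FROM UNNEST(SPLIT(INPUT_COL, ':')) AS item
    -- TRIM removes leading and trailing whitespace left over after the split.
    WHERE UPPER(TRIM(item)) NOT IN (
      'ANONYMISATION', 'PSEUDONYMISATION', 'HARD DELETION',
      'ENHANCED ACCESS MANAGEMENT', 'COLD STORAGE', 'OTHER',
      'MANAGED THROUGH RISK PROCESSES', 'NULL'
    )
  ) AS Error_Ind,
  INPUT_COL
FROM sample_rows
```

Here the first row should still be flagged with 'E' because ABCD is not an allowed value, while the second row should pass despite the spaces. Whitespace inside a value, such as the single space in HARD DELETION, is left untouched by TRIM.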