BigQuery - Extract with REGEXP using two fields


I have a table with three fields:

Ref Alt INFO

A   T       SNP;FN1,DKFZp686O22169;DKFZp686O22169(uc002vez.2)///FN1(uc010zjp.1)///FN1(uc002vfa.2)///FN1(uc002vfb.2)///FN1(uc002vfc.2)///FN1(uc002vfd.2)///FN1(uc002vfe.2)///FN1(uc002vff.2)///FN1(uc002vfg.2)///FN1(uc002vfh.2)///FN1(uc002vfi.2)///FN1(uc002vfj.2)///FN1(uc010fvc.1)///FN1(uc010fvd.1);Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding;5UTR///Intron_8///Intron_32///Intron_31///Intron_31///Intron_32///Intron_31///Intron_31///Intron_31///Intron_31///Intron_32///Intron_32///Intron_2///Intron_2;.///.///.///.///.///.///.///.///.///.///.///.///.///.;.///.///.///.///.///.///.///.///.///.///.///.///.///.;A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0
Is it possible to use the first two fields (Ref and Alt) to extract A/T-0.0978 from the INFO field?


Thanks

Below is for BigQuery Standard SQL. It extracts all values for the given Ref/Alt combination (two in your particular example).

If you run it against the data from your question, as below,

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'A' Ref, 'T' Alt, 'SNP;FN1,DKFZp686O22169;DKFZp686O22169(uc002vez.2)///FN1(uc010zjp.1)///FN1(uc002vfa.2)///FN1(uc002vfb.2)///FN1(uc002vfc.2)///FN1(uc002vfd.2)///FN1(uc002vfe.2)///FN1(uc002vff.2)///FN1(uc002vfg.2)///FN1(uc002vfh.2)///FN1(uc002vfi.2)///FN1(uc002vfj.2)///FN1(uc010fvc.1)///FN1(uc010fvd.1);Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding;5UTR///Intron_8///Intron_32///Intron_31///Intron_31///Intron_32///Intron_31///Intron_31///Intron_31///Intron_31///Intron_32///Intron_32///Intron_2///Intron_2;.///.///.///.///.///.///.///.///.///.///.///.///.///.;.///.///.///.///.///.///.///.///.///.///.///.///.///.;A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0' INFO
)
SELECT REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]')) val
FROM `project.dataset.table`   
the output is:

Row     val  
1       A/T-0.0978   
        A/T-50   
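If you want to check the pattern outside BigQuery, here is a minimal Python sketch that rebuilds the same expression with the re module. For this pattern (character classes, a lazy .*? and one capture group) Python's syntax and BigQuery's RE2 syntax coincide. The function name extract_all and the shortened INFO_TAIL sample are illustrative assumptions, and re.escape is an extra safety step that the SQL does not have.

```python
import re

# Shortened sample: only the last four ';'-separated groups of the INFO value
# from the question; the earlier groups are dropped for brevity and do not
# affect the A/T result.
INFO_TAIL = "A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0"


def extract_all(info: str, ref: str, alt: str) -> list[str]:
    """Hypothetical helper mimicking
    REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]'))."""
    # re.escape guards against regex metacharacters in Ref/Alt (not in the SQL).
    pattern = r"[,;](" + re.escape(ref) + "/" + re.escape(alt) + r".*?)[,;]"
    # With exactly one capture group, findall returns only the captured text
    # of each non-overlapping match, like REGEXP_EXTRACT_ALL does.
    return re.findall(pattern, info)


if __name__ == "__main__":
    print(extract_all(INFO_TAIL, "A", "T"))
```

The lazy .*? stops at the first following comma or semicolon, which is why each match is a single Ref/Alt-value item and the two A/T entries come back separately.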
If you want only the first value, you can use

#standardSQL
-- SAFE_OFFSET returns NULL instead of raising an error for rows with no match
SELECT REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]'))[SAFE_OFFSET(0)] val 
FROM `project.dataset.table`  
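One edge case: the pattern above needs a comma or semicolon after the value, so a pair at the very end of INFO (N/N-0 in the question) is never found. A hedged variant replaces .*?)[,;] with [^,;]*), which stops before the next delimiter without requiring one. The Python sketch below shows that variant for the first match only. The helper name extract_first and the INFO_TAIL sample are hypothetical, and carrying the same change into the BigQuery pattern is an untested assumption (RE2 does support negated character classes). In BigQuery, REGEXP_EXTRACT, which returns the first match or NULL, is another way to get a single value.

```python
import re
from typing import Optional

# Shortened sample taken from the tail of the INFO value in the question.
INFO_TAIL = "A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0"


def extract_first(info: str, ref: str, alt: str) -> Optional[str]:
    """Hypothetical helper: first 'Ref/Alt...' item in INFO, or None."""
    # [^,;]* consumes everything up to (not including) the next ',' or ';'
    # and does not require a delimiter to exist, so a pair at the very end
    # of INFO still matches.
    pattern = r"[,;](" + re.escape(ref) + "/" + re.escape(alt) + r"[^,;]*)"
    match = re.search(pattern, info)  # first match only
    # None plays the role of the NULL that SAFE_OFFSET(0) yields in BigQuery
    # when REGEXP_EXTRACT_ALL returns an empty array.
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_first(INFO_TAIL, "A", "T"))
    print(extract_first(INFO_TAIL, "N", "N"))
    print(extract_first(INFO_TAIL, "G", "C"))
```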

