BigQuery - extract with REGEXP using two fields
I have a table with three fields:
Ref Alt INFO
A T SNP;FN1,DKFZp686O22169;DKFZp686O22169(uc002vez.2)///FN1(uc010zjp.1)///FN1(uc002vfa.2)///FN1(uc002vfb.2)///FN1(uc002vfc.2)///FN1(uc002vfd.2)///FN1(uc002vfe.2)///FN1(uc002vff.2)///FN1(uc002vfg.2)///FN1(uc002vfh.2)///FN1(uc002vfi.2)///FN1(uc002vfj.2)///FN1(uc010fvc.1)///FN1(uc010fvd.1);Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding;5UTR///Intron_8///Intron_32///Intron_31///Intron_31///Intron_32///Intron_31///Intron_31///Intron_31///Intron_31///Intron_32///Intron_32///Intron_2///Intron_2;.///.///.///.///.///.///.///.///.///.///.///.///.///.;.///.///.///.///.///.///.///.///.///.///.///.///.///.;A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0
Can I use the first two fields (Ref and Alt) to extract A/T-0.0978 from the INFO field?
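Thanks.

Below is BigQuery Standard SQL that extracts all values for the given Ref/Alt combination (in your example there are two), run against the data from your question: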
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'A' Ref, 'T' Alt, 'SNP;FN1,DKFZp686O22169;DKFZp686O22169(uc002vez.2)///FN1(uc010zjp.1)///FN1(uc002vfa.2)///FN1(uc002vfb.2)///FN1(uc002vfc.2)///FN1(uc002vfd.2)///FN1(uc002vfe.2)///FN1(uc002vff.2)///FN1(uc002vfg.2)///FN1(uc002vfh.2)///FN1(uc002vfi.2)///FN1(uc002vfj.2)///FN1(uc010fvc.1)///FN1(uc010fvd.1);Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding///Protein_Coding;5UTR///Intron_8///Intron_32///Intron_31///Intron_31///Intron_32///Intron_31///Intron_31///Intron_31///Intron_31///Intron_32///Intron_32///Intron_2///Intron_2;.///.///.///.///.///.///.///.///.///.///.///.///.///.;.///.///.///.///.///.///.///.///.///.///.///.///.///.;A-0.9491,T-0.0509;A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0' INFO
)
-- Capture "Ref/Alt..." between a leading ',' or ';' and the next ',' or ';'
SELECT REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]')) val
FROM `project.dataset.table`
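To check the pattern outside BigQuery, here is a minimal Python sketch using `re.findall`. It assumes Python's `re` module treats this particular pattern the same way as BigQuery's RE2 engine, which holds for the constructs used here (character classes, one capturing group, a lazy `.*?`). The helper name `extract_all` is hypothetical, and the INFO string is an abbreviated excerpt: the long transcript, biotype and region annotations are dropped, while the genotype sections are copied verbatim.

```python
import re

# Abbreviated excerpt of the question's INFO value (annotation sections omitted,
# genotype frequency and count sections kept verbatim).
INFO = ("SNP;FN1,DKFZp686O22169;"
        "A-0.9491,T-0.0509;A-970,T-52;"
        "A/A-0.9002,A/T-0.0978,T/T-0.0020;"
        "A/A-460,A/T-50,T/T-1,N/N-0")


def extract_all(info: str, ref: str, alt: str) -> list:
    # Same pattern the SQL builds with CONCAT: a ',' or ';', then "Ref/Alt",
    # then the shortest run of characters up to the next ',' or ';'.
    pattern = r"[,;](" + ref + "/" + alt + r".*?)[,;]"
    # With one capturing group, findall returns only the group's text,
    # which mirrors REGEXP_EXTRACT_ALL with a capturing group.
    return re.findall(pattern, info)


print(extract_all(INFO, "A", "T"))  # ['A/T-0.0978', 'A/T-50']
```

Because the pattern requires a separator after the value, a genotype that is the very last token of INFO (here `N/N-0`) is not matched.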
Output:
Row  val
1    A/T-0.0978
     A/T-50
If you only want the first value, you can use:
#standardSQL
-- [OFFSET(0)] takes the first element of the extracted array
SELECT REGEXP_EXTRACT_ALL(INFO, CONCAT(r'[,;](', Ref, '/', Alt, '.*?)[,;]'))[OFFSET(0)] val
FROM `project.dataset.table`
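Note that in BigQuery `[OFFSET(0)]` raises an error when the array is empty (no matching genotype), whereas `[SAFE_OFFSET(0)]` returns NULL. The Python sketch below illustrates a slightly tighter variant of the pattern, `[,;](Ref/Alt-[^,;]*)`, which is an assumption of this edit and not part of the original answer: it stops at the next separator or at the end of the string, so a trailing genotype is still found, and the `-` keeps `A/T` from matching a longer allele such as `A/TG`. The helper name `first_genotype_value` is hypothetical, and `None` stands in for NULL. The same pattern could presumably be built in SQL as `CONCAT(r'[,;](', Ref, '/', Alt, r'-[^,;]*)')`.

```python
import re
from typing import Optional


def first_genotype_value(info: str, ref: str, alt: str) -> Optional[str]:
    """Return the first "Ref/Alt-value" token in info, or None if absent.

    Hypothetical helper; None plays the role of NULL from [SAFE_OFFSET(0)].
    """
    # "-[^,;]*" reads up to the next ',' or ';' or to the end of the string.
    # re.escape guards against regex metacharacters in Ref/Alt.
    pattern = r"[,;](" + re.escape(ref) + "/" + re.escape(alt) + r"-[^,;]*)"
    match = re.search(pattern, info)  # leftmost match only, like taking element 0
    return match.group(1) if match else None


# Excerpt of the genotype sections of the question's INFO value.
info = "A-970,T-52;A/A-0.9002,A/T-0.0978,T/T-0.0020;A/A-460,A/T-50,T/T-1,N/N-0"
print(first_genotype_value(info, "A", "T"))  # A/T-0.0978
print(first_genotype_value(info, "N", "N"))  # N/N-0
print(first_genotype_value(info, "G", "C"))  # None
```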
Or