Google BigQuery: extracting values with LIMIT 10 or less returns correct results; changing the limit or adding an extraction returns nulls

Here is a problem with genomics data. I am running the following query against the PGP data in BigQuery (for simplicity, using a single sample ID: hu089792):


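BigQuery does not guarantee the order of output rows unless you add an explicit ORDER BY.
So when you change the limit, you will most likely get different rows in the output, and for those rows the corresponding extractions produce NULL (in the outputs shown, allele1Gene itself is NULL in those rows, so every value extracted from it is NULL too).

To test this, I suggest adding a specific ORDER BY so that the row output is consistent; that way you compare oranges with oranges, not with apples.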
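
Applied to the query in this question, that suggestion looks roughly like the sketch below (legacy SQL, same table). Sorting on the rsID alias is an assumption: any column that uniquely identifies a variant would do, and if the sort key has ties, the order of the tied rows is still unspecified. This does not make the null rows go away; it only makes runs with different LIMIT values comparable.

```sql
#legacySQL
-- The question's first query with an explicit ORDER BY before LIMIT.
SELECT
  sample_id,
  allele1Gene,
  NTH(2, SPLIT(s.allele1XRef, ':')) AS rsID,
  NTH(1, SPLIT(allele1Gene, ';')) AS input,
  NTH(3, SPLIT(NTH(1, SPLIT(allele1Gene, ';')), ':')) AS gene1,
  NTH(2, SPLIT(allele1Gene, ';')) AS input2,
  NTH(3, SPLIT(NTH(2, SPLIT(allele1Gene, ';')), ':')) AS gene2
FROM
  [speedy-emissary-167213:pgp_orielresearch.pgp_variants_gene_dbsnp_hu089792] AS s
-- Assumption: rsID is unique per row, so the sorted order is fully determined
-- and LIMIT 10 returns the first 10 rows of what LIMIT 100 would return.
ORDER BY
  rsID
LIMIT
  10
```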

    SELECT
      sample_id,
      allele1Gene,
      NTH(2,SPLIT(s.allele1XRef,':')) AS rsID,
      NTH(1,SPLIT(allele1Gene,';')) AS input,
      NTH(3,SPLIT((NTH(1,SPLIT(allele1Gene,';'))),':')) AS gene1,
      NTH(2,SPLIT(allele1Gene,';')) AS input2,
      NTH(3,SPLIT((NTH(2,SPLIT(allele1Gene,';'))),':')) AS gene2
    FROM
      [speedy-emissary-167213:pgp_orielresearch.pgp_variants_gene_dbsnp_hu089792] AS s
    LIMIT
      10

**The result is as expected:**

    Row | sample_id | allele1Gene | rsID | input | gene1 | input2 | gene2
    ---------------------------------------------------------------------
    1 | hu089792 | 10645:NM_006549.3:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_153499.2:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_153500.1:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172214.2:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172215.2:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172216.1:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172226.2:CAMKK2:INTRON:UNKNOWN-INC | rs3794207 | 10645:NM_006549.3:CAMKK2:INTRON:UNKNOWN-INC | CAMKK2 | 10645:NM_153499.2:CAMKK2:INTRON:UNKNOWN-INC | CAMKK2
    2 | hu089792 | 387357:NM_001010923.2:THEMIS:INTRON:UNKNOWN-INC;387357:NM_001164685.1:THEMIS:INTRON:UNKNOWN-INC;387357:NM_001164687.1:THEMIS:INTRON:UNKNOWN-INC | rs683202 | 387357:NM_001010923.2:THEMIS:INTRON:UNKNOWN-INC | THEMIS | 387357:NM_001164685.1:THEMIS:INTRON:UNKNOWN-INC | THEMIS
    3 | hu089792 | 10207:NM_176877.2:INADL:INTRON:UNKNOWN-INC | rs2666491 | 10207:NM_176877.2:INADL:INTRON:UNKNOWN-INC | INADL | null | null


**When I change the limit to 100, or add another gene extraction, I get null results.**
**The query with the changed limit is:**

    SELECT
      sample_id,
      allele1Gene,
      NTH(2,SPLIT(s.allele1XRef,':')) AS rsID,
      NTH(1,SPLIT(allele1Gene,';')) AS input,
      NTH(3,SPLIT((NTH(1,SPLIT(allele1Gene,';'))),':')) AS gene1,
      NTH(2,SPLIT(allele1Gene,';')) AS input2,
      NTH(3,SPLIT((NTH(2,SPLIT(allele1Gene,';'))),':')) AS gene2
    FROM
      [speedy-emissary-167213:pgp_orielresearch.pgp_variants_gene_dbsnp_hu089792] AS s
    LIMIT
      1000

**The result is:**

    Row | sample_id | allele1Gene | rsID | input | gene1 | input2 | gene2
    ---------------------------------------------------------------------
    1 | hu089792 | null | rs6078843 | null | null | null | null
    2 | hu089792 | null | rs79092469 | null | null | null | null
    3 | hu089792 | null | rs56216546 | null | null | null | null
    4 | hu089792 | null | rs9576011 | null | null | null | null

**The other query (adding a third gene extraction):**

    SELECT
      sample_id,
      allele1Gene,
      NTH(2,SPLIT(s.allele1XRef,':')) AS rsID,
      NTH(1,SPLIT(allele1Gene,';')) AS input,
      NTH(3,SPLIT((NTH(1,SPLIT(allele1Gene,';'))),':')) AS gene1,
      NTH(2,SPLIT(allele1Gene,';')) AS input2,
      NTH(3,SPLIT((NTH(2,SPLIT(allele1Gene,';'))),':')) AS gene2,
      NTH(3,SPLIT(allele1Gene,';')) AS input3,
      NTH(3,SPLIT((NTH(3,SPLIT(allele1Gene,';'))),':')) AS gene3
    FROM
      [speedy-emissary-167213:pgp_orielresearch.pgp_variants_gene_dbsnp_hu089792] AS s
    LIMIT
      10

**It returns:**

    Row | sample_id | allele1Gene | rsID | input | gene1 | input2 | gene2 | input3 | gene3
    -------------------------------------------------------------------------------------
    1 | hu089792 | null | rs6551009 | null | null | null | null | null | null
    2 | hu089792 | null | rs2050586 | null | null | null | null | null | null
    3 | hu089792 | null | rs7151797 | null | null | null | null | null | null
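
In both failing outputs the allele1Gene column itself is null, so every value extracted from it is null as well; the extraction logic is not what changed. A quick way to see how common such rows are is a count like the sketch below. It is written in standard SQL (note the backtick `project.dataset.table` syntax instead of the legacy brackets) and assumes the same table as in the question.

```sql
#standardSQL
-- Count how many variants in the table carry no gene annotation at all.
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(allele1Gene IS NULL) AS rows_without_gene
FROM
  `speedy-emissary-167213.pgp_orielresearch.pgp_variants_gene_dbsnp_hu089792`
```

If rows_without_gene is a large share of total_rows, an unordered LIMIT can easily return only such rows.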

**Any idea why?**

Any help is greatly appreciated.

Best,
eilalan

**The original table contains 3 columns extracted from `[google.com:biggene:pgp.cgi_variants]`, see below:**

    Row | sample_id | allele1XRef | allele1Gene
    -------------------------------------------
    1 | hu089792 | dbsnp.107:rs3794207 | 10645:NM_006549.3:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_153499.2:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_153500.1:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172214.2:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172215.2:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172216.1:CAMKK2:INTRON:UNKNOWN-INC;10645:NM_172226.2:CAMKK2:INTRON:UNKNOWN-INC
    2 | hu089792 | dbsnp.83:rs683202 | 387357:NM_001010923.2:THEMIS:INTRON:UNKNOWN-INC;387357:NM_001164685.1:THEMIS:INTRON:UNKNOWN-INC;387357:NM_001164687.1:THEMIS:INTRON:UNKNOWN-INC
    3 | hu089792 | dbsnp.100:rs2666491 | 10207:NM_176877.2:INADL:INTRON:UNKNOWN-INC
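
Given the column format above (`;`-separated transcript entries, each `:`-separated with the gene symbol in third position), the three-gene query from the question could be sketched in standard SQL as follows. Two parts are assumptions about what you want: the `WHERE allele1Gene IS NOT NULL` filter (keep only annotated variants) and ordering by rsID (assumed unique). `SAFE_ORDINAL` is 1-based like legacy `NTH` and returns NULL instead of raising an error when an entry is missing, e.g. row 3 above has only one transcript, so input2, gene2, input3 and gene3 come back NULL.

```sql
#standardSQL
-- Three-gene extraction; SAFE_ORDINAL(n) is 1-based like NTH(n, ...) and
-- yields NULL (not an error) when the split array has fewer than n entries.
SELECT
  sample_id,
  allele1Gene,
  SPLIT(allele1XRef, ':')[SAFE_ORDINAL(2)] AS rsID,
  SPLIT(allele1Gene, ';')[SAFE_ORDINAL(1)] AS input,
  SPLIT(SPLIT(allele1Gene, ';')[SAFE_ORDINAL(1)], ':')[SAFE_ORDINAL(3)] AS gene1,
  SPLIT(allele1Gene, ';')[SAFE_ORDINAL(2)] AS input2,
  SPLIT(SPLIT(allele1Gene, ';')[SAFE_ORDINAL(2)], ':')[SAFE_ORDINAL(3)] AS gene2,
  SPLIT(allele1Gene, ';')[SAFE_ORDINAL(3)] AS input3,
  SPLIT(SPLIT(allele1Gene, ';')[SAFE_ORDINAL(3)], ':')[SAFE_ORDINAL(3)] AS gene3
FROM
  `speedy-emissary-167213.pgp_orielresearch.pgp_variants_gene_dbsnp_hu089792`
-- Assumption: only variants that carry a gene annotation are of interest.
WHERE
  allele1Gene IS NOT NULL
-- Assumption: rsID is unique, so changing LIMIT only adds rows at the end.
ORDER BY
  rsID
LIMIT
  100
```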