MySQL: Dividing a column value by the total row count in Impala


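Because Impala does not allow SET operations or subqueries in the SELECT list, I am having a hard time figuring out how to divide a column value by the total row count returned by SELECT COUNT(DISTINCT cgi.sample_id). My end goal is to calculate the minor allele frequency for each chr:start position.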

My data is structured as follows:

| chr | start    | stop     | ref | allele1seq | allele2seq | sample_id | 
|  6  | 66720709 | 66720710 |  A  |      A     |     T      | 101-46-3  |
|  7  | 66720809 | 66720810 |  GG |      GA    |     GG     | 101-46-3  |
I would like to do something like the following query:

WITH vars as
(SELECT cgi.chr, cgi.start, concat(cgi.chr, ':', CAST(cgi.start AS STRING)) as pos, cgi.ref, cgi.allele1seq, cgi.allele2seq,
    CASE 
        WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq <> cgi.ref) THEN '1'  
        WHEN (cgi.allele1seq <> cgi.ref AND cgi.allele2seq = cgi.ref) THEN '1' 
        WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq = cgi.ref) THEN '2' 
        ELSE '0' END as ma_count
    FROM comgen_variants as cgi)

SELECT vars.*, (CAST(vars.ma_count as INT)/
((SELECT COUNT(DISTINCT cgi.sample_id) from comgen_variants as cgi) * 2)) as maf
FROM vars
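Besides finding a way to divide by the row count, I also need to group the results by chr and pos and then count how many times each allele occurs (where allele1seq and allele2seq are not equal to ref), rather than simply counting per row as in the query above. Because of the counting problem, I have not gotten that far yet.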


Thanks in advance for your help.

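It looks like you can precompute the total number of distinct sample IDs * 2 once and then reuse it in the subsequent query, since that value does not change from row to row. If the value did depend on the row, you might need to look at ...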

However, since it does not look like you need that, you can do the following:

WITH total AS 
(SELECT COUNT(DISTINCT sample_id) * 2 AS total FROM comgen_variants)

SELECT cgi.*,
       (CASE 
          WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq <> cgi.ref) THEN 1  
          WHEN (cgi.allele1seq <> cgi.ref AND cgi.allele2seq = cgi.ref) THEN 1 
          WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq = cgi.ref) THEN 2
          ELSE 0 END) / total.total AS maf
FROM comgen_variants AS cgi, total;
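
To try the precomputed-total pattern without an Impala cluster, here is a minimal sketch that runs essentially the same query against an in-memory SQLite database from Python. SQLite is only a stand-in chosen for illustration, the third data row is made up, and the `CAST(... AS REAL)` is there because SQLite truncates integer division; the query above may not need it on Impala.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (an assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comgen_variants ("
    "chr TEXT, start INTEGER, stop INTEGER, ref TEXT, "
    "allele1seq TEXT, allele2seq TEXT, sample_id TEXT)"
)
rows = [
    ("6", 66720709, 66720710, "A", "A", "T", "101-46-3"),
    ("7", 66720809, 66720810, "GG", "GA", "GG", "101-46-3"),
    ("6", 66720709, 66720710, "A", "A", "A", "101-46-4"),  # made-up second sample
]
conn.executemany("INSERT INTO comgen_variants VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The one-row CTE `total` is cross-joined to every variant row, so each row
# can divide by the same precomputed value without a subquery in the SELECT list.
QUERY = """
WITH total AS (
    SELECT COUNT(DISTINCT sample_id) * 2 AS total FROM comgen_variants
)
SELECT cgi.chr, cgi.start, cgi.sample_id,
       CAST(CASE
              WHEN cgi.allele1seq = cgi.ref AND cgi.allele2seq <> cgi.ref THEN 1
              WHEN cgi.allele1seq <> cgi.ref AND cgi.allele2seq = cgi.ref THEN 1
              WHEN cgi.allele1seq = cgi.ref AND cgi.allele2seq = cgi.ref THEN 2
              ELSE 0 END AS REAL) / total.total AS maf
FROM comgen_variants AS cgi CROSS JOIN total
ORDER BY cgi.sample_id, cgi.chr
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With two distinct samples the divisor is 4, so each row's score is divided by 4.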

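I am not sure this is really the minor allele frequency, though; it seems you would want the frequency of the second most common allele at each locus?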
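
For the per-position grouping mentioned in the question, one possible sketch is to count non-reference alleles per (chr, start) and divide by the same precomputed total. This is only an illustration: it yields a non-reference allele frequency, which is not necessarily the minor allele frequency, and it assumes a sample with no row at a locus contributes no non-reference alleles. SQLite again stands in for Impala, the third data row is made up, and the column names `non_ref_count` and `non_ref_freq` are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (an assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comgen_variants ("
    "chr TEXT, start INTEGER, stop INTEGER, ref TEXT, "
    "allele1seq TEXT, allele2seq TEXT, sample_id TEXT)"
)
rows = [
    ("6", 66720709, 66720710, "A", "A", "T", "101-46-3"),
    ("7", 66720809, 66720810, "GG", "GA", "GG", "101-46-3"),
    ("6", 66720709, 66720710, "A", "T", "T", "101-46-4"),  # made-up second sample
]
conn.executemany("INSERT INTO comgen_variants VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Each row contributes 0, 1, or 2 non-reference alleles; these are summed per
# locus and divided by the precomputed total (2 alleles per distinct sample).
QUERY = """
WITH total AS (
    SELECT COUNT(DISTINCT sample_id) * 2 AS total FROM comgen_variants
)
SELECT cgi.chr, cgi.start,
       SUM(CASE WHEN cgi.allele1seq <> cgi.ref THEN 1 ELSE 0 END
         + CASE WHEN cgi.allele2seq <> cgi.ref THEN 1 ELSE 0 END) AS non_ref_count,
       CAST(SUM(CASE WHEN cgi.allele1seq <> cgi.ref THEN 1 ELSE 0 END
              + CASE WHEN cgi.allele2seq <> cgi.ref THEN 1 ELSE 0 END) AS REAL)
         / total.total AS non_ref_freq
FROM comgen_variants AS cgi CROSS JOIN total
GROUP BY cgi.chr, cgi.start, total.total
ORDER BY cgi.chr, cgi.start
"""
per_locus = conn.execute(QUERY).fetchall()
for row in per_locus:
    print(row)
```

Grouping by `total.total` as well keeps the query valid in engines that require every non-aggregated column to appear in the GROUP BY clause.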

Thanks! Yes, it is not quite MAF yet; I was just stuck on how to do this part. Thank you very much for your help.