SQL: How do I select different percentages of data based on a column value?

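I need to query a table that contains a gender column, like this:

| id | gender | name    |
-------------------------
| 1  | M      | Michael |
-------------------------
| 2  | F      | Hanna   |
-------------------------
| 3  | M      | Louie   |
-------------------------

I need to extract the top N results with, say, 80% males and 20% females. So if I need 1000 results, I want to retrieve 800 males and 200 females.

Can this be done in a single query? How?

And if I don't have enough records (imagine that in the example above I only have 700 males), is it possible to automatically select 700/300 instead?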


I don't have PostgreSQL at hand, but the first scenario is straightforward in MS SQL 2012 with a UNION. I think you can do the same in Postgres:

declare @MaxRows            INT
        ,@PercentageMale    INT
        ,@PercentageFemale  INT

select      @MaxRows = 1000
            ,@PercentageMale = 80
            ,@PercentageFemale = 20

select  top (@MaxRows*@PercentageMale/100)  *
FROM        someTable
WHERE       Gender = 'M'
UNION ALL   -- keep every row; a plain UNION would also remove duplicates
select  top (@MaxRows*@PercentageFemale/100)    *
FROM        someTable
WHERE       Gender = 'F'
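
The second point is actually simple too. Basically, you want to select the top percentage of males and then fill the rest of the list with females, up to the total row count. The female percentage is not actually relevant: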

declare @MaxRows            INT
        ,@PercentageMale    INT

select      @MaxRows = 1000
            ,@PercentageMale = 80

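SELECT TOP (@MaxRows) id, gender, name   --TOP with a variable needs parentheses
FROM
(
    select  top (@MaxRows*@PercentageMale/100)  *, 0 AS ord
    FROM        someTable
    WHERE       Gender = 'M'
    UNION ALL
    select  top (@MaxRows)  *, 1 AS ord --we never want more than @MaxRows 
                              --so no need to check for a %, 
                              --just fill in the rest of the data set
    FROM        someTable
    WHERE       Gender = 'F'
) a
ORDER BY ord    --males first, so females only fill the remaining rows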
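
Since the answer above is written in T-SQL, here is a minimal sketch of how the first scenario might look in PostgreSQL. It assumes the same `someTable` with `id`, `gender` and `name` columns; the hard-coded 1000/80/20 values and the `ORDER BY id` are illustrative choices, not part of the original answer.

```sql
-- PostgreSQL sketch of the fixed 80/20 split (assumed table: someTable).
-- Each branch is parenthesized so that ORDER BY / LIMIT apply to that
-- branch only and not to the result of the whole UNION ALL.
(SELECT id, gender, name
 FROM someTable
 WHERE gender = 'M'
 ORDER BY id                      -- assumption: "top" means lowest id first
 LIMIT (1000 * 80 / 100))         -- integer arithmetic: 800 rows
UNION ALL
(SELECT id, gender, name
 FROM someTable
 WHERE gender = 'F'
 ORDER BY id
 LIMIT (1000 * 20 / 100));        -- 200 rows
```

PostgreSQL has no `TOP`; `LIMIT` plays that role, and without an `ORDER BY` the rows each branch picks are arbitrary.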

Assuming you are given a row count (lmt) and float values for the M/F distribution, how about the following:

create table gen (
id     integer,
gender text,
name   text
);

-- inserts 80% males and 20% females into the source table ("gen"):
-- every 5th row (mod(n,5) = 0) is 'F', the rest are 'M'
insert into gen select n, case when mod(n,5) = 0 then 'F' else 'M' end, (case when mod(n,5) = 0 then 'F' else 'M' end)||'_'||n::text
from generate_series(1,20000) n;


-- extract 80/20 M vs F
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
     -- number the rows within each gender; ordering by id makes the pick deterministic
     g as (select id,gender,name,row_number() over (partition by gender order by id) rn from gen)
select *
from g
where (gender = 'M' and rn <= (select lmt*mpct from conf))
or (gender = 'F' and rn <= (select lmt*fpct from conf));


-- Same query, to show the row count per gender:
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
     g as (select id,gender,name,row_number() over (partition by gender order by id) rn from gen)
select gender,count(*)
from (
    select *
    from g
    where (gender = 'M' and rn <= (select lmt*mpct from conf))
    or (gender = 'F' and rn <= (select lmt*fpct from conf))
    ) y
group by gender;
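
The queries above use fixed quotas, so they do not cover the second scenario, where there are too few males. One possible way to extend the same row_number() idea is sketched below. It reuses the `gen` table and the `lmt`/`mpct` names from this answer; the `quota` CTE and its `m_take` column are hypothetical names introduced only for this illustration.

```sql
with conf as (
    select 1000 as lmt, .80::float as mpct
),
quota as (
    -- m_take: the male target, capped by the number of males that exist
    select c.lmt,
           least(round(c.lmt * c.mpct)::bigint,
                 (select count(*) from gen where gender = 'M')) as m_take
    from conf c
),
g as (
    -- number the rows within each gender
    select id, gender, name,
           row_number() over (partition by gender order by id) as rn
    from gen
)
select g.id, g.gender, g.name
from g
cross join quota q
where (g.gender = 'M' and g.rn <= q.m_take)
   -- females fill whatever is left of the total
   or (g.gender = 'F' and g.rn <= q.lmt - q.m_take);
```

If only 700 males existed, `m_take` would be 700 and the female quota would become 300. This sketch only handles a shortage of males; a shortage of females would still return fewer than `lmt` rows.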

Basically, you want to get as many 'M' rows as possible, but no more than your percentage, and then get enough 'F' rows to reach 1000 rows in total:

with cte_m as (
    select * from Table1 where gender = 'M' limit (1000 * 0.8)
), cte as (
    select *, 0 as ord from cte_m
    union all
    select *, 1 as ord from Table1 where gender = 'F'
    order by ord
    limit 1000
)
select id, gender, name
from cte

What should happen for scenario 2?

I have edited my answer to better explain myself.

Unfortunately, I don't know enough SQL to give the answer as code, but I can give the logic: I suggest a stored procedure that takes a value N, the number of rows you are selecting. Take N*.8, select that many rows where gender is M, count the rows returned as numResultsMale, then select N-numResultsMale rows where gender is F.

As a side note, storing gender as a boolean or as M/F will sooner or later get you or your users into trouble. Allowing "other" or "unspecified" is usually a good idea. Some people are not physically and/or psychologically 100% male or 100% female, whether by birth or by change.

@CraigRinger, maybe that is what they want. Meeting every need of every user is not always a goal. I understand your comment and agree that it is valid in many cases, but I believe we should let him store gender as a boolean if he wants to.

This is perfect! Thanks
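
The stored-procedure logic described in the comments could be sketched in PL/pgSQL roughly as follows. This is only an illustration under assumptions: the function name `pick_by_gender` and its parameters are hypothetical, and it reads from the `gen` table (integer `id`, text `gender`, text `name`) defined in one of the answers.

```sql
-- Hypothetical function: returns up to n rows, at most n * male_pct of them male.
create or replace function pick_by_gender(n integer, male_pct numeric)
returns table (id integer, gender text, name text)
language plpgsql
as $$
declare
    num_results_male bigint;
begin
    -- Take N * pct, capped by the number of males that actually exist
    select least(count(*), floor(n * male_pct)::bigint)
    into num_results_male
    from gen t
    where t.gender = 'M';

    -- Select that many males
    return query
        select t.id, t.gender, t.name
        from gen t
        where t.gender = 'M'
        order by t.id
        limit (num_results_male);

    -- Select N - numResultsMale females to fill the remainder
    return query
        select t.id, t.gender, t.name
        from gen t
        where t.gender = 'F'
        order by t.id
        limit (n - num_results_male);
end;
$$;

-- Example call: 1000 rows with an 80% male target
select * from pick_by_gender(1000, 0.8);
```

The columns are qualified with the alias `t` because the output column names declared in `returns table` are also visible as variables inside the function body.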