SQL: how do I select different percentages of rows based on a column value?
I need to query a table that contains a gender column, like this:

| id | gender | name    |
-------------------------
| 1  | M      | Michael |
| 2  | F      | Hanna   |
| 3  | M      | Louie   |

I need to extract the top N results, for example 80% males and 20% females. So if I need 1000 results, I want to retrieve 800 males and 200 females.
Can this be done in a single query? How?
If I don't have enough records (say, in the example above, I only have 700 males), is it possible to automatically select 700/300 instead?
I don't have PostgreSQL at hand, but the first scenario is pretty simple with a UNION in MS SQL 2012. I'd guess you can do the same in Postgres:
declare @MaxRows INT
,@PercentageMale INT
,@PercentageFemale INT
select @MaxRows = 1000
,@PercentageMale = 80
,@PercentageFemale = 20
select top (@MaxRows*@PercentageMale/100) *
FROM someTable
WHERE Gender = 'M'
UNION
select top (@MaxRows*@PercentageFemale/100) *
FROM someTable
WHERE Gender = 'F'
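To make the first scenario easy to try without a SQL Server instance, here is a minimal sketch using Python's built-in `sqlite3` module. The table `people` and the helper names `make_people` and `fixed_quota` are hypothetical stand-ins for `someTable`. It also shows the limitation that motivates the second scenario: each gender has an independent cap, so a shortage of men simply shrinks the result.

```python
import sqlite3


def make_people(n_male, n_female):
    """Hypothetical in-memory stand-in for someTable with id/gender/name columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, gender TEXT, name TEXT)")
    rows = [("M", f"M_{i}") for i in range(n_male)]
    rows += [("F", f"F_{i}") for i in range(n_female)]
    conn.executemany("INSERT INTO people (gender, name) VALUES (?, ?)", rows)
    return conn


def fixed_quota(conn, max_rows, pct_male, pct_female):
    """Scenario 1: one independent cap per gender, combined with UNION ALL."""
    # Same arithmetic as @MaxRows*@PercentageMale/100 (integer division)
    n_male = max_rows * pct_male // 100
    n_female = max_rows * pct_female // 100
    # SQLite does not accept LIMIT on an individual member of a compound SELECT,
    # so each capped SELECT is wrapped in a subquery; ORDER BY id makes the pick deterministic.
    sql = """
        SELECT * FROM (SELECT id, gender, name FROM people
                       WHERE gender = 'M' ORDER BY id LIMIT ?)
        UNION ALL
        SELECT * FROM (SELECT id, gender, name FROM people
                       WHERE gender = 'F' ORDER BY id LIMIT ?)
    """
    return conn.execute(sql, (n_male, n_female)).fetchall()


rows = fixed_quota(make_people(5000, 5000), 1000, 80, 20)
print(len(rows))  # 1000 (800 men + 200 women)
rows = fixed_quota(make_people(700, 5000), 1000, 80, 20)
print(len(rows))  # 900: only 700 men exist and the women stay capped at 200
```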
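The second point is actually quite simple. Basically, you want to select the top % of males and then fill the rest of the list with females up to the total row count. The number of females isn't really relevant: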
declare @MaxRows INT
,@PercentageMale INT
select @MaxRows = 1000
,@PercentageMale = 80
SELECT TOP (@MaxRows) *
FROM
(
select top (@MaxRows*@PercentageMale/100) *, 0 AS ord --males sort first
FROM someTable
WHERE Gender = 'M'
UNION ALL
select top (@MaxRows) *, 1 AS ord --we never want more than @MaxRows
--so no need to check for a %,
--just fill in the rest of the data set
FROM someTable
WHERE Gender = 'F'
) a
ORDER BY a.ord --without an ORDER BY, the outer TOP is free to drop males instead of females
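The fill logic can be checked end to end with a small sketch. It uses Python's `sqlite3` in place of SQL Server, with `LIMIT` standing in for `TOP`. The table and helper names (`people`, `make_people`, `quota_with_fill`, `count_by_gender`) are hypothetical. The important detail is the `ord` column: sorting males ahead of females before the outer limit means any surplus is trimmed from the females, never from the males.

```python
import sqlite3


def make_people(n_male, n_female):
    """Hypothetical in-memory stand-in for someTable with id/gender/name columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, gender TEXT, name TEXT)")
    rows = [("M", f"M_{i}") for i in range(n_male)]
    rows += [("F", f"F_{i}") for i in range(n_female)]
    conn.executemany("INSERT INTO people (gender, name) VALUES (?, ?)", rows)
    return conn


def quota_with_fill(conn, max_rows, pct_male):
    """At most pct_male% men, then women fill the remaining slots up to max_rows."""
    n_male = max_rows * pct_male // 100
    sql = """
        SELECT id, gender, name FROM (
            SELECT * FROM (SELECT id, gender, name, 0 AS ord FROM people
                           WHERE gender = 'M' ORDER BY id LIMIT ?)
            UNION ALL
            SELECT * FROM (SELECT id, gender, name, 1 AS ord FROM people
                           WHERE gender = 'F' ORDER BY id LIMIT ?)
        )
        ORDER BY ord, id  -- men first, so the outer LIMIT only trims women
        LIMIT ?
    """
    return conn.execute(sql, (n_male, max_rows, max_rows)).fetchall()


def count_by_gender(rows):
    males = sum(1 for r in rows if r[1] == "M")
    return males, len(rows) - males


print(count_by_gender(quota_with_fill(make_people(5000, 5000), 1000, 80)))  # (800, 200)
print(count_by_gender(quota_with_fill(make_people(700, 5000), 1000, 80)))   # (700, 300)
```

When enough rows of both genders exist, the result is 800/200. With only 700 men, the same query automatically returns 700/300, which is the behaviour the question asks for.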
Assuming you supply a row count lmt and floats for the M/F split, how about the following:
create table gen (
id integer,
gender text,
name text
);
-- inserts 80% males and 20% females into the source table ("gen"): every 5th id is 'F'
insert into gen select n, case when mod(n,5) = 0 then 'F' else 'M' end, (case when mod(n,5) = 0 then 'F' else 'M' end)||'_'||n::text
from generate_series(1,20000) n;
-- extract 80/20 M vs F
-- (order by id inside the window; ordering by gender within a gender partition picks rows arbitrarily)
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
g as (select id,gender,name,row_number() over (partition by gender order by id) rn from gen)
select *
from g
where (gender = 'M' and rn <= (select lmt*mpct from conf))
or (gender = 'F' and rn <= (select lmt*fpct from conf));
-- Same query, to show the percent M vs F:
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
g as (select id,gender,name,row_number() over (partition by gender order by id) rn from gen)
select gender,count(*)
from (
select *
from g
where (gender = 'M' and rn <= (select lmt*mpct from conf))
or (gender = 'F' and rn <= (select lmt*fpct from conf))
) y
group by gender;
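The `row_number()` approach can be reproduced outside PostgreSQL with Python's `sqlite3`, assuming a SQLite build of 3.25 or newer for window functions. The helpers `make_gen` and `split_counts` are hypothetical. `make_gen` fills `gen` with the same rule as the answer's insert (every 5th id is 'F'), and `split_counts` runs the counting variant of the query.

```python
import sqlite3


def make_gen(n):
    """In-memory copy of the answer's gen table: every 5th id is 'F' (20% F, 80% M)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gen (id INTEGER, gender TEXT, name TEXT)")
    rows = []
    for i in range(1, n + 1):
        g = "F" if i % 5 == 0 else "M"
        rows.append((i, g, f"{g}_{i}"))
    conn.executemany("INSERT INTO gen VALUES (?, ?, ?)", rows)
    return conn


def split_counts(conn, lmt, mpct, fpct):
    """Number rows per gender with row_number() and keep rn <= lmt * pct for each gender."""
    sql = """
        WITH conf AS (SELECT ? AS lmt, ? AS mpct, ? AS fpct),
        g AS (SELECT id, gender, name,
                     row_number() OVER (PARTITION BY gender ORDER BY id) AS rn
              FROM gen)
        SELECT gender, count(*)
        FROM g
        WHERE (gender = 'M' AND rn <= (SELECT lmt * mpct FROM conf))
           OR (gender = 'F' AND rn <= (SELECT lmt * fpct FROM conf))
        GROUP BY gender
        ORDER BY gender
    """
    return conn.execute(sql, (lmt, mpct, fpct)).fetchall()


print(split_counts(make_gen(20000), 1000, 0.8, 0.2))  # [('F', 200), ('M', 800)]
```

Each gender gets a fixed cap here, so this query alone does not handle the second scenario. With only 700 males it would return 700 + 200 = 900 rows rather than 700/300.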
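Basically, you want to get as many 'M' rows as possible, but no more than your percentage, and then get enough 'F' rows so that there are 1000 rows in total: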
with cte_m as (
select * from Table1 where gender = 'M' limit (1000 * 0.8)
), cte as (
select *, 0 as ord from cte_m
union all
select *, 1 as ord from Table1 where gender = 'F'
order by ord
limit 1000
)
select id, gender, name
from cte
For scenario 2, what should happen?
I've edited my answer to explain myself better.
Unfortunately I don't know enough SQL to give the answer as code, but I can give the logic. I'd suggest a stored procedure (SP) that takes a value N, the number of rows you are selecting. Take N*.8 and select that many rows where gender is M, then count the returned rows as numResultsMale. Finally, select N-numResultsMale rows where gender is F.
Side note: storing gender as a boolean or as M/F will sooner or later get you or your users into some trouble. Allowing "other" or "unspecified" is usually a good idea. Some people are not 100% male or 100% female, physically and/or psychologically, whether from birth or through change.
@CraigRinger, maybe they want it that way. Meeting every need of every user is not always a goal. I understand your comment and agree that it is valid in many cases, but I believe we should let him store gender as a boolean if he wants to.
This is perfect! Thanks
Great! Thanks