SQL: select n samples from each category

sql sqlite

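I have a column, score, which is an integer between 1 and 5. I am trying to select 2000 samples for each score. My own hacking and answers to other questions led me to construct the following query: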

select * from (select text, score from data where score= 1 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 2 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 3 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 4 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 5 and LENGTH(text) > 45 limit 2000)

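This feels like the worst possible way to do it. Worse still, when I run each query on its own it returns 2k results as expected, but when I run the union I get fewer than 10k rows.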

I'm looking for help optimizing this query, but more importantly I'd like to understand why the union returns the wrong number of results.

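As for why your query returns the wrong number of results, I'd bet that the rows are not distinct across the result sets returned by the individual queries. When you use union, it returns only the distinct rows of the combined result set.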

Try changing it to union all, that is, replace every union in your query with union all.
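To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names follow the question, but the rows, the two-score range, and the limit of 2 are made up for illustration:

```python
import sqlite3

# Hypothetical miniature of the asker's table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, text TEXT, score INTEGER)")
rows = [
    ("great product", 1),
    ("great product", 1),    # exact duplicate of the row above
    ("would buy again", 2),
    ("would buy again", 2),  # exact duplicate of the row above
]
conn.executemany("INSERT INTO data (text, score) VALUES (?, ?)", rows)

# Same shape as the query in the question, shrunk to two scores.
template = """
SELECT * FROM (SELECT text, score FROM data WHERE score = 1 LIMIT 2)
{op}
SELECT * FROM (SELECT text, score FROM data WHERE score = 2 LIMIT 2)
"""

union_rows = conn.execute(template.format(op="UNION")).fetchall()
union_all_rows = conn.execute(template.format(op="UNION ALL")).fetchall()

print(len(union_rows))      # duplicates collapsed into one row each
print(len(union_all_rows))  # every row from both branches is kept
```

Each branch returns 2 rows on its own, yet UNION yields fewer than 4 because identical (text, score) pairs are merged; UNION ALL keeps them all.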

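If you have a primary key, such as an auto-increment column, here is another approach that generates a row number within each score group. It assumes a primary key named id: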

select text, score
from (
  select text, score, 
         (select count(*) from data b 
          where a.id >= b.id and 
                a.score = b.score and 
                length(b.text) > 45) rn
  from data a
  where length(text) > 45
  ) t
where rn <= 2000
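As a rough check of how the correlated subquery behaves, the sketch below runs the same query shape against a small in-memory SQLite database from Python. The rows, the per-score limit `N = 2`, and the length threshold `MIN_LEN = 10` are assumptions chosen to keep the example small; the original uses 2000 and 45:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, text TEXT, score INTEGER)")

# Invented rows: four long texts for each of three scores, plus one short
# text that the length filter should exclude.
rows = [(f"sample text number {i} for score {s}", s) for s in (1, 2, 3) for i in range(4)]
rows.append(("short", 1))
conn.executemany("INSERT INTO data (text, score) VALUES (?, ?)", rows)

N = 2         # rows to keep per score (2000 in the original)
MIN_LEN = 10  # minimum text length (45 in the original)

# rn counts the qualifying rows with the same score and an id no greater
# than the current row's id, so it numbers rows 1, 2, 3, ... within a score.
query = """
SELECT text, score
FROM (
  SELECT a.text, a.score,
         (SELECT COUNT(*) FROM data b
          WHERE b.id <= a.id
            AND b.score = a.score
            AND LENGTH(b.text) > :min_len) AS rn
  FROM data a
  WHERE LENGTH(a.text) > :min_len
) t
WHERE rn <= :n
"""
sampled = conn.execute(query, {"n": N, "min_len": MIN_LEN}).fetchall()

# Tally how many rows came back for each score.
counts = {}
for _text, score in sampled:
    counts[score] = counts.get(score, 0) + 1
print(counts)
```

Note that this keeps the N rows with the lowest id in each score group rather than a random sample, and the subquery is re-evaluated for every outer row, so it may be slow on a large table. SQLite 3.25 and later also support the ROW_NUMBER() window function, which can express the same per-group numbering.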

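By default, UNION compares all rows and returns only the distinct ones. That is why you are getting fewer than 10k rows. As sgeddes said, use UNION ALL to get all 10k rows, including the duplicates. You do want the duplicate rows, right?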

It hadn't even occurred to me that there might be duplicate rows, thanks.
