SQL: iterative subsampling for each distinct value, with the results unioned

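I've run into a problem.

I have a table in which every row has a category, a document ID and a rank.

Rows are ranked within their category. For each category I want to select a subsample, and all the subsamples should be stacked into a single table.

The catch is that I want to subsample by repeatedly halving the row index within the category. For example, if a given category has 32 items, I want rows 32, 16, 8, 4, 2 and 1.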
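
As an illustration of the index sequence described above, the halving can be written as a small recursive CTE. This is only a sketch for one hard-coded count; the variable @n and the CTE name Halving are made up for the example.

```sql
-- Sketch: list the halved row indexes for a category with 32 rows.
-- @n and Halving are illustrative names, not part of the original code.
DECLARE @n int = 32;

WITH Halving (Idx) AS
(
    SELECT @n                -- anchor: start at the last row index
    UNION ALL
    SELECT Idx / 2           -- integer division halves and rounds down
    FROM Halving
    WHERE Idx > 1            -- stop once index 1 has been emitted
)
SELECT Idx
FROM Halving
ORDER BY Idx DESC;           -- 32, 16, 8, 4, 2, 1
```

For a count that is not a power of two, integer division rounds down at every step, so 19 gives 19, 9, 4, 2, 1.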

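So far I have managed to do this for one specific category, but I don't know how to:

a) do it for every category in [Major Focus Area]
b) combine the resulting subsamples into a single table

Any hints or help would be much appreciated! I'm working in T-SQL on MS SQL Server.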

Sample data for MS SQL Server:

CREATE TABLE Rank_MajorAreas
    ([Rank] int, [Major Focus Area] varchar(17), [ID] int)
;

INSERT INTO Rank_MajorAreas
    ([Rank], [Major Focus Area], [ID])
VALUES
    (1, 'Welfare', 71366),
    (2, 'Welfare', 70415),
    (3, 'Truck Driving', 70423),
    (4, 'Peasant''s Office', 74566),
    (5, 'Peasant''s Office', 71560),
    (6, 'Nail Therapy', 77497),
    (7, 'Truck Driving', 76193),
    (8, 'Truck Driving', 79226),
    (9, 'Truck Driving', 70222),
    (10, 'Welfare', 77336),
    (11, 'Truck Driving', 70823),
    (12, 'Welfare', 77096),
    (13, 'Welfare', 71335),
    (14, 'Nail Therapy', 73551),
    (15, 'Welfare', 72146),
    (16, 'Truck Driving', 74023),
    (17, 'Welfare', 71546),
    (18, 'Nail Therapy', 74755),
    (19, 'Peasant''s Office', 77834),
    (20, 'Welfare', 75667),
    (21, 'Peasant''s Office', 71342),
    (22, 'Peasant''s Office', 77457),
    (23, 'Peasant''s Office', 77923),
    (24, 'Welfare', 76508),
    (25, 'Welfare', 75714),
    (26, 'Welfare', 73654),
    (27, 'Welfare', 75753),
    (28, 'Truck Driving', 71481),
    (29, 'Truck Driving', 79424),
    (30, 'Peasant''s Office', 76143),
    (31, 'Truck Driving', 74076),
    (32, 'Nail Therapy', 78714),
    (33, 'Nail Therapy', 79924),
    (34, 'Welfare', 71482),
    (35, 'Welfare', 70050),
    (36, 'Welfare', 76053),
    (37, 'Nail Therapy', 79591),
    (38, 'Peasant''s Office', 75197),
    (39, 'Nail Therapy', 74104),
    (40, 'Welfare', 72891),
    (41, 'Truck Driving', 73621),
    (42, 'Peasant''s Office', 71713),
    (43, 'Welfare', 71979),
    (44, 'Peasant''s Office', 71601),
    (45, 'Peasant''s Office', 73928),
    (46, 'Nail Therapy', 71759),
    (47, 'Nail Therapy', 70379),
    (48, 'Welfare', 71215),
    (49, 'Truck Driving', 70908),
    (50, 'Welfare', 71989)
;

Code so far:

CREATE VIEW MFA AS
  SELECT ROW_NUMBER() OVER(ORDER BY fa.[Rank] ASC) AS Row
        ,*
  FROM Rank_MajorAreas AS fa
  -- ideally we could make a view per Focus Area
  WHERE fa.[Major Focus Area] = 'Welfare'
  ORDER BY Row ASC
  OFFSET 0 ROWS;

DECLARE @start int
SELECT @start = (SELECT COUNT(*) FROM MFA)

;WITH Sample( Row ) AS
(
  Select @start as Row
    UNION ALL
  SELECT ROUND(Row/2, 0)
    FROM Sample
    WHERE Row > 0
)
SELECT * FROM MFA AS mfa
INNER JOIN Sample AS s on s.Row = mfa.Row
ORDER BY mfa.Row ASC

Desired result, with the subsampling done within each focus area and the subsamples returned together as a single result:

Row  Rank  Major Focus Area  ID
1    1     Welfare           71366
2    2     Welfare           70415
4    12    Welfare           77096
9    24    Welfare           76508
19   50    Welfare           71989
...
1    6     Nail Therapy      77497
2    14    Nail Therapy      73551
4    32    Nail Therapy      78714
9    47    Nail Therapy      70379
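
You need to use PARTITION BY on the [Major Focus Area] column in the OVER clause, so that the row numbers restart for every focus area. Here is the modified T-SQL: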

CREATE VIEW MFA AS
  -- Number the rows within each focus area rather than across the whole table
  SELECT ROW_NUMBER() OVER(PARTITION BY fa.[Major Focus Area] ORDER BY fa.[Rank] ASC) AS Row
        ,*
  FROM Rank_MajorAreas AS fa
  ORDER BY [Major Focus Area], Row ASC
  OFFSET 0 ROWS;
-- CREATE VIEW has to be alone in its batch, hence the batch separator (SSMS/sqlcmd)
GO

;WITH Sample( Row, fa ) AS
(
  -- Anchor: start every focus area at its last row number (its row count)
  SELECT COUNT(*) AS Row, [Major Focus Area] AS fa FROM MFA GROUP BY [Major Focus Area]
    UNION ALL
  -- Recursive step: halve the index; Row is an int, so Row / 2 already rounds down
  SELECT Row / 2, fa
    FROM Sample
    WHERE Row > 1
)

-- Keep only the rows whose per-area row number appears in the halving sequence
SELECT mfa.Row, mfa.[Rank], mfa.[Major Focus Area], mfa.[ID] FROM MFA AS mfa
 INNER JOIN Sample AS s ON s.Row = mfa.Row AND s.fa = mfa.[Major Focus Area]
 ORDER BY mfa.[Major Focus Area], mfa.Row ASC
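
If a view is not wanted, the same idea can probably be expressed as a single statement with two CTEs. The following is a sketch under that assumption, not a tested drop-in replacement; the names Numbered, Picks and RowNo are invented for the example, and only the Rank_MajorAreas table from the question is assumed to exist.

```sql
-- Sketch: per-category halving subsample in one statement, without a view.
-- Numbered, Picks and RowNo are illustrative names.
WITH Numbered AS
(
    -- Row numbers restart at 1 for every focus area
    SELECT ROW_NUMBER() OVER (PARTITION BY [Major Focus Area]
                              ORDER BY [Rank]) AS RowNo,
           [Rank], [Major Focus Area], [ID]
    FROM Rank_MajorAreas
),
Picks (RowNo, fa) AS
(
    -- Anchor: the last row index of each focus area is its row count
    SELECT COUNT(*), [Major Focus Area]
    FROM Rank_MajorAreas
    GROUP BY [Major Focus Area]
    UNION ALL
    -- Recursive step: integer division halves the index and rounds down
    SELECT RowNo / 2, fa
    FROM Picks
    WHERE RowNo > 1
)
SELECT n.RowNo, n.[Rank], n.[Major Focus Area], n.[ID]
FROM Numbered AS n
INNER JOIN Picks AS p
    ON p.fa = n.[Major Focus Area]
   AND p.RowNo = n.RowNo
ORDER BY n.[Major Focus Area], n.RowNo;
```

With the sample data, Welfare has 19 rows, so the picked row numbers for that area are 19, 9, 4, 2 and 1, which lines up with the desired result in the question. Because the recursion only halves, its depth grows with the logarithm of the category size and stays well under SQL Server's default MAXRECURSION limit of 100.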

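Comments:

Sample data and desired results would really help.
What is the desired result for the sample data?
Copied the code over from the SQL Fiddle, @gordonlinoff.
Added the desired results to the question, @cdaiga.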