PostgreSQL PL/pgSQL function to randomly select ids


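Goal:

Pre-populate a table with a list of sequential ids, for example from 1 to 1,000,000. The table also has a nullable column: a NULL value marks the id as unassigned, and a non-NULL value marks it as assigned. I also want a function I can call that randomly selects x ids from the table that have not yet been assigned. This is for something very specific, and while I know there are different ways to do this, I would like to know whether the flaw in this particular implementation can be fixed.

I have something partially working, but I don't know where the flaw in the function is.

Here is the table: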

CREATE SEQUENCE accounts_seq MINVALUE 700000000001 NO MAXVALUE;

CREATE TABLE accounts (
  id BIGINT PRIMARY KEY default nextval('accounts_seq'), 
  client VARCHAR(25), UNIQUE(id, client)
);

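The function gen_account_ids is just a one-time setup step that pre-populates the table with a fixed number of rows, all marked as unassigned.

So I use it to pre-populate the table with, say, 1000 records: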

SELECT gen_account_ids(1000);
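
The body of gen_account_ids is not shown in the post. A minimal sketch of what such a setup function might look like, assuming it only needs to insert n rows with a NULL client and let the sequence supply the ids (the parameter name n and the return value are assumptions, not the original definition):

```sql
-- Hypothetical sketch: the original gen_account_ids body is not shown.
-- Inserts n unassigned rows; ids come from accounts_seq via the column default.
CREATE OR REPLACE FUNCTION gen_account_ids(n INT)
  RETURNS INT AS $$
DECLARE
  rowcount INT;
BEGIN
  INSERT INTO accounts (client)
  SELECT NULL::VARCHAR FROM generate_series(1, n);
  GET DIAGNOSTICS rowcount = ROW_COUNT;
  RETURN rowcount;  -- number of rows inserted
END;
$$ LANGUAGE plpgsql;
```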

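The next function, assign, randomly selects unassigned ids (unassigned means the client column is NULL) and updates that column with the client value, so that they become assigned. It returns the number of rows affected.

It works some of the time, but I believe collisions occur. That is why I tried DISTINCT, but then it usually returns fewer rows than requested. For example, if I call assign(100, 'foo'); it might return 95 rows instead of the desired 100.

How can I modify it so that it always returns exactly the requested number of rows?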

   /*
     This will assign ids to a client randomly
     @param int is the number of account numbers to generate
     @param varchar(10) is a string descriptor for the client
     @returns the number of rows affected -- should be the same as the input int

     Call it like this: `SELECT * FROM assign(100, 'FOO')`
   */
   CREATE OR REPLACE FUNCTION assign(INT, VARCHAR(10))
     RETURNS INT AS $$
   DECLARE
     total ALIAS FOR $1;
     clientname ALIAS FOR $2;
     rowcount int;
   BEGIN
     UPDATE accounts SET client = clientname WHERE id IN (
       SELECT DISTINCT trunc(random() * (
         (SELECT max(id) FROM accounts WHERE client IS NULL) - 
         (SELECT min(id) FROM accounts WHERE client IS NULL)) + 
         (SELECT min(id) FROM accounts WHERE client IS NULL)) FROM generate_series(1, total));
     GET DIAGNOSTICS rowcount = ROW_COUNT;
     RETURN rowcount;
   END;
   $$ LANGUAGE plpgsql;

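This is loosely based on the fact that you can run SELECT trunc(random() * (100 - 1) + 1) FROM generate_series(1, 5); which selects 5 random numbers between 1 and 100.

My goal is to do something similar: pick a random id between the minimum and maximum unassigned rows and mark it for update.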

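Because the ids of the unassigned rows are not necessarily contiguous, select a random row number instead of a random id (see the first query below, which uses row_number()).

You don't need a function here, but if you want one, you can easily put the query in a function body.

The update query below, which uses generate_series(1, 5), updates 5 random rows that have a null client.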

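Note that select trunc(random() * (max_value - 1) + 1)::int from generate_series(1, n) is not a correct way to generate n distinct random values. The probability of duplicates increases as the ratio n/max_value grows.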
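
To see the duplicate problem directly, one can count how many distinct values survive among n independent draws. The query below is only an illustration: it uses the same trunc(random() * (max_value - 1) + 1) expression with max_value = 1000 and n = 100, both chosen arbitrarily for the demonstration.

```sql
-- Draw 100 random integers and compare the number of draws
-- with the number of distinct values among them.
select count(*)          as drawn,
       count(distinct v) as distinct_values
from (
    select trunc(random() * (1000 - 1) + 1)::int as v
    from generate_series(1, 100)
) as draws;
```

With these numbers, distinct_values averages around 95 and varies from run to run. In the original assign function, DISTINCT drops the repeated draws, which is consistent with it updating fewer rows than requested.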

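This is not the best answer because it does involve a full table scan, but in my case I don't care about performance, and it works. It is based on @CraigRinger's reference to a blog post.

I would still be interested in other, possibly better solutions, and in particular I would like to know why the original solution falls short and what else @klin has come up with.

Here is my brute-force random-order solution: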

-- generate a million unassigned rows with null client column
insert into accounts(client) select null from generate_series(1, 1000000);

-- assign 1000 random rows to client 'foo'
update accounts set client = 'foo' where id in 
  (select id from accounts where client is null order by random() limit 1000);
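
Since the question asks for a function that returns the number of rows affected, the random-order update above could be wrapped roughly as follows. This is a sketch, not the poster's code: the name assign_random is hypothetical (chosen so it does not clash with the original assign), and it assumes the accounts table from the question.

```sql
-- Hypothetical wrapper around the ORDER BY random() approach.
-- Returns the number of rows actually assigned.
CREATE OR REPLACE FUNCTION assign_random(total INT, clientname VARCHAR(25))
  RETURNS INT AS $$
DECLARE
  rowcount INT;
BEGIN
  UPDATE accounts
     SET client = clientname
   WHERE id IN (
     SELECT id
       FROM accounts
      WHERE client IS NULL   -- only unassigned rows are candidates
      ORDER BY random()      -- shuffle; requires scanning all candidates
      LIMIT total
   );
  GET DIAGNOSTICS rowcount = ROW_COUNT;
  RETURN rowcount;
END;
$$ LANGUAGE plpgsql;
```

It could be called as SELECT assign_random(100, 'foo');. Because the subquery selects existing unassigned rows rather than guessing ids, the result equals total whenever at least that many unassigned rows remain, and is smaller only when the table is running out. The sketch does not attempt to handle concurrent callers.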

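Related blog:

@CraigRinger thx! That looks really cool. I'm not sure whether it will work for me, but there is a lot of good information on that blog. Investigating.

Thanks @klin, this does generate a unique id. I think the reason I defined it as a function is that it will be used periodically to assign a set of random ids for some total x, for example assign 100 random ids, or assign 1000 random ids. My next step is to look into how to use window functions. I'll see whether I can work this into assigning multiple random ids for some total x.

Ah, I spoke too soon. It does run into the same problem as my solution above.

I think this solution is the best and most natural one in SQL. Don't worry about performance, because the possible alternatives should be equally or more expensive. You can accept your own answer with a clear conscience, and thank Craig.

Thanks klin, I appreciate you taking the time to investigate and provide some solutions. Thanks also to @CraigRinger for the link; there is a lot of good information on that blog. If you want to submit an answer, if you want the karma, I'll accept it. I can't accept my own answer until 19 hours have passed.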

-- select one random unassigned id by picking a random row number
with nulls as ( -- base query
    select id
    from accounts 
    where client is null
    ),
randoms as ( -- calculate random int in range 1..count(nulls.*) 
    select trunc(random()* (count(*) - 1) + 1)::int random_value
    from nulls
    ),
row_numbers as ( -- add row numbers to nulls
    select id, row_number() over (order by id) rn
    from nulls
    )
select id
from row_numbers, randoms
where rn = random_value; -- random row number
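
One detail worth noting about the query above: because random() returns a value in [0, 1), trunc(random() * (count(*) - 1) + 1) only produces row numbers from 1 to count - 1, so the last unassigned row is never chosen when there is more than one. A possible variant that covers the full range 1..count is sketched below; the column aliases rn and cnt are illustrative, and this is an alternative formulation rather than the answerer's query.

```sql
-- Pick one random unassigned id, with every row equally likely.
with nulls as (
    select id,
           row_number() over (order by id) as rn,   -- 1..cnt
           count(*) over ()                as cnt   -- total unassigned rows
    from accounts
    where client is null
)
select id
from nulls
where rn = (
    -- floor(random() * cnt) is in 0..cnt-1, so adding 1 gives 1..cnt
    select floor(random() * max(cnt))::int + 1
    from nulls
);
```

If no unassigned rows remain, max(cnt) is NULL and the query returns no rows.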

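-- assign 5 random unassigned rows; the random row numbers can repeat,
-- so this may update fewer than 5 rows
update accounts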
set client = 'new value' -- <-- clientname
where id in (
    with nulls as ( -- base query
        select id
        from accounts 
        where client is null
        ),
    randoms as ( -- calculate random int in range 1..count(nulls.*) 
        select i, trunc(random()* (count(*) - 1) + 1)::int random_value
        from nulls
        cross join generate_series(1, 5) i -- <--  total
        group by 1
        ),
    row_numbers as ( -- add row numbers to nulls in order by id
        select id, row_number() over (order by id) rn
        from nulls
        )
    select id
    from row_numbers, randoms
    where rn = random_value -- random row number
);

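-- not a reliable way to get n distinct random values: duplicates are possible
-- (max_value and n are placeholders)
select trunc(random()* (max_value - 1) + 1)::int
from generate_series(1, n);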
