PHP: Randomly generating combinations from variable weights

Tags: php, algorithm, probability

VERY IMPORTANT EDIT: all of the A_i are unique.

The problem: I have a list of n unique objects. Each object A_i has a variable percentage P_i.

I want to create an algorithm that generates a new list B of k objects (k < n), such that the probability of object A_i appearing in list B is P_i.

What I tried (these snippets are in PHP, just for testing)

I first made a list, and then tried two algorithms that pick the entries of B with weighted random draws.

Looking back at the algorithm, this makes sense: it incorrectly interprets the original percentages as the probability of picking an object for any given position, rather than for list B as a whole. So, for example, Z actually ends up in list B with roughly 93% probability, even though it is only picked for any one index B_n with 20% probability. That is not what I want: I want Z to be picked into list B with 20% probability.
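For illustration, a per-position weighted pick of the kind described above might look roughly like this (a sketch only, not the original snippets):

<?php
// Fill B one position at a time; each position is an independent weighted draw
// over the remaining weights. This makes P_i a per-position probability, NOT the
// probability of appearing anywhere in B, which is the flaw described above.
function naivePerSlotPick(array $weights, int $k): array
{
    $b = [];
    for ($slot = 0; $slot < $k; $slot++) {
        $r = mt_rand(0, mt_getrandmax() - 1) / mt_getrandmax() * array_sum($weights);
        foreach ($weights as $key => $w) {
            $r -= $w;
            if ($r < 0) {
                $b[] = $key;
                unset($weights[$key]);   // no repeats: drop the picked object
                break;
            }
        }
    }
    return $b;
}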

Is this possible? How can it be done?

Edit 1: I tried simply making the sum of all the P_i equal k. This works when all the P_i are equal, but once their values are modified it becomes more and more wrong.

Initial probabilities:

$list= [
    "A" => 8.4615,
    "B" => 68.4615,
    "C" => 13.4615,
    "D" => 63.4615,
    "E" => 18.4615,
    "F" => 58.4615,
    "G" => 23.4615,
    "H" => 53.4615,
    "I" => 28.4615,
    "J" => 48.4615,
    "K" => 33.4615,
    "L" => 43.4615,
    "M" => 38.4615,
    "N" => 38.4615,
    "O" => 38.4615,
    "P" => 38.4615,
    "Q" => 38.4615,
    "R" => 38.4615,
    "S" => 38.4615,
    "T" => 38.4615,
    "U" => 38.4615,
    "V" => 38.4615,
    "W" => 38.4615,
    "X" => 38.4615,
    "Y" =>38.4615,
    "Z" => 38.4615
];
Results after 10,000 runs:

Array
(
    [A] => 10.324
    [B] => 59.298
    [C] => 15.902
    [D] => 56.299
    [E] => 21.16
    [F] => 53.621
    [G] => 25.907
    [H] => 50.163
    [I] => 30.932
    [J] => 47.114
    [K] => 35.344
    [L] => 43.175
    [M] => 39.141
    [N] => 39.127
    [O] => 39.346
    [P] => 39.364
    [Q] => 39.501
    [R] => 39.05
    [S] => 39.555
    [T] => 39.239
    [U] => 39.283
    [V] => 39.408
    [W] => 39.317
    [X] => 39.339
    [Y] => 39.569
    [Z] => 39.522
)

Let's analyze it. With replacement (not what you want, but easier to analyze):

Given a list L of size k and an element a_i, the probability that a_i appears in the list is given by the value p_i.

Let's look at the probability that a_i sits at some particular index j of the list, and denote that probability q_i,j. Note that q_i,j = q_i,t for any index t of the list, so we can simply write q_i,1 = q_i,2 = ... = q_i,k = q_i.

The probability that a_i appears anywhere in the list is then:

1 - (1 - q_i)^k

But this must also equal p_i, so we need to solve the equation:

1 - (1 - q_i)^k = p_i
1 - (1 - q_i)^k - p_i = 0

One way is to solve this equation for each q_i (numerically, for example).

After computing the probability q_i for each element, check that the result really is a probability space (the values sum to 1 and every q_i is in [0, 1]). If it is not, this cannot be done for the given probabilities and k.
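For this particular equation a closed form also exists, since (1 - q_i)^k = 1 - p_i gives q_i = 1 - (1 - p_i)^(1/k). A minimal PHP sketch of that computation together with the probability-space check described above (the function name is illustrative, not from the answer):

<?php
// Per-slot probabilities q_i for the with-replacement analysis above:
// solve 1 - (1 - q_i)^k = p_i, i.e. q_i = 1 - (1 - p_i)^(1/k).
// $p holds the target inclusion probabilities as fractions in [0, 1].
function perSlotProbabilities(array $p, int $k): array
{
    $q = [];
    foreach ($p as $key => $pi) {
        $q[$key] = 1 - pow(1 - $pi, 1 / $k);   // always within [0, 1] for $pi in [0, 1]
    }
    // The q_i must form a probability distribution over a single slot, i.e. sum to 1;
    // otherwise the requested p_i are not achievable with this scheme for this k.
    if (abs(array_sum($q) - 1) > 1e-9) {
        throw new RuntimeException('Not solvable for these p_i and this k');
    }
    return $q;
}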


Without replacement: this is trickier, because now q_i,j != q_i,t (the selections are not i.i.d.). The probability calculations here are much more complicated, and I'm not sure at the moment how to do them; they would have to be done at run time, while the list is being created.


(I removed a solution that I'm almost certain was biased.)

Unless my math skills are much worse than I think, the average chance of an element of list A being found in list B in your example should be 10/26 ≈ 0.38. If you lower that chance for any object, some other objects necessarily get a higher chance. Also, your probabilities in list A cannot work out as given: they are too low; you could not fill your list / you would not have enough elements to pick from.

Assuming the above is correct (or correct enough), this means that the average weight in your list has to be the average chance of a random pick. That in turn means that the probabilities in list A do not sum to 100 (they have to sum to 100·k).
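A quick arithmetic check, using the sizes implied by the example above (n = 26 objects and, per the 10/26 figure, k = 10):

<?php
// The expected size of B equals the sum of the inclusion probabilities, so for
// |B| = k the percentage weights must sum to 100 * k and average 100 * k / n.
$n = 26;
$k = 10;
echo 100 * $k / $n, "\n"; // 38.4615..., the weight given to M..Z in the example list
echo 100 * $k, "\n";      // 1000, what the 26 percentages must add up to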


Unless I'm completely mistaken, that is…

We must have sum_i p_i = k, or else we cannot succeed.

As stated, the problem is somewhat easy, but you may not like the answer, on the grounds that it's "not random enough" (a simple procedure that achieves the required per-object probabilities is given as pseudocode near the end of this post).

If we instead include each A_i independently with probability P_i and keep only outcomes with exactly k elements, the probabilities get skewed: in the two-element example enumerated near the end of this post (P_1 = 1/3, P_2 = 2/3, k = 1), the conditional probabilities of A_1 and A_2 come out as 1/5 and 4/5 rather than 1/3 and 2/3.

Instead, we should substitute new probabilities Q_i that produce the proper conditional distribution. I don't know a closed form for the Q_i, so I suggest finding them with a numerical optimization algorithm. Initialize Q_i = P_i (why not?). Using dynamic programming, it is possible to find, for the current setting of the Q_i, the probability that the outcome has l elements and that A_i is one of them. (We only care about the l = k entry, but we need the other entries to make the recurrence work.) With a little more work we can get the whole gradient; sorry, this is sketchy. (The size_distribution and size_distribution_without functions in the program at the end of this post implement this recurrence.)

In Python 3, using a nonlinear solution method that seems always to converge (simultaneously updating each q_i toward a slightly more correct value and renormalizing); the full program is reproduced at the end of this post.

"The probability of object A_i appearing in B is P_i": this is tricky, and I believe it is not what you want. Specifically, if k = n/2, at least half of the elements would have to have B_i >= 1/2.

@amit, I'm pretty sure this is exactly what I want.
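The simple procedure mentioned above (the one that may be "not random enough"), in pseudocode: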
Sample a uniform random permutation Perm on the integers [0, n)
Sample X uniformly at random from [0, 1)
For i in Perm
    If X < P_i, then append A_i to B and update X := X + (1 - P_i)
    Else, update X := X - P_i
End
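A minimal PHP sketch of this procedure (the function name is illustrative; the weights are assumed to be fractions in [0, 1] whose sum is the integer k):

<?php
// Transcription of the pseudocode above: pick |B| = k objects so that each key
// of $p appears in B with probability $p[$key].
function sampleCombination(array $p): array
{
    $keys = array_keys($p);
    shuffle($keys);                                          // uniform random permutation Perm
    $x = mt_rand(0, mt_getrandmax() - 1) / mt_getrandmax();  // X ~ Uniform[0, 1)
    $b = [];
    foreach ($keys as $i) {
        if ($x < $p[$i]) {           // append A_i to B
            $b[] = $i;
            $x += 1 - $p[$i];
        } else {                     // skip A_i
            $x -= $p[$i];
        }
    }
    return $b;
}

// Example: four objects whose probabilities sum to k = 2 (the same values as the
// Python test at the end of this post).
print_r(sampleCombination(["A" => 2/3, "B" => 1/2, "C" => 1/2, "D" => 1/3]));

For the two-element example mentioned above, including A_1 and A_2 independently with probabilities 1/3 and 2/3 gives the following outcomes and probabilities: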
{}: probability 2/9
{A_1}: probability 1/9
{A_2}: probability 4/9
{A_1, A_2}: probability 2/9
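Conditioning on the outcome having exactly k = 1 element leaves A_1 with probability (1/9) / (1/9 + 4/9) = 1/5 and A_2 with 4/5, which is why the P_i need to be replaced by adjusted probabilities Q_i. The Python 3 program referred to above, which finds such Q_i numerically and checks them with a Monte Carlo run: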
#!/usr/bin/env python3
import collections
import operator
import random


# Rejection sampling: include each index independently with probability qs[i],
# retrying until the sample has exactly k = round(sum(qs)) elements.
def constrained_sample(qs):
    k = round(sum(qs))
    while True:
        sample = [i for i, q in enumerate(qs) if random.random() < q]
        if len(sample) == k:
            return sample


# Dynamic program: distribution of the sample size when each index is included
# independently with probability qs[i] (size_dist[j] = P(sample has j elements)).
def size_distribution(qs):
    size_dist = [1]
    for q in qs:
        size_dist.append(0)
        for j in range(len(size_dist) - 1, 0, -1):
            size_dist[j] += size_dist[j - 1] * q
            size_dist[j - 1] *= 1 - q
    assert abs(sum(size_dist) - 1) <= 1e-10
    return size_dist


# Inverse of one step of the dynamic program above: recover the size distribution
# as it would be without one element of inclusion probability q. The branch on
# q >= 0.5 just picks the numerically safer direction for the division.
def size_distribution_without(size_dist, q):
    size_dist = size_dist[:]
    if q >= 0.5:
        for j in range(len(size_dist) - 1, 0, -1):
            size_dist[j] /= q
            size_dist[j - 1] -= size_dist[j] * (1 - q)
        del size_dist[0]
    else:
        for j in range(1, len(size_dist)):
            size_dist[j - 1] /= 1 - q
            size_dist[j] -= size_dist[j - 1] * q
        del size_dist[-1]
    assert abs(sum(size_dist) - 1) <= 1e-10
    return size_dist


# Sanity check: dropping an element via size_distribution_without must match
# recomputing the size distribution from scratch without that element.
def test_size_distribution(qs):
    d = size_distribution(qs)
    for i, q in enumerate(qs):
        d1a = size_distribution_without(d, q)
        d1b = size_distribution(qs[:i] + qs[i + 1 :])
        assert len(d1a) == len(d1b)
        assert max(map(abs, map(operator.sub, d1a, d1b))) <= 1e-10


# Rescale qs so that the probabilities sum to exactly k.
def normalized(qs, k):
    sum_qs = sum(qs)
    qs = [q * k / sum_qs for q in qs]
    assert abs(sum(qs) / k - 1) <= 1e-10
    return qs


# Fixed-point iteration for the adjusted probabilities qs: repeatedly set qs[i]
# so that, conditioned on the sample having exactly k elements, element i is
# included with probability ps[i], then renormalize so that sum(qs) == k.
def approximate_qs(ps, reps=100):
    k = round(sum(ps))
    qs = ps[:]
    for j in range(reps):
        size_dist = size_distribution(qs)
        for i, p in enumerate(ps):
            d = size_distribution_without(size_dist, qs[i])
            d.append(0)
            qs[i] = p * d[k] / ((1 - p) * d[k - 1] + p * d[k])
        qs = normalized(qs, k)
    return qs


# Monte Carlo check: sample many size-k subsets using the adjusted qs and compare
# each element's observed inclusion frequency with its target probability ps[i].
def test(ps, reps=100000):
    print(ps)
    qs = approximate_qs(ps)
    print(qs)
    counter = collections.Counter()
    for j in range(reps):
        counter.update(constrained_sample(qs))
    test_size_distribution(qs)
    print("p", "Actual", sep="\t")
    for i, p in enumerate(ps):
        print(p, counter[i] / reps, sep="\t")


if __name__ == "__main__":
    test([2 / 3, 1 / 2, 1 / 2, 1 / 3])