Weighted random sample without replacement in Python

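I need to draw a sample of size k, without replacement, from a population in which every member has an associated weight (W).

Numpy's random.choices won't do this without replacement, and random.sample won't accept weighted input.

Currently, I'm using: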

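import random
import numpy as np

# population, W (the weights) and Parent_number are defined elsewhere.
# Note: P starts out as zeros, so a member whose value is 0 would look "already drawn".
P = np.zeros((1,Parent_number))
n=0
while n < Parent_number:
    draw = random.choices(population,weights=W,k=1)  # one weighted draw, with replacement
    if draw not in P:                                 # keep it only if not drawn before
        P[0,n] = draw[0]
        n=n+1
P=np.asarray(sorted(P[0]))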

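While this works, it has to convert back and forth between arrays, lists and arrays again, so it isn't ideal.
I'm looking for the simplest, most understandable solution, because this code will be shared with others.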
You can use np.random.choice with replace=False, like this:
np.random.choice(vec,size,replace=False, p=P)

where vec is your population and P is the weight vector.
For example:
import numpy as np
vec=[1,2,3]
P=[0.5,0.2,0.3]
np.random.choice(vec,size=2,replace=False, p=P)
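The p argument has to be a probability vector that sums to 1, so raw weights need to be normalized first. A minimal sketch, assuming numpy 1.17 or later for the Generator API (default_rng), whose choice() method takes the same replace and p arguments; the weights below are made up for illustration:

```python
import numpy as np

vec = np.array([1, 2, 3])
weights = np.array([5.0, 2.0, 3.0])      # raw scores, not yet probabilities
p = weights / weights.sum()              # normalize so that p sums to 1

rng = np.random.default_rng(seed=0)      # Generator API (numpy >= 1.17)
sample = rng.choice(vec, size=2, replace=False, p=p)
print(sample)                            # two distinct elements of vec
```

Passing the unnormalized weights directly would make numpy raise a ValueError because the probabilities do not sum to 1.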

Built-in solution
As Miriam Farber suggested, you can use numpy's built-in solution:
np.random.choice(vec,size,replace=False, p=P)

Pure Python equivalent
What follows closely mirrors what numpy does internally; numpy, of course, works on numpy arrays and uses numpy.random.choice().
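A minimal sketch of that approach using only the standard library (the function name is hypothetical). It makes one weighted draw at a time with random.choices() and then zeroes the weight of the item it picked; numpy's legacy RandomState.choice follows the same idea but draws several positions per round, keeps the new unique ones, and zeroes their probabilities before the next round.

```python
from random import choices

def weighted_sample_without_replacement(population, weights, k):
    """Hypothetical helper: k distinct items, each draw weighted by what remains."""
    weights = list(weights)                 # private copy; caller's list is untouched
    positions = range(len(population))
    if k > sum(1 for w in weights if w > 0):
        raise ValueError("k exceeds the number of items with positive weight")
    chosen = []
    while len(chosen) < k:
        # one weighted draw *with* replacement over all positions...
        i = choices(positions, weights=weights, k=1)[0]
        # ...then zero its weight so it can never be drawn again
        weights[i] = 0
        chosen.append(i)
    return [population[i] for i in chosen]

print(weighted_sample_without_replacement(['a', 'b', 'c', 'd'], [1, 2, 3, 0], k=3))
```

Because 'd' has weight 0, the call above always returns 'a', 'b' and 'c' in some weighted order.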
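Related problem: selection when elements can repeat
This is sometimes called the urn problem. For example, given an urn containing 10 red balls, 4 blue balls and 18 green balls, choose nine balls without replacement.
To do this with numpy, use sample() to draw k unique ball numbers from range(total), where total is the overall ball count. Then bisect the cumulative counts (np.searchsorted) to turn each ball number into a population index.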
import numpy as np
from random import sample

population = np.array(['red', 'blue', 'green'])
counts = np.array([10, 4, 18])
k = 9

cum_counts = np.add.accumulate(counts)   # [10, 14, 32]
total = cum_counts[-1]                   # 32 balls in the urn
selections = sample(range(total), k=k)   # k distinct ball numbers
indices = np.searchsorted(cum_counts, selections, side='right')  # ball number -> color index
result = population[indices]
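To see why side='right' is needed, it helps to check the boundaries by hand. A small worked example with the same counts, using a few fixed ball numbers in place of the random selections:

```python
import numpy as np

counts = np.array([10, 4, 18])            # red, blue, green
cum_counts = np.add.accumulate(counts)    # array([10, 14, 32])

# Ball numbers 0-9 are red, 10-13 blue, 14-31 green.
# side='right' sends a number equal to a boundary (10 or 14) to the next color.
balls = np.array([0, 9, 10, 13, 14, 31])
indices = np.searchsorted(cum_counts, balls, side='right')
print(indices)                            # [0 0 1 1 2 2]
```

With side='left', ball 10 would be mapped to red, giving red 11 balls and blue only 3.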

To do this without numpy, the same approach can be implemented with bisect() and accumulate() from the standard library:
from random import sample
from bisect import bisect
from itertools import accumulate

population = ['red', 'blue', 'green']
weights = [10, 4, 18]
k = 9

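cum_weights = list(accumulate(weights))   # [10, 14, 32]
total = cum_weights.pop()                 # 32; the remaining [10, 14] are the color boundaries
selections = sample(range(total), k=k)    # k distinct ball numbers in 0..31
indices = [bisect(cum_weights, s) for s in selections]  # bisect() is bisect_right
result = [population[i] for i in indices]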
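If you are on Python 3.9 or later, random.sample() can handle this urn case directly through its keyword-only counts parameter. A minimal sketch with the same urn:

```python
from random import sample

population = ['red', 'blue', 'green']
counts = [10, 4, 18]

# Python 3.9+: counts= behaves as if each item were repeated that many
# times, then draws k of those 32 balls without replacement.
result = sample(population, k=9, counts=counts)
print(result)
```

This is equivalent to sampling 9 items from a 32-element list holding 10 'red', 4 'blue' and 18 'green', without building that list yourself.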

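numpy is probably the best option. But here is another pure Python solution for weighted sampling without replacement.

There are a couple of ways to define the purpose of the population and weights parameters. population can be defined as the population of items, and weights as a list of biases that influence selection. For example, in a horse-race simulation, population could be the horses, each with a name, and weights their performance ratings. The function below follows this model.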

from random import random
from bisect import bisect_left
from itertools import accumulate

def wsample(population, weights, k=1):
    wts   = list(weights)
    sampl = []
    rnums = [random() for _ in range(k)]
    for r in rnums:
        acm_wts = list(accumulate(wts))
        total   = acm_wts[-1]
        i       = bisect_left(acm_wts, total * r)
        p       = population[i]
        wts[i]  = 0
        sampl.append(p)
    return sampl
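One caveat: once every remaining weight is zero, total becomes 0, bisect_left(acm_wts, 0) returns 0, and wsample() quietly returns population[0] again, so k has to stay within the number of positive weights. A minimal sketch of a guarded variant (the function name and the horse data are hypothetical), which also takes its target from 1.0 - random() so that a random() of exactly 0.0 cannot land on a zero-weight first item:

```python
from random import random
from bisect import bisect_left
from itertools import accumulate

def wsample_checked(population, weights, k=1):
    """Hypothetical variant of wsample() with a guard on k."""
    wts = list(weights)
    if k > sum(1 for w in wts if w > 0):
        raise ValueError("k exceeds the number of items with positive weight")
    sampl = []
    for _ in range(k):
        acm_wts = list(accumulate(wts))
        total = acm_wts[-1]
        # 1.0 - random() lies in (0, 1], so the target is > 0 and
        # bisect_left never stops on an item whose weight was zeroed
        i = bisect_left(acm_wts, total * (1.0 - random()))
        wts[i] = 0                      # remove the pick from later draws
        sampl.append(population[i])
    return sampl

horses = ['Arrow', 'Blaze', 'Comet', 'Dancer']   # hypothetical names
ratings = [5, 3, 1.5, 0.5]                      # hypothetical performance ratings
print(wsample_checked(horses, ratings, k=3))
```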

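Setting a selected individual's weight to 0 and recalculating the cumulative weights effectively removes it from further selection. If you use this, make sure k …

"Numpy's random.choices won't do this without replacement" – it's numpy.random.choice, not choices, and it will if you tell it to. Are you using the built-in random module instead of numpy.random? random.choices isn't a numpy thing.

Ah yes, you're right, I was using Python's random module. But without replacement, shouldn't the population for the next draw get smaller? How does that happen?

I really like the thoughtful analysis of the problem here. It's worth thinking about what replacement actually means.

Thanks @RaymondHettinger, I like your pure Python solution too. I had to try it in the shell to understand what it was doing, and it gave me some ideas to try.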


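A variant that accumulates the weights only once and simply redraws whenever it lands on an index already in the sample (the dict keys record which indices were picked). Note that it never terminates if fewer than k items have a positive weight:

from random import random
from bisect import bisect_left
from itertools import accumulate

def wsample(population, weights, k=1):
    accum = list(accumulate(weights))
    total = accum[-1]
    sampl = {}
    while len(sampl) < k:
        index        = bisect_left(accum, total * random())
        sampl[index] = population[index]   # a repeated index just overwrites itself
    return list(sampl.values())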

timeit.timeit("wsample(population, weights, k=5)", globals=globals(), number=10**4)
21.74719240899867

timeit.timeit("wsample(population, weights, k=5)", globals=globals(), number=10**4)
4.32836378099455

timeit.timeit("wsample(population, acm_weights, k=5)", globals=globals(), number=10**4)
0.05602245099726133
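The third timing passes acm_weights, i.e. weights accumulated in advance, but the function behind it is not shown. Assuming it is the dict-based variant changed to accept cumulative weights directly, so the O(n) accumulate step is paid once rather than on every call, a sketch could look like this (wsample_cum, the population and the weights are all hypothetical):

```python
import timeit
from random import random
from bisect import bisect_left
from itertools import accumulate

def wsample_cum(population, cum_weights, k=1):
    """Rejection-style sampling from precomputed cumulative weights.

    Assumes cum_weights is non-decreasing (e.g. from itertools.accumulate)
    and that at least k items have positive weight; otherwise it never ends.
    """
    total = cum_weights[-1]
    sampl = {}
    while len(sampl) < k:
        # 1.0 - random() is in (0, 1], so zero-weight items are never hit
        index = bisect_left(cum_weights, total * (1.0 - random()))
        sampl[index] = population[index]   # repeats just overwrite; keep drawing
    return list(sampl.values())

population = [f"item{i}" for i in range(100)]   # hypothetical data
weights = [i % 7 + 1 for i in range(100)]
acm_weights = list(accumulate(weights))         # accumulate once, reuse many times

print(timeit.timeit("wsample_cum(population, acm_weights, k=5)",
                    globals=globals(), number=10**4))
```

Any speedup you measure this way depends on your population size and k; the printed number is not comparable to the timings above.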