Python: Numpy drawing from urn

python numpy scikit-learn

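I want to run a relatively simple random draw in numpy, but I can't find a good way to express it. I think the best description is drawing from an urn without replacement. I have an urn with k colors and a known number of balls of each color. I want to draw m balls and find out how many balls of each color I drew.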

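My current attempt is

np.bincount(np.random.permutation(np.repeat(np.arange(k), n_k))[:m], minlength=k)

Here, n_k is an array of length k containing the number of balls of each color.

This seems to be equivalent to

np.bincount(np.random.choice(k, m, n_k / n_k.sum()), minlength=k)

which is a bit better, but still not great.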
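For reference, the expand-and-draw idea from the question can be written as a small function. This is a minimal sketch, not the asker's code: the name `draw_counts` and the `rng` argument are illustrative assumptions, and it uses the `Generator` API. Note that `choice` samples with replacement by default, so `replace=False` has to be passed explicitly for an urn draw.

```python
import numpy as np

def draw_counts(n_k, m, rng=None):
    """Draw m balls without replacement; return the count per color."""
    rng = np.random.default_rng() if rng is None else rng
    n_k = np.asarray(n_k)
    # Expand the urn: color i appears n_k[i] times.
    urn = np.repeat(np.arange(len(n_k)), n_k)
    # replace=False makes this a draw without replacement.
    drawn = rng.choice(urn, size=m, replace=False)
    # Count how many balls of each color were drawn.
    return np.bincount(drawn, minlength=len(n_k))

# Example urn (assumed values): 12, 4 and 18 balls of three colors.
counts = draw_counts([12, 4, 18], 15, rng=np.random.default_rng(0))
print(counts)
```

The memory cost grows with the total number of balls, which is the main drawback of this approach for large urns.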

The following should work:

def make_sampling_arr(n_k):
    out = [ x for s in [ [i] * n_k[i] for i in range(len(n_k)) ] for x in s ]
    return out

# replace=False: draw without replacement, as the question asks
np.random.choice(make_sampling_arr(n_k), m, replace=False)

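What you want is an implementation of the multivariate hypergeometric distribution. I don't know of one in numpy or scipy, but it might already exist somewhere.

I contributed an implementation of the multivariate hypergeometric distribution to numpy 1.18.0; see numpy.random.Generator.multivariate_hypergeometric.

For example, to draw 15 samples from an urn containing 12 red, 4 green and 18 blue marbles, and repeat the process 10 times: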

In [4]: import numpy as np

In [5]: rng = np.random.default_rng()

In [6]: colors = [12, 4, 18]

In [7]: rng.multivariate_hypergeometric(colors, 15, size=10)                    
Out[7]: 
array([[ 5,  4,  6],
       [ 3,  3,  9],
       [ 6,  2,  7],
       [ 7,  2,  6],
       [ 3,  0, 12],
       [ 5,  2,  8],
       [ 6,  2,  7],
       [ 7,  1,  7],
       [ 8,  1,  6],
       [ 6,  1,  8]])
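A quick way to convince yourself that this is a draw without replacement is to check the obvious invariants on many draws. The following is a minimal sketch; the seed and the number of repetitions are arbitrary choices, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(12345)  # seed chosen arbitrarily
colors = np.array([12, 4, 18])
nsample = 15

draws = rng.multivariate_hypergeometric(colors, nsample, size=10_000)

# Every draw accounts for exactly nsample marbles ...
assert (draws.sum(axis=1) == nsample).all()
# ... and never uses more marbles of a color than the urn holds.
assert (draws <= colors).all()

# The mean count per color should be close to nsample * colors / total.
expected_mean = nsample * colors / colors.sum()
print(draws.mean(axis=0), expected_mean)
```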

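The rest of this answer is now obsolete, but I'll leave it for posterity (whatever that means...).

You can implement it using repeated calls to numpy.random.hypergeometric. Whether that is more efficient than your implementation depends on how many colors there are and how many balls of each color.

For example, here's a script that prints the result of drawing from an urn containing three colors (red, green and blue):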
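A minimal sketch of such a script, assuming the same urn as in the example above (12 red, 4 green, 18 blue) and 15 draws. The urn contents, the variable names and the use of the `Generator` API are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng()

# Assumed urn contents and sample size (illustrative values).
nred, ngreen, nblue = 12, 4, 18
m = 15

# Red vs. everything else: how many of the m balls are red?
red = int(rng.hypergeometric(nred, ngreen + nblue, m))
# Among the remaining m - red draws, green vs. blue.
green = int(rng.hypergeometric(ngreen, nblue, m - red))
# Whatever is left must be blue.
blue = m - (red + green)

print(f"red:   {red:2d}")
print(f"green: {green:2d}")
print(f"blue:  {blue:2d}")
```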

Sample output:

red:    6
green:  1
blue:   8

The following function generalizes this to choosing m balls, given an array colors holding the number of balls of each color:

def sample(m, colors):
    """
    Parameters
    ----------
    m : number of balls to draw from the urn
    colors : one-dimensional array of the number of balls of each color in the urn

    Returns
    -------
    One-dimensional array with the same length as `colors` containing the
    number of balls of each color in a random sample.
    """

    remaining = np.cumsum(colors[::-1])[::-1]
    result = np.zeros(len(colors), dtype=int)  # np.int was removed in NumPy 1.24
    for i in range(len(colors)-1):
        if m < 1:
            break
        result[i] = np.random.hypergeometric(colors[i], remaining[i+1], m)
        m -= result[i]
    result[-1] = m
    return result

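I want the counts, so you'd need to call bincount at the end. Also, make_sampling_arr is the same as the np.repeat approach used above, right?

Cool. Very interesting solution, thanks! If only there were a multinomial hypergeometric ;)

Right, sample is basically an implementation of the multivariate hypergeometric distribution. Note that pymc provides a function (rmultivariate_hypergeometric) for drawing random variates from the multivariate hypergeometric distribution.
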
>>> sample(10, [2, 4, 8, 16])
array([2, 3, 1, 4])
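
To sanity-check the sequential approach, one can repeat the draw many times and compare the average counts with the expected value m * colors / sum(colors). The sketch below restates the same idea using the `Generator` API; the function name `sample_sequential`, the seed and the number of repetitions are illustrative assumptions.

```python
import numpy as np

def sample_sequential(m, colors, rng):
    """Draw m balls without replacement via chained hypergeometric draws."""
    colors = np.asarray(colors)
    # remaining[i] = number of balls of color i or any later color
    remaining = np.cumsum(colors[::-1])[::-1]
    result = np.zeros(len(colors), dtype=int)
    for i in range(len(colors) - 1):
        if m < 1:
            break
        # Color i counts as "good", all later colors as "bad".
        result[i] = rng.hypergeometric(colors[i], remaining[i + 1], m)
        m -= result[i]
    # Any draws still unassigned belong to the last color.
    result[-1] = m
    return result

rng = np.random.default_rng(2024)  # seed chosen arbitrarily
colors = np.array([2, 4, 8, 16])
m = 10
draws = np.array([sample_sequential(m, colors, rng) for _ in range(2000)])

expected_mean = m * colors / colors.sum()
print(draws.mean(axis=0), expected_mean)
```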