A weighted version of random.choice in Python

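I need to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: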

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(list(space.keys()) + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

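This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions for improving it, or alternative ways of doing this. Efficiency matters less to me than code cleanliness and readability.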

Crude, but may be sufficient:

def weighted_choice(choices):
   total = sum(w for c, w in choices)
   r = random.uniform(0, total)
   upto = 0
   for c, w in choices:
      if upto + w >= r:
         return c
      upto += w
   assert False, "Shouldn't get here"

import random
weighted_choice = lambda s: random.choice(sum(([v] * wt for v, wt in s), []))

Does it work?

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

# initialize tally dict, keyed by the choice names (not the (name, weight) tuples)
tally = dict.fromkeys((c for c, w in choices), 0)

# tally up 1000 weighted choices
for i in range(1000):
    tally[weighted_choice(choices)] += 1

print(list(tally.items()))

Prints:

[('WHITE', 904), ('RED', 74), ('GREEN', 22)]
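This assumes that all weights are integers. They don't have to add up to 100; I did that only to make the test results easier to interpret. (If the weights are floating-point numbers, multiply them all by 10 repeatedly until all weights are >= 1.)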

weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
    weights = [w * 10 for w in weights]
weights = list(map(int, weights))
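
Because `int()` truncates, the repeated multiply-by-10 approach can lose a unit when a float product lands just below a whole number. As a hedged alternative, the sketch below converts decimal weights to exact integers with `fractions.Fraction`. The helper name `integer_weights` is an assumption made for this example; it is not part of the original answer.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_weights(weights):
    """Scale decimal weights to integers with the same relative ratios."""
    # Fraction(str(w)) reads the decimal text exactly, e.g. "0.001" -> 1/1000.
    fracs = [Fraction(str(w)) for w in weights]
    # Least common multiple of all denominators.
    common = reduce(lambda a, b: a * b // gcd(a, b),
                    (f.denominator for f in fracs), 1)
    # Each product is a whole number, so int() loses nothing here.
    return [int(f * common) for f in fracs]


print(integer_weights([.6, .2, .001, .199]))  # [600, 200, 1, 199]
```

The resulting integers can be fed to the list-expansion `weighted_choice` above, at the cost of a list whose length equals the sum of the scaled weights.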
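  • Arrange the weights into a cumulative distribution.
  • Use random.random() to pick a random float, starting from 0.0.

I looked at the other thread that was pointed to and came up with this variation in my own coding style. It returns the index of the choice, for tallying purposes, but returning the string instead is simple (see the commented-out return alternative).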
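
The two steps listed above (a cumulative distribution plus a random float) can be sketched roughly as follows. This is a minimal illustration, not the answer's original code: the function name `weighted_choice_index` and the optional `rng` parameter are assumptions made for the example, and searching the cumulative list with `bisect` is one reasonable choice among several.

```python
import bisect
import itertools
import random


def weighted_choice_index(weights, rng=random):
    """Return an index chosen with probability proportional to its weight."""
    # Running totals, e.g. [90, 8, 2] -> [90, 98, 100].
    cumulative = list(itertools.accumulate(weights))
    # Random point in [0, total).
    x = rng.random() * cumulative[-1]
    # First position whose running total exceeds x; the hi bound keeps
    # the result in range even if rounding pushes x up to the total.
    return bisect.bisect_right(cumulative, x, 0, len(cumulative) - 1)


choices = ["WHITE", "RED", "GREEN"]
index = weighted_choice_index([90, 8, 2])
print(index, choices[index])
```

Returning `choices[index]` instead of `index` gives the item itself rather than its position.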


    If you have a weighted dictionary instead of a list, you can write this:

    items = { "a": 10, "b": 5, "c": 1 } 
    random.choice([k for k in items for dummy in range(items[k])])
    

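    Note that `[k for k in items for dummy in range(items[k])]` produces this list: ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'c'].

    If you don't mind using numpy, you can use numpy.random.choice.

    For example: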

    import numpy

    items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
    elems = [i[0] for i in items]
    probs = [i[1] for i in items]

    trials = 1000
    results = [0] * len(items)
    for i in range(trials):
        res = numpy.random.choice(elems, p=probs)  # This is where the item is selected!
        results[elems.index(res)] += 1
    results = [r / float(trials) for r in results]
    print("item\texpected\tactual")
    for i in range(len(probs)):
        print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))
    
    If you know in advance how many selections you need to make, you can do it without a loop, like this:

    numpy.random.choice(elems, trials, p=probs)
    
    A general solution:

    import random
    def weighted_choice(choices, weights):
        total = sum(weights)
        threshold = random.uniform(0, total)
        for k, weight in enumerate(weights):
            total -= weight
            if total < threshold:
                return choices[k]
    
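    Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it returns an array of 0s containing a single 1 that marks which bin was chosen. The code defaults to a single draw, but you can pass in the number of draws to make, and the count of draws per bin is returned.

    If the weights vector does not sum to 1, it is normalized so that it does.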

    import numpy as np
    
    def weighted_choice(weights, n=1):
        if np.sum(weights)!=1:
            weights = weights/np.sum(weights)
    
        draws = np.random.random_sample(size=n)
    
        weights = np.cumsum(weights)
        weights = np.insert(weights,0,0.0)
    
        counts = np.histogram(draws, bins=weights)
        return(counts[0])
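
For readers without NumPy, a rough standard-library analogue of the per-bin counts can be built on `random.choices` (Python 3.6+). This is a sketch under assumptions, not a port of the function above: the name `weighted_counts` and the `rng` parameter are invented for the example, and no explicit normalization is needed because `random.choices` uses relative weights.

```python
import random
from collections import Counter


def weighted_counts(weights, n=1, rng=random):
    """Return how many of n weighted draws landed in each bin."""
    # Draw n bin indices, each with probability proportional to its weight.
    draws = rng.choices(range(len(weights)), weights=weights, k=n)
    tally = Counter(draws)
    # Report one count per bin, including bins that were never drawn.
    return [tally[i] for i in range(len(weights))]


print(weighted_counts([0.1, 0.3, 0.6], n=10))
```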
    

    Since version 1.7.0, NumPy has a choice function that supports probability distributions.

    from numpy.random import choice
    draw = choice(list_of_candidates, number_of_items_to_pick,
                  p=probability_distribution)
    

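    Note that `probability_distribution` is a sequence in the same order as `list_of_candidates`. You can also use the keyword `replace=False` to change the behavior so that drawn items are not replaced.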

    I'm probably too late to contribute anything useful, but here is a simple, short, and very efficient snippet:

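    import random

    def choose_index(probabilities):
        # Walk the cumulative mass function until it passes the random draw.
        cmf = 0.0
        choice = random.random()
        for k, p in enumerate(probabilities):
            cmf += p
            if choice <= cmf:
                return k
        return len(probabilities) - 1  # guard against floating-point round-off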
    
    
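    If your list of weighted choices is relatively static and you need to sample frequently, you can do one O(N) preprocessing step and then perform each selection in O(1), using functions such as those in the linked answer.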
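
One well-known technique with O(N) setup and O(1) sampling is the alias method. Whether the linked functions use it is an assumption on my part; the sketch below only illustrates the idea. The names `prep` and `sample` are illustrative and this is not the linked implementation.

```python
import random


def prep(weights):
    """Build alias tables in O(N) (Vose's variant of the alias method)."""
    weights = list(weights)
    n = len(weights)
    total = float(sum(weights))
    # Scale so that the average weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates the mass needed to fill slot s.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Whatever is left has weight 1.0 up to rounding error.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample(tables, rng=random):
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    prob, alias = tables
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


tables = prep([90, 8, 2])   # run only when the weights change
print(sample(tables))       # O(1) per draw
```

The setup cost only pays off when many draws are made from the same weights; for a handful of draws, a cumulative-sum search is simpler.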


    Since Python 3.6, the random module has a choices method.

    Note that `random.choices` samples with replacement, per the docs:

    Return a `k` sized list of elements chosen from the population with replacement.

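    A note for completeness of the answer:

    When a sampling unit is drawn from a finite population and is returned to that population, after its characteristics have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means that each element may be chosen more than once.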


    If you need to sample without replacement, you can use numpy.random.choice, whose `replace` argument controls such behaviour.
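
If you would rather stay in the standard library, one known technique for weighted sampling without replacement is the Efraimidis-Spirakis key method: give each item the key `u ** (1 / w)` with `u` uniform in [0, 1), then keep the `k` items with the largest keys. The sketch below is a minimal illustration of that technique, not something taken from the answers here; the function name and the `rng` parameter are assumptions.

```python
import heapq
import random


def weighted_sample_without_replacement(population, weights, k, rng=random):
    """Pick up to k distinct items; heavier items are more likely to appear."""
    # One random key per item; items with non-positive weight are skipped.
    keyed = [(rng.random() ** (1.0 / w), item)
             for item, w in zip(population, weights) if w > 0]
    # The k largest keys form the sample (fewer if not enough items qualify).
    best = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [item for _, item in best]


print(weighted_sample_without_replacement(["a", "b", "c", "d"], [1, 2, 5, 2], 2))
```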

    Here is the version included in the Python 3.6 standard library:

    import random
    import itertools as _itertools
    import bisect as _bisect
    
    class Random36(random.Random):
        "Show the code included in the Python 3.6 version of the Random class"
    
        def choices(self, population, weights=None, *, cum_weights=None, k=1):
            """Return a k sized list of population elements chosen with replacement.
    
            If the relative weights or cumulative weights are not specified,
            the selections are made with equal probability.
    
            """
            random = self.random
            if cum_weights is None:
                if weights is None:
                    _int = int
                    total = len(population)
                    return [population[_int(random() * total)] for i in range(k)]
                cum_weights = list(_itertools.accumulate(weights))
            elif weights is not None:
                raise TypeError('Cannot specify both weights and cumulative weights')
            if len(cum_weights) != len(population):
                raise ValueError('The number of weights does not match the population')
            bisect = _bisect.bisect
            total = cum_weights[-1]
            return [population[bisect(cum_weights, random() * total)] for i in range(k)]
    


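    Starting from Python v3.6, random.choices can be used to return a list of elements of a specified size from the given population, with optional weights.

    random.choices(population, weights=None, *, cum_weights=None, k=1)

    • population: a list containing unique observations. (If empty, raises IndexError.)

    • weights: the relative weights used to make selections.

    • cum_weights: the cumulative weights used to make selections.

    • k: the size (len) of the list to be output. (Default: len() = 1.)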
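
As a small illustration of the two weight parameters described above, the sketch below passes the same distribution once as relative weights and once as cumulative weights. The colour names and the seed are arbitrary values chosen for the example.

```python
import itertools
import random

population = ["WHITE", "RED", "GREEN"]
weights = [90, 8, 2]
# Running totals of the relative weights: [90, 98, 100].
cum_weights = list(itertools.accumulate(weights))

# Two generators with the same seed, so both calls see the same random numbers.
by_weights = random.Random(0).choices(population, weights=weights, k=5)
by_cum_weights = random.Random(0).choices(population, cum_weights=cum_weights, k=5)

print(by_weights)
print(by_weights == by_cum_weights)  # True
```

Passing `cum_weights` directly saves the accumulation step on every call, which can matter when the same weights are reused many times.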


    A few caveats:

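    1) It uses weighted sampling with replacement, so drawn items are put back before the next draw. The values in the weights sequence do not matter in themselves; only their relative ratios do.

    Unlike np.random.choice, which accepts only probabilities as weights and requires them to sum to 1, there is no such restriction here. The weights work as long as they belong to a numeric type (int/float/fraction).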
    
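    import random

    def choose_index(weights):
        # Same idea, but normalizes arbitrary non-negative weights first.
        total = float(sum(weights))
        cmf = 0.0
        choice = random.random()
        for k, w in enumerate(weights):
            cmf += w / total
            if choice <= cmf:
                return k
        return len(weights) - 1  # guard against floating-point round-off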
    
    # run only when `choices` changes.
    preprocessed_data = prep(weight for _,weight in choices)
    
    # O(1) selection
    value = choices[sample(preprocessed_data)][0]
    
    In [1]: import random
    
    In [2]: random.choices(
    ...:     population=[['a','b'], ['b','a'], ['c','b']],
    ...:     weights=[0.2, 0.2, 0.6],
    ...:     k=10
    ...: )
    
    Out[2]:
    [['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b']]
    
    >>> import random
    # weights being integers
    >>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
    ['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
    # weights being floats
    >>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
    ['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
    # weights being fractions
    >>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
    ['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']
    
    >>> random.choices(["white", "green", "red"], k=10)
    ['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']
    
    import numpy as np
    
    n,k = 10**6,10**3
    
    # Create dummy distribution
    a = np.array([i+1 for i in range(n)])
    p = np.array([1.0/n]*n)
    
    cfd = p.cumsum()
    for _ in range(k):
        x = np.random.uniform()
        idx = cfd.searchsorted(x, side='right')
        sampled_element = a[idx]
    
    def rand_weighted(weights):
        """
        Generator which uses the weights to generate a
        weighted random values
        """
        sum_weights = sum(weights.values())
        cum_weights = {}
        current_weight = 0
        for key, value in sorted(weights.items()):
            current_weight += value
            cum_weights[key] = current_weight
        while True:
            sel = int(random.uniform(0, 1) * sum_weights)
            for key, value in sorted(cum_weights.items()):
                if sel < value:
                    break
            yield key
    
    def choice(items, weights):
        return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]
    
    def weighted_choice(weighted_dict):
        """Input example: dict(apples=60, oranges=30, pineapples=10)"""
        weight_list = []
        for key in weighted_dict.keys():
            weight_list += [key] * weighted_dict[key]
        return random.choice(weight_list)
    
    import random, string
    from numpy import cumsum
    
    class randomChoiceWithProportions:
        '''
        Accepts a dictionary of choices as keys and weights as values. Example if you want a unfair dice:
    
    
        choiceWeightDic = {"1":0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666
        , "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
        dice = randomChoiceWithProportions(choiceWeightDic)
    
        samples = []
        for i in range(100000):
            samples.append(dice.sample())
    
        # Should be close to .26666
        samples.count("6")/len(samples)
    
        # Should be close to .16666
        samples.count("1")/len(samples)
        '''
        def __init__(self, choiceWeightDic):
            self.choiceWeightDic = choiceWeightDic
            weightSum = sum(self.choiceWeightDic.values())
            # Compare with a tolerance: float weights rarely sum to exactly 1.
            assert abs(weightSum - 1) < 1e-9, 'Weights sum to ' + str(weightSum) + ', not 1.'
            self.valWeightDict = self._compute_valWeights()
    
        def _compute_valWeights(self):
            valWeights = list(cumsum(list(self.choiceWeightDic.values())))
            valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
            return valWeightDict
    
        def sample(self):
            num = random.uniform(0,1)
            for key, val in self.valWeightDict.items():
                if val >= num:
                    return key
    
    import random
    
    options = ['a', 'b', 'c', 'd']
    weights = [1, 2, 5, 2]
    
    weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
    weighted_options = [opt for sublist in weighted_options for opt in sublist]
    print(weighted_options)
    
    # test
    
    counts = {c: 0 for c in options}
    for x in range(10000):
        counts[random.choice(weighted_options)] += 1
    
    for opt, wgt in zip(options, weights):
        wgt_r = counts[opt] / 10000 * sum(weights)
        print(opt, counts[opt], wgt, wgt_r)
    
    ['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
    a 1025 1 1.025
    b 1948 2 1.948
    c 5019 5 5.019
    d 2008 2 2.008
    
    import numpy as np
    weights = [0.1, 0.3, 0.5] #weights for the item at index 0,1,2
    # sum of weights should be <=1, you can also divide each weight by sum of all weights to standardise it to <=1 constraint.
    trials = 1 #number of trials
    num_item = 1 #number of items that can be picked in each trial
    selected_item_arr = np.random.multinomial(num_item, weights, trials)
    # gives number of times an item was selected at a particular index
    # this assumes selection with replacement
    # one possible output
    # selected_item_arr
    # array([[0, 0, 1]])
    # say if trials = 5, then the possible output could be
    # selected_item_arr
    # array([[1, 0, 0],
    #   [0, 0, 1],
    #   [0, 0, 1],
    #   [0, 1, 0],
    #   [0, 0, 1]])
    
    num_item = 3
    trials = 1
    selected_item_arr = np.random.multinomial(num_item, weights, trials)
    # selected_item_arr can give output like :
    # array([[1, 0, 2]])
    
    num_binomial_trial = 5
    weights = [0.1,0.9] #say an unfair coin weights for H/T
    num_experiment_set = 1
    selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
    # possible output
    # selected_item_arr
    # array([[1, 4]])
    # i.e. H came 1 time and T came 4 times in 5 binomial trials. And one set contains 5 binomial trials.
    
    while w[index] < beta:
        beta = beta - w[index]
        index = index + 1
    
    selected = p[index]  # assuming p holds the items and w their weights
    
    import itertools, bisect, random
    
    def weighted_choice(choices):
       weights = list(zip(*choices))[1]
       return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                    random.uniform(0, sum(weights)))][0]
    
    np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])