
Python: weighted selection via a cumulative beta value, where beta is accumulated from uniform random draws and the current index is advanced until it reaches the item matching beta

Tags: python, module, random

Building on the other solutions, you can generate a cumulative distribution (as integers or floats, whichever you prefer) and then use bisection to speed things up.

Here is a simple example (I used integers here; the full code block appears further down this page).

The get_cdf function converts the weights 20, 60, 10, 10 into the running totals 20, 20+60, 20+60+10, 20+60+10+10, i.e. 20, 80, 90, 100.


Now we pick a random number with random.randint, at most 20+60+10+10 = 100, and then use bisect to fetch the actual value quickly. For example, a draw of 75 falls between the running totals 20 and 80, so it selects the second item ('banana').

Since Python 3.6 there has been a solution for this in Python's standard library, namely random.choices.

Example usage: let's set up a population and weights matching those in the OP's question:

>>> from random import choices
>>> population = [1, 2, 3, 4, 5, 6]
>>> weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
Now choices(population, weights) generates a single sample:

>>> choices(population, weights)
4
The optional keyword-only argument k lets you request more than one sample at once. This is valuable because random.choices has to do some preparatory work every time it is called, before it generates any samples; by generating many samples at once, we only pay for that preparation once. Here we generate a million samples and use collections.Counter to check that the distribution we get roughly matches the weights we gave it:

>>> million_samples = choices(population, weights, k=10**6)
>>> from collections import Counter
>>> Counter(million_samples)
Counter({5: 399616, 6: 200387, 4: 200117, 1: 99636, 3: 50219, 2: 50025})
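Not in the original answer, but worth noting: choices also accepts precomputed running totals via its keyword-only cum_weights argument, which skips the accumulation step on each call. A minimal sketch (the cumulative list below is just the weights above, accumulated; the output shown is illustrative and will vary):

>>> cum_weights = [0.1, 0.15, 0.2, 0.4, 0.8, 1.0]
>>> choices(population, cum_weights=cum_weights, k=3)
[5, 6, 4]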

I wrote a solution for drawing random samples from a custom continuous distribution.

I needed this for a use case similar to yours (i.e., generating random dates with a given probability distribution).

You only need the function random_custDist and the line samples=random_custDist(x0, x1, custDist=custDist, size=1000). The rest is decoration.

import numpy as np

# function
def random_custDist(x0, x1, custDist, size=1, nControl=10**6):
    # generate a list of `size` random samples obeying the distribution custDist
    # draws candidates uniformly between x0 and x1 and accepts each with probability custDist(x)
    # custDist does not need to integrate to 1, but must return values in [0, 1];
    # performance is best when max_{x in [x0, x1]} custDist(x) = 1
    samples = []
    nLoop = 0
    while len(samples) < size and nLoop < nControl:
        x = np.random.uniform(low=x0, high=x1)
        prop = custDist(x)
        assert 0 <= prop <= 1
        if np.random.uniform(low=0, high=1) <= prop:
            samples += [x]
        nLoop += 1
    return samples

#call
x0=2007
x1=2019
def custDist(x):
    if x<2010:
        return .3
    else:
        return (np.exp(x-2008)-1)/(np.exp(2019-2007)-1)
samples=random_custDist(x0,x1,custDist=custDist,size=1000)
print(samples)

#plot
import matplotlib.pyplot as plt
#hist
bins=np.linspace(x0,x1,int(x1-x0+1))
hist=np.histogram(samples, bins )[0]
hist=hist/np.sum(hist)
plt.bar( (bins[:-1]+bins[1:])/2, hist, width=.96, label='sample distribution')
#dist
grid=np.linspace(x0,x1,100)
discCustDist = np.array([custDist(x) for x in grid])  # discrete version
discCustDist *= 1/(grid[1]-grid[0])/np.sum(discCustDist)
plt.plot(grid, discCustDist, label='custom distribution (custDist)', color='C1', linewidth=4)
#decoration
plt.legend(loc=3,bbox_to_anchor=(1,0))
plt.show()
Comments from the thread:

- Why not random.choice()? You could build the master list with the appropriate number of occurrences of each item and pick one. This is a duplicate question, of course.
- @S.Lott: isn't that very memory-hungry when the weights in the distribution differ greatly?
- @S.Lott: your approach is probably fine for small numbers of occurrences, but I'd rather avoid creating huge lists when it isn't necessary.
- @S.Lott: OK, so about 10000 * 365 = 3,650,000 = 3.6 million elements. I'm not sure about the exact memory usage in Python, but it's at least 3.6M * 4 B = 14.4 MB. Not a huge amount, but not something to ignore either when there is an equally simple method that requires no extra memory.
- The numpy function seems to support only a limited set of distributions, with no way to specify your own.
- updated link
- If the list of items is large, this can use a lot of extra memory. @pafcu: agreed; it was just the second solution that came to mind (the first was searching for something like "probability weights python" :)).
- The order of the (item, prob) pairs in the list matters in your implementation, right? @stackoverflowuser2010: it shouldn't matter (modulo floating-point error).
- Nice. I found this to be 30% faster than scipy.stats.rv_discrete.
- Quite often this function throws a KeyError because of the last line. @duckenmaster: I don't understand. Are you aware that l[-1] returns the last element of the list?
- The OP doesn't want to use random.choice(); see the comments. numpy.random.choice() is quite different from random.choice(), and it supports probability distributions.
- Can't I define p with a function? Why would I want to define it with numbers?
- This looks impressive. Here are the results of three consecutive runs of the code above: counts for items 1..6 (with probs 0.1, 0.05, 0.05, 0.2, 0.4, 0.2) of 113, 55, 50, 201, 388, 193; then 77, 60, 51, 193, 438, 181; then 84, 52, 53, 210, 405, 196.
- One question: how do I return max(i..., if "i" is an object? @Vaibhav: i is not an object.
- On my machine numpy.random.choice() is almost 20x faster, and it does exactly the same thing w.r.t. the original question, e.g. numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2]). @EugenePakhomov: that's nice, I didn't know that. I can see there is an answer mentioning it further down, but it contains no example code and doesn't have many upvotes. I'll add a comment to this answer for better visibility.
- Surprisingly, rv_discrete.rvs() works in O(len(p) * size) time and memory, while choice() seems to run in optimal O(len(p) + log(len(p)) * size) time.
- If you're using Python 3.6 or newer, you don't need any additional packages.
- Is there a Python 2.7 version? @abbas786: not built in, but the other answers to this question should all work on Python 2.7. You could also look up the Python 3 source for random.choices and copy it if you like.
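Since the comments above benchmark against scipy.stats.rv_discrete, here is a minimal sketch of that approach for the same distribution (assuming scipy is available; output is random):

import numpy as np
from scipy import stats

xk = np.arange(1, 7)                      # the population
pk = (0.1, 0.05, 0.05, 0.2, 0.4, 0.2)     # the weights
custm = stats.rv_discrete(name='custm', values=(xk, pk))
print(custm.rvs(size=10))                 # e.g. [5 4 5 6 5 1 4 5 2 5]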
Does the distribution list need to be sorted by probability? No, it doesn't; the cumulative scan below works for any order.
import random
import bisect

distribution = [(1, 0.2), (2, 0.3), (3, 0.5)]

# init distribution: build (value, cumulative probability) pairs
dlist = []
sumchance = 0
for value, chance in distribution:
    sumchance += chance
    dlist.append((value, sumchance))
assert abs(sumchance - 1.0) < 1e-9  # avoid exact float equality

# get random value
def get_random_value():
    r = random.random()
    # for small distributions use linear search
    if len(dlist) < 64:  # the exact crossover point is machine-dependent
        for value, sumchance in dlist:
            if r < sumchance:
                return value
    # for larger ones, binary search over the cumulative sums
    i = bisect.bisect_right([s for _, s in dlist], r)
    return dlist[min(i, len(dlist) - 1)][0]
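A quick sanity check of the code above (a hypothetical call; with 10000 draws the counts should land near the 20/30/50 split):

from collections import Counter

print(Counter(get_random_value() for _ in range(10000)))
# e.g. Counter({3: 5012, 2: 2998, 1: 1990})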
import random

def random_distr(l):
    # l is a list of (item, probability) pairs
    r = random.uniform(0, 1)
    s = 0
    for item, prob in l:
        s += prob
        if s >= r:
            return item
    return item  # might occur because of floating-point inaccuracies
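For instance, with the distribution from the OP's question (a hypothetical call; output is random):

dist = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
print(random_distr(dist))  # e.g. 5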
import numpy
numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])
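numpy.random.choice also accepts a size argument, so many samples can be drawn in one vectorized call; a small sketch (output is random):

import numpy

vals = numpy.random.choice(numpy.arange(1, 7), size=10,
                           p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])
print(vals)  # e.g. [5 5 4 6 1 5 5 4 5 2]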
def accumulate_normalize_values(p):
    # accepts either a dict or a list of (symbol, weight) pairs
    pi = p.items() if isinstance(p, dict) else p
    accum_pi = []
    accum = 0
    for i in pi:
        accum_pi.append((i[0], i[1] + accum))
        accum += i[1]
    if accum == 0:
        raise Exception("You are about to explode the universe. Continue ? Y/N ")
    normed_a = []
    for a in accum_pi:
        normed_a.append((a[0], a[1] * 1.0 / accum))
    return normed_a
>>> accumulate_normalize_values( { 'a': 100, 'b' : 300, 'c' : 400, 'd' : 200  } )
[('a', 0.1), ('c', 0.5), ('b', 0.8), ('d', 1.0)]
def select(symbol_intervals, random):
    # print(symbol_intervals, random)  # debug
    i = 0
    while random > symbol_intervals[i][1]:
        i += 1
        if i >= len(symbol_intervals):
            raise Exception("What did you DO to that poor list?")
    return symbol_intervals[i][0]


def gen_random(alphabet, length, probabilities=None):
    from random import random
    from itertools import repeat
    if probabilities is None:
        probabilities = dict(zip(alphabet, repeat(1.0)))
    elif len(probabilities) > 0 and isinstance(probabilities[0], (int, float)):
        probabilities = dict(zip(alphabet, probabilities))  # ordered
    usable_probabilities = accumulate_normalize_values(probabilities)
    gen = []
    while len(gen) < length:
        gen.append(select(usable_probabilities, random()))
    return gen
>>> gen_random(['a','b','c','d'], 10, [100,300,400,200])
['d', 'b', 'b', 'a', 'c', 'c', 'b', 'c', 'c', 'c']   #<--- some of the time
import random
from collections import Counter


def num_gen(num_probs):
    # calculate the minimum probability, used to normalize the counts
    min_prob = min(prob for num, prob in num_probs)
    lst = []
    for num, prob in num_probs:
        # keep appending num to lst, proportional to its probability in the distribution
        for _ in range(int(prob / min_prob)):
            lst.append(num)
    # all elems in lst occur proportional to their distribution probabilities
    while True:
        # pick a random index from lst
        ind = random.randint(0, len(lst) - 1)
        yield lst[ind]
gen = num_gen([(1, 0.1),
               (2, 0.05),
               (3, 0.05),
               (4, 0.2),
               (5, 0.4),
               (6, 0.2)])
lst = []
times = 10000
for _ in range(times):
    lst.append(next(gen))
# Verify the created distribution:
for item, count in Counter(lst).items():
    print('%d has %f probability' % (item, count / times))

1 has 0.099737 probability
2 has 0.050022 probability
3 has 0.049996 probability 
4 has 0.200154 probability
5 has 0.399791 probability
6 has 0.200300 probability
import random

def resample(weights, n):
    # resampling wheel: n must equal len(weights); returns n sampled indices
    beta = 0.0

    # caveat: cap the step size at twice the maximum weight for best results
    max_w = max(weights) * 2

    # pick a starting index uniformly at random
    current_item = random.randint(0, n - 1)
    result = []

    for i in range(n):
        beta += random.uniform(0, max_w)

        # advance the index until the accumulated beta has been spent
        while weights[current_item] < beta:
            beta -= weights[current_item]
            current_item = (current_item + 1) % n   # cyclic
        result.append(current_item)
    return result
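This is the resampling-wheel technique the question title describes: beta accumulates uniform draws, and the current index advances until it reaches the item that absorbs beta. A quick check with hypothetical weights (indices with larger weights should be returned proportionally more often):

from collections import Counter

weights = [1.0, 0.5, 0.5, 2.0, 4.0, 2.0]   # hypothetical weights
counts = Counter()
for _ in range(10000):
    counts.update(resample(weights, len(weights)))
print(counts)  # index 4 should appear about four times as often as index 0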
import random
import bisect

l = [(20, 'foo'), (60, 'banana'), (10, 'monkey'), (10, 'monkey2')]

def get_cdf(l):
    # turn (weight, item) pairs into (cumulative weight, item) pairs
    ret = []
    c = 0
    for i in l:
        c += i[0]
        ret.append((c, i[1]))
    return ret

def get_random_item(cdf):
    return cdf[bisect.bisect_left(cdf, (random.randint(0, cdf[-1][0]),))][1]

cdf = get_cdf(l)
for i in range(100):
    print(get_random_item(cdf), end=' ')