Ruby: random shuffle algorithm based on different weights (tags: ruby, algorithm, sorting)


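I have a collection of elements that I want to shuffle randomly, but each element has a different priority or weight. Elements with a larger weight must therefore have a higher probability of ending up near the top of the result.

I have this array: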

elements = [
  { :id => "ID_1", :weight => 1 },
  { :id => "ID_2", :weight => 2 },
  { :id => "ID_3", :weight => 6 }
]
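I want to shuffle it so that the element with id "ID_3" is 6 times more likely to come first than the element "ID_1", and 3 times more likely than the element "ID_2".

Update
Clarification: once the first position has been chosen, the remaining elements compete for the remaining positions using the same logic.

I have my own solution, but I think it can be improved: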

module Utils
  def self.random_suffle_with_weight(elements, &proc)
    # Create a consecutive chain of elements
    # in which every element is represented
    # as many times as its weight.
    consecutive_chain = []
    elements.each do |element|
      proc.call(element).times { consecutive_chain << element }
    end

    # Choose one element randomly from
    # the consecutive_chain and remove all of its copies
    # for the next round, until all elements have been chosen.
    shorted_elements = []
    while(shorted_elements.length < elements.length)
      random_index = Kernel.rand(consecutive_chain.length)
      selected_element = consecutive_chain[random_index]
      shorted_elements << selected_element
      consecutive_chain.delete(selected_element)
    end

    shorted_elements
  end
end
I can think of two approaches to solve it, although my gut tells me there should be some modification that achieves it even better:

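O(n*W) solution: (easy to program)

The first approach: create duplicates according to the weights (the same as your approach) and fill a new list with them. Now run a standard shuffle (Fisher-Yates) on this list. Iterate over the list and discard all duplicates, keeping only the first occurrence of each element. This runs in O(n*W), where n is the number of elements in the list and W is the average weight (a pseudo-polynomial solution).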


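O(nlogn) solution: (much harder to program)

The second approach is to create a list of cumulative sums of the element weights:

sum[i] = weight[0] + ... + weight[i]
Now draw a number between 0 and sum[n], and choose the first element whose sum is greater than or equal to this random number. This will be the next element; discard it, rebuild the list, and repeat.

This runs in O(n^2*logn).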
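The rebuild-and-draw loop described above can be sketched as follows. This is only an illustration under stated assumptions: elements are hashes with a non-negative integer :weight, and the names prefix_sum_shuffle and rng are hypothetical, not part of the original answer. Because rand(total) returns an integer in 0...total, the sketch picks the first prefix sum strictly greater than the draw, which keeps the selection probabilities proportional to the weights.

```ruby
# Cumulative-sum shuffle: rebuild the prefix sums every round,
# binary-search for the draw, remove the winner, repeat.
def prefix_sum_shuffle(elements, rng: Random.new)
  pool = elements.dup
  result = []
  until pool.empty?
    running = 0
    sums = pool.map { |e| running += e[:weight] } # sums[i] = w[0] + ... + w[i]
    total = sums.last
    if total.zero?
      # Only zero-weight elements remain: order them uniformly at random.
      result.concat(pool.shuffle(random: rng))
      break
    end
    r = rng.rand(total)                  # integer in 0...total
    i = sums.bsearch_index { |s| s > r } # first prefix sum strictly above r
    result << pool.delete_at(i)
  end
  result
end

elements = [
  { id: "ID_1", weight: 1 },
  { id: "ID_2", weight: 2 },
  { id: "ID_3", weight: 6 }
]
p prefix_sum_shuffle(elements).map { |e| e[:id] }
```

Each round costs O(n) for rebuilding the sums plus O(log n) for the binary search, so the rebuild dominates.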

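It can be improved further by building a binary tree instead of a list, in which each node also stores the total weight of its whole subtree.
Now, after drawing a number, find the matching element (the first one whose sum exceeds the randomly chosen number), delete that node, and recompute the weights along the path to the root.
This takes O(n) to build the tree, O(logn) to find the node at each step, and O(logn) to recompute the sums. Repeat until the tree is exhausted and you get an O(nlogn) solution.
The idea of this approach is very similar to order statistic trees, but it uses the sum of weights rather than the number of descendants. Searching and rebalancing after a deletion are done similarly to an order statistic tree.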


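Explanation of how to build and use the binary tree

Suppose you have
elements=[a,b,c,d,e,f,g,h,i,j,k,l,m]
weights=[1,2,3,1,2,3,1,2,3,1,2,3,1]

First build an almost complete binary tree and fill it with the elements. Note that the tree is not a binary search tree, just a plain tree, so the order of the elements does not matter and we will not need to maintain it later.

You will get something like the following tree:

[figure: the example tree]

Legend: w - weight of that node, sw - sum of weights of the whole subtree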

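Next, compute the sum of weights of each subtree. Start from the leaves and compute s.w = w. For every other node compute s.w = w + left->s.w + right->s.w, filling the tree from the bottom up (a post-order traversal).

Building the tree, filling it, and computing s.w. for every node takes O(n).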
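A minimal sketch of the bottom-up pass, assuming the tree is stored in an array in level order (node i has its children at 2*i + 1 and 2*i + 2), which makes it almost complete by construction. The layout and the name subtree_weights are assumptions for illustration. The weights follow the repeating 1, 2, 3 pattern so that the thirteen values sum to 25, the root value quoted in the text.

```ruby
# Returns sw, where sw[i] is the total weight of the subtree rooted at node i.
def subtree_weights(weights)
  sw = weights.dup
  # Children always have larger indices than their parent, so walking the
  # array backwards means sw[i] is final by the time it is added to its parent.
  (weights.size - 1).downto(1) do |i|
    sw[(i - 1) / 2] += sw[i]
  end
  sw
end

weights = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
sw = subtree_weights(weights)
puts "root subtree weight: #{sw[0]}"
```

The single backward pass touches every node once, which is where the O(n) bound comes from.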

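Now you need to repeatedly draw a random number between 0 and the sum of the weights (the s.w. value of the root, 25 in this example). Let that number be r, and for each such number find the matching node.
Finding the matching node is done recursively: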

if `r< root.left.sw`:
   go to left son, and repeat. 
else if `r<root.left.sw + root.w`:
   the node you are seeking is the root, choose it. 
else:
   go to `root.right` with `r= r-root.left.sw - root.w`
This is done in O(h)=O(logn) per iteration.
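The descent can be sketched iteratively over a level-order array. This is an illustration, not the answer's own code: the array layout, the name find_node, and the precomputed SW values (subtree sums for thirteen weights in the repeating 1, 2, 3 pattern, summing to 25) are assumptions. It expects 0 <= r < SW[0].

```ruby
WEIGHTS = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1].freeze
# Subtree sums for WEIGHTS when node i has children 2*i + 1 and 2*i + 2.
SW = [25, 13, 11, 6, 5, 7, 1, 2, 3, 1, 2, 3, 1].freeze

# Returns the index of the node selected by the draw r.
def find_node(weights, sw, r)
  i = 0
  loop do
    left = 2 * i + 1
    left_sw = left < sw.size ? sw[left] : 0 # a missing child weighs 0
    if r < left_sw
      i = left                      # the draw falls in the left subtree
    elsif r < left_sw + weights[i]
      return i                      # the draw falls on this node
    else
      r -= left_sw + weights[i]     # skip the left subtree and this node
      i = left + 1
    end
  end
end

names = %w[a b c d e f g h i j k l m]
puts names[find_node(WEIGHTS, SW, 10)]
```

With r = 10 the walk goes root, left child, then right child, and stops at the fifth element (e). Each node of weight w is hit by exactly w of the 25 possible draws.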

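Now you need to delete that node and fix the weights of the tree.
One way to delete that keeps the tree at logarithmic height is very similar to a binary heap: replace the chosen node with the bottom-rightmost node, remove the now duplicated bottom-rightmost node, and recompute the weights along the two branches that lead from the two affected nodes up to the root.

First the switch:

[figure]

Then the recalculation:

[figure]

Note that the recalculation is needed only along two paths, each of depth at most O(logn) (the nodes colored orange in the figure), so deletion and recalculation are also O(logn).

Now you have a new binary tree with modified weights, and you can choose the next candidate, until the tree is exhausted.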
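Putting the steps together, the whole procedure might look like the sketch below. It is one possible reading of the description, under these assumptions: elements are hashes with a non-negative integer :weight, the tree lives in level-order arrays (children of node i at 2*i + 1 and 2*i + 2), and the names tree_weighted_shuffle, add_on_path, and rng are hypothetical.

```ruby
def tree_weighted_shuffle(elements, rng: Random.new)
  items = elements.dup
  w  = items.map { |e| e[:weight] }
  sw = w.dup
  # Bottom-up subtree sums: O(n).
  (sw.size - 1).downto(1) { |i| sw[(i - 1) / 2] += sw[i] }

  # Adds delta to sw[i] and to every ancestor of i: O(log n).
  add_on_path = lambda do |i, delta|
    loop do
      sw[i] += delta
      break if i.zero?
      i = (i - 1) / 2
    end
  end

  result = []
  until items.empty?
    i = 0 # if every remaining weight is 0, simply take the root
    if sw[0] > 0
      r = rng.rand(sw[0]) # integer in 0...sw[0]
      loop do
        left = 2 * i + 1
        left_sw = left < sw.size ? sw[left] : 0
        if r < left_sw
          i = left
        elsif r < left_sw + w[i]
          break
        else
          r -= left_sw + w[i]
          i = left + 1
        end
      end
    end
    result << items[i]

    # Delete node i: detach the bottom-rightmost node, then move it into slot i.
    last = items.size - 1
    add_on_path.call(last, -w[last])
    last_item = items.pop
    last_w = w.pop
    sw.pop
    next if i == last
    add_on_path.call(i, last_w - w[i])
    items[i] = last_item
    w[i] = last_w
  end
  result
end

elements = [
  { id: "ID_1", weight: 1 },
  { id: "ID_2", weight: 2 },
  { id: "ID_3", weight: 6 }
]
p tree_weighted_shuffle(elements).map { |e| e[:id] }
```

Only the two root paths (from the removed last slot and from the refilled slot i) are touched per deletion, so each of the n rounds costs O(log n).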

I would shuffle the array as follows:

Code

def weighted_shuffle(array)
  arr = array.sort_by { |h| -h[:weight] }
  tot_wt = arr.reduce(0) { |t, h| t + h[:weight] }
  ndx_left = arr.each_index.to_a
  arr.size.times.with_object([]) do |_, a|
    cum = 0
    rn = tot_wt > 0 ? rand(tot_wt) : 0
    # rand(tot_wt) is an integer in 0...tot_wt, so compare with a strict '<';
    # fall back to the first remaining index when all remaining weights are 0.
    ndx = ndx_left.find { |i| rn < (cum += arr[i][:weight]) } || ndx_left.first
    a << arr[ndx]
    tot_wt -= arr[ndx_left.delete(ndx)][:weight]
  end
end
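That is, of the 10,000 calls to weighted_shuffle, the first element chosen was "ID_3" 66.1% of the time, "ID_2" 22.4% of the time, and "ID_1" the remaining 11.5% of the time; "ID_2" was the second choice 47.2% of the time, and so on.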
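For comparison with these observed frequencies, the exact probabilities of weighted drawing without replacement can be worked out by enumerating every possible order. The sketch below assumes unique ids and positive integer weights; exact_position_probabilities is a hypothetical helper name, and the weights 1, 2, 6 have the same ratios as 100, 200, 600.

```ruby
# Returns { position => { id => exact probability as a Rational } }.
def exact_position_probabilities(elements)
  probs = Hash.new { |h, pos| h[pos] = Hash.new(Rational(0)) }
  walk = lambda do |remaining, pos, p_so_far|
    total = remaining.sum { |e| e[:weight] }
    remaining.each do |e|
      # Chance of drawing e next, given everything drawn so far.
      p = p_so_far * Rational(e[:weight], total)
      probs[pos][e[:id]] += p
      walk.call(remaining - [e], pos + 1, p)
    end
  end
  walk.call(elements, 0, Rational(1))
  probs
end

elements = [
  { id: "ID_1", weight: 1 },
  { id: "ID_2", weight: 2 },
  { id: "ID_3", weight: 6 }
]
exact_position_probabilities(elements).each do |pos, by_id|
  by_id.each { |id, p| puts "position #{pos}, #{id}: #{p} (#{p.to_f.round(3)})" }
end
```

"ID_3" comes first with probability 6/9 = 2/3, and "ID_2" comes second with probability (6/9)(2/3) + (1/9)(2/8) = 17/36, about 47.2%, in line with the reported frequencies. The enumeration visits all n! orders, so it is only practical for very small inputs.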

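Explanation

arr is the array of hashes to be shuffled. The shuffle is performed in arr.size steps. At each step I randomly draw one element of arr, without replacement, using the supplied weights. If the h[:weight] values sum to tot over all elements h of arr that have not yet been selected, then the probability that any one of those hashes h is selected is h[:weight]/tot. The selection at each step is made by drawing rand(tot) and finding the first remaining element at which the cumulative weight covers the draw.

There is a very elegant algorithm for this. The implementation is very simple and runs in O(n log(n)); it is the Python function at the end of this page.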

Worked example of the search, with r=10:
Is r<root.left.sw? Yes. Recursively invoke with r=10, root=B (left child).
Is r<root.left.sw? No. Is r < root.left.sw + root.w? No. Recursively invoke with r=10-6-2=2 and root=E (right child).
Is r<root.left.sw? No. Is r < root.left.sw + root.w? Yes. Choose E as the next node.
elements = [
  { :id => "ID_1", :weight => 100 },
  { :id => "ID_2", :weight => 200 },
  { :id => "ID_3", :weight => 600 }
]

def display(arr,n)
  n.times.with_object([]) { |_,a|
    p weighted_shuffle(arr).map { |h| h[:id] } }
end

display(elements,10)
  ["ID_3", "ID_2", "ID_1"]
  ["ID_1", "ID_3", "ID_2"]
  ["ID_1", "ID_3", "ID_2"]
  ["ID_3", "ID_2", "ID_1"]
  ["ID_3", "ID_2", "ID_1"]
  ["ID_2", "ID_3", "ID_1"]
  ["ID_2", "ID_3", "ID_1"]
  ["ID_3", "ID_1", "ID_2"]
  ["ID_3", "ID_1", "ID_2"]
  ["ID_3", "ID_2", "ID_1"]

n = 10_000
pos = elements.each_index.with_object({}) { |i,pos| pos[i] = Hash.new(0) }
n.times { weighted_shuffle(elements).each_with_index { |h,i|
  pos[i][h[:id]] += 1 } }
pos.each { |_,h| h.each_key { |k| h[k] = (h[k]/n.to_f).round(3) } }
  #=> {0=>{"ID_3"=>0.661, "ID_2"=>0.224, "ID_1"=>0.115},
  #    1=>{"ID_2"=>0.472, "ID_3"=>0.278, "ID_1"=>0.251},
  #    2=>{"ID_1"=>0.635, "ID_2"=>0.304, "ID_3"=>0.061}}
elements.sort_by { |h| -h[:weight] }
  #=> [{ :id => "ID_3", :weight => 600 },
  #    { :id => "ID_2", :weight => 200 },
  #    { :id => "ID_1", :weight => 100 }]
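def weighted_shuffle_variant(array)
  arr = array.sort_by { |h| -h[:weight] }
  tot_wt = arr.reduce(0) { |t, h| t + h[:weight] }
  arr.size.times.with_object([]) do |_, a|
    cum = 0
    rn = tot_wt > 0 ? rand(tot_wt) : 0
    # Strict '<' because rand(tot_wt) returns an integer in 0...tot_wt;
    # fall back to index 0 when all remaining weights are 0.
    ndx = arr.index { |h| rn < (cum += h[:weight]) } || 0
    h = arr[ndx]
    a << h
    tot_wt -= h[:weight]
    # Fill the hole with the last element, unless the hole was the last slot
    # (assigning there would put the removed element back).
    last = arr.pop
    arr[ndx] = last if ndx < arr.size
  end
end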
def mori_shuffle(array)
  array.flat_map { |h| [h[:id]] * h[:weight] }.shuffle.uniq
end

require 'benchmark'

def test_em(nelements, ndigits)
  puts "\nelements.size=>#{nelements}, weights have #{ndigits} digits\n\n"
  mx = 10**ndigits
  elements = nelements.times.map { |i| { id: i, weight: rand(mx) } }
  # bm takes the label width; each report takes its own label.
  Benchmark.bm(17) do |x|
    x.report("mori_shuffle") { mori_shuffle(elements) }
    x.report("weighted_shuffle") { weighted_shuffle(elements) }
  end
end
elements.size=>3, weights have 1 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.000068)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000051)

elements.size=>3, weights have 2 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.000035)
weighted_shuffle  0.010000   0.000000   0.010000 (  0.000026)

elements.size=>3, weights have 3 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.000161)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000027)

elements.size=>3, weights have 4 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.000854)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000026)
elements.size=>20, weights have 2 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.000089)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000090)

elements.size=>20, weights have 3 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.000771)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000071)

elements.size=>20, weights have 4 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.005895)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000073)
elements.size=>100, weights have 2 digits

                      user     system      total        real
mori_shuffle      0.000000   0.000000   0.000000 (  0.000446)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000683)

elements.size=>100, weights have 3 digits

                      user     system      total        real
mori_shuffle      0.010000   0.000000   0.010000 (  0.003765)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000659)

elements.size=>100, weights have 4 digits

                      user     system      total        real
mori_shuffle      0.030000   0.010000   0.040000 (  0.034982)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000638)

elements.size=>100, weights have 5 digits

                      user     system      total        real
mori_shuffle      0.550000   0.040000   0.590000 (  0.593190)
weighted_shuffle  0.000000   0.000000   0.000000 (  0.000623)

elements.size=>100, weights have 6 digits

                      user     system      total        real
mori_shuffle      5.560000   0.380000   5.940000 (  5.944749)
weighted_shuffle  0.010000   0.000000   0.010000 (  0.000636)
elements.size=>20, weights have 3 digits

                               user     system      total        real
weighted_shuffle           0.000000   0.000000   0.000000 (  0.000062)
weighted_shuffle_variant   0.000000   0.000000   0.000000 (  0.000108)

elements.size=>20, weights have 4 digits

                               user     system      total        real
weighted_shuffle           0.000000   0.000000   0.000000 (  0.000060)
weighted_shuffle_variant   0.000000   0.000000   0.000000 (  0.000089)

elements.size=>100, weights have 2 digits

                               user     system      total        real
weighted_shuffle           0.000000   0.000000   0.000000 (  0.000666)
weighted_shuffle_variant   0.000000   0.000000   0.000000 (  0.000871)

elements.size=>100, weights have 4 digits

                               user     system      total        real
weighted_shuffle           0.000000   0.000000   0.000000 (  0.000625)
weighted_shuffle_variant   0.000000   0.000000   0.000000 (  0.000803)

elements.size=>100, weights have 6 digits

                               user     system      total        real
weighted_shuffle           0.000000   0.000000   0.000000 (  0.000664)
weighted_shuffle_variant   0.000000   0.000000   0.000000 (  0.000773)
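import random

def weigthed_shuffle(items, weights):
    # Give each item the key u ** (1 / weight) with u uniform in [0, 1),
    # then sort by descending key. Weights must be positive.
    order = sorted(range(len(items)), key=lambda i: -random.random() ** (1.0 / weights[i]))
    return [items[i] for i in order]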
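Since the rest of the thread is in Ruby, here is a sketch of the same idea in Ruby. It assumes elements are hashes with a positive :weight; the names key_weighted_shuffle and rng are hypothetical. The key u ** (1 / weight), with u uniform in [0, 1), is the one usually attributed to Efraimidis and Spirakis for weighted random sampling without replacement.

```ruby
# Assign each element a random key u ** (1 / weight) and sort by descending key.
def key_weighted_shuffle(elements, rng: Random.new)
  elements.sort_by do |e|
    u = rng.rand # float in [0, 1)
    -(u**(1.0 / e[:weight]))
  end
end

elements = [
  { id: "ID_1", weight: 1 },
  { id: "ID_2", weight: 2 },
  { id: "ID_3", weight: 6 }
]
p key_weighted_shuffle(elements).map { |e| e[:id] }
```

A larger weight pushes the key closer to 1, so heavy elements tend to sort first. The cost is that of a single sort, O(n log(n)).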