Why does this simple shuffle algorithm produce biased results? What is a simple reason?


This simple shuffle algorithm seems to produce biased results:

# suppose $arr is filled with 1 to 52
for ($i = 0; $i < 52; $i++) {
  $j = rand(0, 51);   # any position in the deck, both bounds inclusive

  # swap the items
  $tmp = $arr[$j];
  $arr[$j] = $arr[$i];
  $arr[$i] = $tmp;
}


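You can try it: instead of 52, use 3 (suppose only 3 cards are used), run it 10,000 times and tally the results. You will see that the results are skewed toward certain patterns.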
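The experiment described above could be sketched roughly as follows. This is only an illustration in Python rather than the PHP of the question; the names `naive_shuffle` and `tally`, and the optional `seed` parameter, are made up for this sketch.

```python
import random
from collections import Counter

def naive_shuffle(cards, rng):
    """Swap every position with a position drawn from the WHOLE deck."""
    cards = list(cards)
    n = len(cards)
    for i in range(n):
        j = rng.randrange(n)  # any index 0..n-1, like rand(0, 51) in the question
        cards[i], cards[j] = cards[j], cards[i]
    return cards

def tally(trials=10_000, seed=None):
    """Shuffle the 3-card deck "123" many times and count each final order."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts["".join(naive_shuffle("123", rng))] += 1
    return counts

if __name__ == "__main__":
    for order, count in sorted(tally().items()):
        print(order, count)
```

With enough trials, the orders 132, 213 and 231 should show up noticeably more often than 123, 312 and 321, although the exact counts vary from run to run.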

The question is: what is a simple explanation of why this happens?

The correct solution is to use something like

for ($i = 0; $i < 51; $i++) {  # last card need not swap
  $j = rand($i, 51);           # don't touch the cards that already "settled"

  # swap the items
  $tmp = $arr[$j];
  $arr[$j] = $arr[$i];
  $arr[$i] = $tmp;
}
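For comparison, here is a minimal Python sketch of the same corrected loop. The function name `fisher_yates_shuffle` and the `rng` parameter are assumptions made for this example; `randint` is inclusive on both ends, which matches `rand($i, 51)`.

```python
import random

def fisher_yates_shuffle(cards, rng=random):
    """Return a shuffled copy; position i only swaps with positions i..n-1."""
    cards = list(cards)
    n = len(cards)
    for i in range(n - 1):         # the last card need not swap
        j = rng.randint(i, n - 1)  # never reaches back into settled cards
        cards[i], cards[j] = cards[j], cards[i]
    return cards

if __name__ == "__main__":
    print(fisher_yates_shuffle(range(1, 53)))
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is handy when tallying results.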


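But the question is: why does the first method, which also looks fully random, bias the results?

Update 1: thanks to the folks here for pointing out that it needs to be rand($i, 51) for it to shuffle correctly.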

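See this:

Let's take a three-card deck as an example. With a 3-card deck, there are only 6 possible orders for the deck after a shuffle:

123, 132, 213, 231, 312, 321.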

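With the 1st algorithm there are 27 possible paths (outcomes) through the code, depending on the results of the rand() function at different points. Each of these paths is equally likely (unbiased). Each path ends in exactly one of the 6 possible "real" shuffle results listed above. We now have 27 items and 6 buckets to put them in. Since 27 is not evenly divisible by 6, some of those 6 orders must be over-represented.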

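With the 2nd algorithm there are 6 possible paths, which map exactly onto the 6 possible "real" shuffle results, so over time they should all be represented equally.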
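The counting argument can be checked with a few lines of Python. This is just an arithmetic sketch; the helper names `naive_paths` and `orderings` are invented here.

```python
from math import factorial

def naive_paths(n):
    """Equally likely code paths of the naive shuffle: n choices at each of n steps."""
    return n ** n

def orderings(n):
    """Number of distinct orders of an n-card deck."""
    return factorial(n)

for n in range(2, 7):
    paths, perms = naive_paths(n), orderings(n)
    # A non-zero remainder means the paths cannot be spread evenly over the orders.
    print(n, paths, perms, paths % perms)
```

For 3 cards this gives 27 paths over 6 orders with remainder 3. The remainder is non-zero for every deck of 3 or more cards, because n - 1 divides n! but shares no factor with n.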

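This matters because the buckets that are over-represented by the first algorithm are not random. The buckets favored by the bias are repeatable and predictable. So if you are building an online poker game and use the 1st algorithm, a hacker could figure out that you used the naive shuffle and from that work out that certain deck arrangements are much more likely to occur than others. Then they can place bets accordingly. They will lose some, but they will win much more than they lose and quickly put you out of business.

The best explanation I've seen for this effect is from Jeff Atwood on his Coding Horror blog.

Using this code to simulate a 3-card random shuffle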

for (int i = 0; i < cards.Length; i++)
{
    int n = rand.Next(cards.Length);
    Swap(ref cards[i], ref cards[n]);
}

...you get this distribution.

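The shuffle code (above) results in 3^3 (27) possible deck combinations. But the mathematics tell us that there are really only 3! or 6 possible combinations of a 3-card deck. So some of the combinations are over-represented.

You need to use a Knuth-Fisher-Yates (KFY) shuffle to shuffle a deck of cards properly (randomly).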
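See the Coding Horror post.

Basically (supposing 3 cards): the naive shuffle results in 3^3 (27) possible deck combinations. That is odd, because the mathematics tell us that there are really only 3! or 6 possible combinations of a 3-card deck. In the KFY shuffle, we start with an initial order, swap the third position with any of the three cards, then swap the second position with one of the remaining two cards.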
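The KFY procedure just described (third position first, then the second) can be enumerated exhaustively. The sketch below is an illustration only; `kfy_outcomes` is a made-up name, and positions are 0-based in the code.

```python
from collections import Counter
from itertools import product

def kfy_outcomes(deck="123"):
    """Count the final orders over every possible sequence of KFY swap choices."""
    n = len(deck)
    positions = range(n - 1, 0, -1)                # last position down to the second
    choice_ranges = [range(i + 1) for i in positions]  # position i swaps with 0..i
    counts = Counter()
    for choices in product(*choice_ranges):
        cards = list(deck)
        for i, j in zip(positions, choices):
            cards[i], cards[j] = cards[j], cards[i]
        counts["".join(cards)] += 1
    return counts

if __name__ == "__main__":
    for order, count in sorted(kfy_outcomes().items()):
        print(order, count)
```

For 3 cards there are 3 * 2 = 6 paths and each of the 6 orders is reached by exactly one of them, which is why this shuffle has no bias to exploit.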
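Here is the complete probability tree for these swaps.
Let's assume that you start with the sequence 123, and then we'll enumerate all the ways the code in question can produce its random results.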
123
 +- 123          - swap 1 and 1 (these are positions,
 |   +- 213      - swap 2 and 1  not numbers)
 |   |   +- 312  - swap 3 and 1
 |   |   +- 231  - swap 3 and 2
 |   |   +- 213  - swap 3 and 3
 |   +- 123      - swap 2 and 2
 |   |   +- 321  - swap 3 and 1
 |   |   +- 132  - swap 3 and 2
 |   |   +- 123  - swap 3 and 3
 |   +- 132      - swap 2 and 3
 |       +- 231  - swap 3 and 1
 |       +- 123  - swap 3 and 2
 |       +- 132  - swap 3 and 3
 +- 213          - swap 1 and 2
 |   +- 123      - swap 2 and 1
 |   |   +- 321  - swap 3 and 1
 |   |   +- 132  - swap 3 and 2
 |   |   +- 123  - swap 3 and 3
 |   +- 213      - swap 2 and 2
 |   |   +- 312  - swap 3 and 1
 |   |   +- 231  - swap 3 and 2
 |   |   +- 213  - swap 3 and 3
 |   +- 231      - swap 2 and 3
 |       +- 132  - swap 3 and 1
 |       +- 213  - swap 3 and 2
 |       +- 231  - swap 3 and 3
 +- 321          - swap 1 and 3
     +- 231      - swap 2 and 1
     |   +- 132  - swap 3 and 1
     |   +- 213  - swap 3 and 2
     |   +- 231  - swap 3 and 3
     +- 321      - swap 2 and 2
     |   +- 123  - swap 3 and 1
     |   +- 312  - swap 3 and 2
     |   +- 321  - swap 3 and 3
     +- 312      - swap 2 and 3
         +- 213  - swap 3 and 1
         +- 321  - swap 3 and 2
         +- 312  - swap 3 and 3

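Now, the fourth column of numbers, the one right before the swap information, contains the final outcome, with 27 possible outcomes.
Let's count how many times each pattern occurs: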
123 - 4 times
132 - 5 times
213 - 5 times
231 - 5 times
312 - 4 times
321 - 4 times
=============
     27 times total

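If you run the code that swaps at random an infinite number of times, the patterns 132, 213 and 231 will occur more often than the patterns 123, 312 and 321, simply because the way the code swaps makes them more likely to occur.
Now, of course, you can say that if you run the code 30 times (27 + 3), you could end up with all the patterns occurring 5 times, but when dealing with statistics you have to look at the long-term trend.
Here is C# code that explores every possible path and counts each resulting pattern: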
using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        Dictionary<String, Int32> occurances = new Dictionary<String, Int32>
        {
            { "123", 0 },
            { "132", 0 },
            { "213", 0 },
            { "231", 0 },
            { "312", 0 },
            { "321", 0 }
        };

        Char[] digits = new[] { '1', '2', '3' };
        Func<Char[], Int32, Int32, Char[]> swap = delegate(Char[] input, Int32 pos1, Int32 pos2)
        {
            Char[] result = new Char[] { input[0], input[1], input[2] };
            Char temp = result[pos1];
            result[pos1] = result[pos2];
            result[pos2] = temp;
            return result;
        };

        for (Int32 index1 = 0; index1 < 3; index1++)
        {
            Char[] level1 = swap(digits, 0, index1);
            for (Int32 index2 = 0; index2 < 3; index2++)
            {
                Char[] level2 = swap(level1, 1, index2);
                for (Int32 index3 = 0; index3 < 3; index3++)
                {
                    Char[] level3 = swap(level2, 2, index3);
                    String output = new String(level3);
                    occurances[output]++;
                }
            }
        }

        foreach (var kvp in occurances)
        {
            Console.Out.WriteLine(kvp.Key + ": " + kvp.Value);
        }
    }
}

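So while this answer does hold up, it is not a purely mathematical answer: you just have to evaluate all the possible ways the random function can go and look at the final outputs.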
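The same brute-force evaluation can be written for any deck size. The following is a rough Python sketch of that idea, not a port of the answer's exact program; `naive_outcomes` is a name chosen for this example.

```python
from collections import Counter
from itertools import product

def naive_outcomes(n):
    """Walk all n**n choice sequences of the naive shuffle and count final orders."""
    deck = [str(k) for k in range(1, n + 1)]
    counts = Counter()
    for choices in product(range(n), repeat=n):
        cards = deck[:]
        for i, j in enumerate(choices):  # step i swaps position i with position j
            cards[i], cards[j] = cards[j], cards[i]
        counts["".join(cards)] += 1
    return counts

if __name__ == "__main__":
    for order, count in sorted(naive_outcomes(3).items()):
        print(order, count)
```

For n = 3 this reproduces the 4/5/5/5/4/4 split counted from the tree. Larger decks can be explored the same way, but the number of paths grows as n^n, so it is only practical for a handful of cards.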
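Here is another intuition: a single shuffle swap cannot create symmetry in the probability of occupying a position unless at least 2-way symmetry already exists. Call the three positions A, B and C. Now let a be the probability that card 2 is in position A, b the probability that card 2 is in position B, and c the probability that it is in position C, all before a swap move. Assume that no two of these probabilities are the same: a != b, b != c, c != a. Now compute the probabilities a', b' and c' that the card is in these three positions after a swap. Say that this swap move consists of position C being swapped with one of the three positions chosen at random. Then: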
a' = a*2/3 + c*1/3
b' = b*2/3 + c*1/3
c' = 1/3.

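That is, the probability that the card ends up in position A is the probability that it was already there times the 2/3 of the time that position A is not involved in the swap, plus the probability that it was in position C times the 1/3 probability that C was swapped with A, and so on. Now subtracting the second equation from the first, we get: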
a' - b' = (a - b)*2/3

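This means that because we assumed a != b, we get a' != b' (though the difference will approach 0 over time, given enough swaps). But since a' + b' + c' = 1, if a' != b', then neither can be equal to c' = 1/3 either.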
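The update rule above is easy to iterate with exact fractions. This is only a sketch: the function name `swap_c_with_random` and the starting values 1/2, 1/3, 1/6 are assumptions picked so that all three probabilities differ.

```python
from fractions import Fraction

def swap_c_with_random(a, b, c):
    """One swap of position C with a uniformly random position (A, B or C)."""
    third = Fraction(1, 3)
    a_new = a * 2 * third + c * third  # stayed in A, or moved from C to A
    b_new = b * 2 * third + c * third  # stayed in B, or moved from C to B
    c_new = (a + b + c) * third        # equals 1/3 whenever a + b + c = 1
    return a_new, b_new, c_new

if __name__ == "__main__":
    a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)  # assumed start
    for step in range(1, 6):
        a, b, c = swap_c_with_random(a, b, c)
        print(step, a, b, c, a - b)
```

Each step multiplies the gap a - b by 2/3, so it shrinks toward 0 but never reaches it after a finite number of swaps.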

Pr(Item i ends up in slot j) = 1/N?

Pr(item i ends up at slot j | item i was not chosen in the first j-1 draws)
* Pr(item i was not chosen in the first j-1 draws).

((N-1) / N) * ((N-2) / (N-1)) * ... * ((N-j) / (N-j+1))

[((N-1) / N) * ((N-2) / (N-1)) * ... * ((N-j) / (N-j+1))] * (1 / (N-j))
= 1/N
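The telescoping product can be checked numerically with exact fractions. In this sketch the name `slot_probability` is made up, and j is taken to be the number of earlier draws that skipped the item (j = 0 .. N-1), which is my reading of the product's last factor rather than something the text states.

```python
from fractions import Fraction

def slot_probability(n, j):
    """Evaluate [(N-1)/N * (N-2)/(N-1) * ... * (N-j)/(N-j+1)] * 1/(N-j)."""
    not_chosen = Fraction(1)
    for k in range(1, j + 1):
        not_chosen *= Fraction(n - k, n - k + 1)  # item skipped on draw k
    return not_chosen * Fraction(1, n - j)        # item picked on the next draw

if __name__ == "__main__":
    n = 52
    print({slot_probability(n, j) for j in range(n)})
```

Every factor's numerator cancels the next factor's denominator, so the bracketed product collapses to (N-j)/N and the final result is 1/N for every slot.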