C: Improving the performance of simulating a 7-sided die roll from a 6-sided die implementation

c performance algorithm random

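Elsewhere, the following algorithm (based on arithmetic coding) was presented as an efficient way to generate 7-sided die results when all you are given is a 6-sided die: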

int rand7()
{
  static double a=0, width=7;  // persistent state

  while ((int)(a+width) != (int)a)
  {
    width /= 6;
    a += (rand6()-1)*width;
  }

  int n = (int)a;
  a -= n; 
  a *= 7; width *= 7;
  return (n+1);
}

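Here is my best attempt at explaining how the algorithm works:

At the start of each call to `rand7()`, `width` is a ratio 7^s/6^t and `a` is a non-negative value with the property that `a + width` lies in the interval [0, 7) after the base case. On entering the `while` loop, `width` is the largest value that can still be added to `a`. If `floor(a + width)` differs from `floor(a)`, a random selection from {0, `width`*1/6, `width`*1/3, `width`*1/2, `width`*2/3, `width`*5/6} is added to `a`, and the exponent t is incremented by 1 (dividing `width` by 6). Note that after an iteration, the property that `a + width` lies in [0, 7) still holds. The iterations stop once `width` is smaller than the distance from `a` to the next integer above it. The loop adds more entropy to `a` for as long as doing so can actually affect the outcome of the die roll; intuitively, this builds a random real number in the range [0, 7) using base 6. On leaving the loop, the die roll is taken to be `floor(a) + 1`, and `a` is reduced to its fractional part. At this point `a + width` lies in the interval [0, 1). To prepare for the next call and maintain the invariant, both `a` and `width` are scaled up by a factor of 7 (for `width`, this increments the exponent s by 1).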

The above explains how the inductive step works. Analysis of the base case is left as an exercise for the interested reader.
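To make the invariants concrete, here is a minimal sketch that wraps the same loop in run-time checks and counts how many 6-sided rolls it consumes. The `rand6()` shown here is a hypothetical stand-in (a xorshift generator with rejection) used only so the example is self-contained, and `rand7_checked`, `rand6_calls`, and the rounding slack `eps` are names and values assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rand6(): xorshift32 plus rejection, returns 1..6.
   Any unbiased source would do; this one exists only for the demonstration. */
static uint32_t demo_state = 2463534242u;
static unsigned long rand6_calls = 0;

static int rand6(void)
{
  uint32_t top3;
  do {
    demo_state ^= demo_state << 13;
    demo_state ^= demo_state >> 17;
    demo_state ^= demo_state << 5;
    top3 = demo_state >> 29;            /* top three bits: 0..7 */
  } while (top3 >= 6);                  /* reject 6 and 7 */
  ++rand6_calls;
  return (int)top3 + 1;
}

/* Same loop as rand7() above, with the stated invariants asserted. */
static int rand7_checked(void)
{
  static double a = 0, width = 7;       /* persistent state */
  const double eps = 1e-9;              /* slack for floating-point rounding */

  while ((int)(a + width) != (int)a)
  {
    width /= 6;
    a += (rand6() - 1) * width;
    assert(a >= 0 && a + width <= 7 + eps);   /* stays inside [0, 7] */
  }

  int n = (int)a;
  a -= n;
  assert(a + width <= 1 + eps);         /* fractional part plus width fits in [0, 1] */
  a *= 7; width *= 7;
  return n + 1;
}
```

Calling `rand7_checked()` many times and dividing `rand6_calls` by the number of calls gives the average number of 6-sided rolls consumed per 7-sided roll, which is the quantity the question is trying to keep low.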

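Of course, from an efficiency standpoint, the use of floating-point arithmetic immediately stands out as a performance drag (assuming the performance of `rand6()` is already adequate and cannot itself be improved). What is the best way to eliminate the floating-point arithmetic while keeping this arithmetic-coding algorithm?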

[Edit]

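The method below still needs improvement, but it is a simple unbiased one. It calls `rand6()` at least twice per result, which is inefficient. (This assumes `rand6()` is unbiased.)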
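One common way to build a simple unbiased method that uses at least two rolls per result is rejection sampling over the 36 outcomes of two rolls. The sketch below is an assumption about what such a method could look like, not a reconstruction of the author's code; `rand7_two_rolls` is a name chosen for illustration, and `rand6()` is a hypothetical stand-in generator included only to make the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rand6(): xorshift32 plus rejection, returns 1..6. */
static uint32_t demo_state = 2463534242u;
static unsigned long rand6_calls = 0;

static int rand6(void)
{
  uint32_t top3;
  do {
    demo_state ^= demo_state << 13;
    demo_state ^= demo_state >> 17;
    demo_state ^= demo_state << 5;
    top3 = demo_state >> 29;            /* top three bits: 0..7 */
  } while (top3 >= 6);                  /* reject 6 and 7 */
  ++rand6_calls;
  return (int)top3 + 1;
}

/* Two rolls give 36 equally likely values 0..35. Values 0..34 contain each
   residue mod 7 exactly five times, so keep those and retry on 35. */
static int rand7_two_rolls(void)
{
  int v;
  do {
    int hi = rand6() - 1;               /* 0..5 */
    int lo = rand6() - 1;               /* 0..5 */
    v = hi * 6 + lo;                    /* 0..35 */
    assert(v >= 0 && v < 36);
  } while (v >= 35);                    /* rejected 1 time in 36 */
  return v % 7 + 1;                     /* 1..7 */
}
```

Under these assumptions the expected cost is 2 * 36/35, or about 2.06 calls to `rand6()` per result, which is why a batched or stateful scheme is attractive.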

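For every 6 calls to `rand7()`, `rand6()` should be called 7 times. A wide integer state is initialized up front to minimize bias.

This still needs to be tested later. GTG.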

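int rand7(void) {
  static int count = -1;
  static unsigned long long state = 0;
  if (count < 0) {
    /* First call: fill the wide state with 25 base-6 digits. */
    count = 0;
    for (int i=0; i<25; i++) {
      state *= 6;
      state += rand6();
    }
  }
  /* Peel off one base-7 digit (0..6 as written). */
  int retval = state % 7;
  state /= 7;

  /* Refill: one rand6() per call, plus an extra one on every 7th call. */
  int i = (count >= 6) + 1;
  if (++count > 6) count = 0;
  while (i-- > 0) {
    state *= 6;
    state += rand6();
  }
  return retval;
}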

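Following up on one of my comments, here is a fixed-point version of the algorithm. It uses unsigned 4.60 (that is, the fractional part of the number has 60 bits), which is a few more bits than you get from a double:
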
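#include <stdint.h>

int rand7fixed() {
    static uint64_t a = 0;                                /* 4.60 fixed point */
    static uint64_t width = UINT64_C(7) << 60;            /* 7.0 in 4.60 */
    static const uint64_t intmask = UINT64_C(0xf) << 60;  /* the 4 integer bits */

    /* Keep adding base-6 digits while the integer part is still undecided. */
    while (((a+width)&intmask) != (a&intmask)) {
      width /= 6;
      a += (rand6()-1)*width;
    }

    int n = a >> 60;      /* integer part is the roll (0..6) */
    a &=~intmask;         /* keep only the fractional part */
    a *= 7;
    width *= 7;
    return n+1;
}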
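The 4.60 layout can be illustrated with a few helper functions. This is only a sketch to show how the integer and fractional parts are stored; the `fx_*` names and `FRAC_BITS` are hypothetical and do not appear in the answer's code.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 60   /* 4.60 format: 4 integer bits, 60 fraction bits */

/* Convert a small non-negative integer to 4.60 fixed point. */
static uint64_t fx_from_int(unsigned n)
{
  assert(n < 16);                       /* only 4 integer bits are available */
  return (uint64_t)n << FRAC_BITS;
}

/* Integer part: the top 4 bits. */
static unsigned fx_int_part(uint64_t x)
{
  return (unsigned)(x >> FRAC_BITS);
}

/* Fractional part: the low 60 bits. */
static uint64_t fx_frac_part(uint64_t x)
{
  return x & ((UINT64_C(1) << FRAC_BITS) - 1);
}

/* Approximate real value, for printing or debugging only. */
static double fx_to_double(uint64_t x)
{
  return (double)x / (double)(UINT64_C(1) << FRAC_BITS);
}
```

Four integer bits are enough here because the state never needs to represent a value above 7: a fractional part below 1 multiplied by 7 stays below 7, and integer division of `width` by 6 rounds down, so `a + width` cannot grow past its previous value.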

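If you feel the need to reduce the number of calls to `rand6`, you can exploit the fact that 6^12 is only slightly more than 7^11 to generate eleven 7-rolls at a time. It is still necessary to discard some sets of twelve 6-rolls in order to remove bias; the frequency of discarded sets is (6^12 - 7^11)/6^12, or about 1 in 11, so on average you need about 1.2 6-rolls per 7-roll. You could do better by using 25 6-rolls to generate 23 7-rolls (1.13 6-rolls per 7-roll), but that does not quite fit into 64-bit arithmetic, so the marginal advantage in calls to `rand6` would be eaten up by doing the computation in 128 bits.

Here is the 11/12 solution:
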
int rand7_12() {
    static int avail = 0;
    static uint32_t state = 0;
    static const uint32_t discard = 7*7*7*7*7*7*7*7*7*7*7; // 7 ** 11
    static const int out_per_round = 11;
    static const int in_per_round = 12;

    if (!avail) {
      do {
        state = rand6() - 1;
        for (int needed = in_per_round - 1; needed; --needed)
          state = state * 6 + rand6() - 1;
      }
      while (state >= discard);
      avail = out_per_round;
    }
    int rv = state % 7;
    state /= 7;
    --avail;
    return rv + 1;
}
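The cost figures for batching can be checked with a few lines of arithmetic. The sketch below computes the expected number of 6-rolls per 7-roll when `in` rolls are packed into one number and the whole batch is discarded unless it is below 7^`out`; the helper names `dpow` and `rolls_per_output` are hypothetical.

```c
#include <assert.h>

/* Integer power as a double: exact for 6^12 and 7^11, and close enough
   for this purpose for larger values such as 6^25. */
static double dpow(unsigned base, unsigned exp)
{
  double r = 1.0;
  while (exp--) r *= base;
  return r;
}

/* Expected 6-rolls consumed per 7-roll produced, for batches of `in`
   6-rolls that yield `out` 7-rolls when accepted. */
static double rolls_per_output(unsigned in, unsigned out)
{
  assert(out > 0 && dpow(7, out) <= dpow(6, in));
  double accept = dpow(7, out) / dpow(6, in);   /* probability a batch is kept */
  return (double)in / (out * accept);
}
```

With these definitions, `rolls_per_output(12, 11)` comes out at roughly 1.20 and `rolls_per_output(25, 23)` at roughly 1.13.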

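In theory, you should be able to bring the ratio down to log 7 / log 6 (the base-6 logarithm of 7), which is about 1.086. For example, you could do that by generating 895 7-rolls from 972 6-rolls, discarding about one set in 1600, for an average of 1.087 6-rolls per 7-roll, but you would need 2513-bit arithmetic to hold the state.

I tested all four functions with a not-very-precise benchmark that calls `rand7` 700,000,000 times and then prints a histogram of the results. The results: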
                                                  User time with
Algorithm       User time      rand6() calls     cycling rand6()
----------      ---------      -------------     ---------------
double          32.6 secs          760223193           13.2 secs
fixed           29.4 secs          760223194            7.9 secs
2 for 1         40.2 secs         1440004276
12 for 11       23.7 secs          840670008

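The underlying `rand6()` implementation in the above is the GNU standard C++ library's `uniform_int_distribution(1,6)` using `mt19937_64` (the 64-bit Mersenne Twister). To get a better handle on the time spent in the standard library, I also ran the tests using a simple cycling counter as the pseudo-random number generator; the remaining 13.2 and 7.9 seconds represent (roughly) the time spent in the algorithm itself, from which we can say that the fixed-point algorithm is about 40% faster. (It was hard to get a good reading on the combining algorithms, because the fixed sequence makes branch prediction easier and reduces the number of `rand6` calls, but both took less than 5 seconds.)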
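The exact counter used for the "cycling rand6()" column is not shown. A minimal sketch of what such a stand-in might look like, under the assumption that it simply walks through 1..6 in order (`rand6_cycling` is a hypothetical name):

```c
#include <assert.h>

/* Hypothetical cycling counter standing in for rand6():
   returns 1, 2, 3, 4, 5, 6, 1, 2, ... with no randomness at all. */
static int rand6_cycling(void)
{
  static int next = 1;
  int r = next;
  next = (next % 6) + 1;                /* wrap from 6 back to 1 */
  assert(r >= 1 && r <= 6);
  return r;
}
```

A generator like this is useful only for timing: it removes the cost of the real generator, but its fixed sequence also makes branches easier to predict, which is the caveat noted above.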
Finally, here are the histograms, in case anyone wants to check for bias (a run using `std::uniform_int_distribution(1,7)` is included for reference):

Algorithm          1          2          3          4          5          6          7
---------  ---------  ---------  ---------  ---------  ---------  ---------  ---------
reference  100007522  100002456  100015800  100005923   99972185  100008908   99987206
double     100014597  100005975   99982219   99986299  100004561  100011049   99995300
fixed      100009603  100009639  100034790   99989935   99995502   99981886   99978645
2 for 1    100004476   99997766   99999521  100001382   99992802  100003868  100000185
12 for 11   99988156  100004974  100020070  100001912   99997472   99995015   99992401

The call to rand() itself is probably the biggest time consumer.
Floating point may not affect performance, but it is a big problem for correctness. It will bias the results.
@jxh: double has enough bits that it probably does not bias the results significantly. But it does bias them. Whoever uses it will have to decide whether it biases the results too much to be usable. Or, to put it another way: whether the bias is significant depends on the use case.
@hyde: converting to fixed point is straightforward.
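One way to check the histogram counts for bias is a chi-square statistic against a uniform distribution. The sketch below is an illustration rather than part of the original benchmark; `chi_square_uniform` and `FACES` are hypothetical names.

```c
#include <assert.h>

#define FACES 7

/* Pearson chi-square statistic for observed counts against the hypothesis
   that all FACES outcomes are equally likely. */
static double chi_square_uniform(const long counts[FACES])
{
  long total = 0;
  for (int i = 0; i < FACES; i++)
    total += counts[i];
  assert(total > 0);

  double expected = (double)total / FACES;
  double chi2 = 0.0;
  for (int i = 0; i < FACES; i++) {
    double d = (double)counts[i] - expected;
    chi2 += d * d / expected;
  }
  return chi2;
}
```

Passing one row of the histogram table (for example the seven "12 for 11" counts) gives a statistic with 6 degrees of freedom; values below about 12.59 are consistent with uniformity at the 5% level.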