C long double floating-point error

c floating-point

I wrote a program in C to get a sense of the size of the floating-point error associated with repeated division.

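#include <stdio.h>

int main (int argc, char* argv[]) {
    if (argc < 3) {
        printf("Enter a decimal number as the first positional "
                "argument\n");
        printf("Enter the maximum number of digits to print as the "
                "second positional argument\n");
        return 1;
    }

    /* Parse both arguments; m must be at least 1 so that format gets set. */
    long double d;
    int m;
    if (sscanf(argv[1], "%Lf", &d) != 1 ||
            sscanf(argv[2], "%d", &m) != 1 || m < 1) {
        printf("Could not parse the arguments\n");
        return 1;
    }

    int i;
    char format[32];  /* room for "%.<digits>Lf\n\n" with any int precision */
    /* Print d with 1..m digits after the decimal point. */
    for (i = 1; i <= m; ++i) {
        printf("(%d digits)\n", i);
        snprintf(format, sizeof format, "%%.%dLf\n\n", i);
        printf(format, d);
    }

    /* Divide by 10 repeatedly; format still holds the m-digit pattern. */
    long double p = d;
    printf("\n");
    for (i = 1; i <= m; ++i) {
        /* After i divisions p is d/10^i, i.e. d/1e<i>. */
        printf("(%Lf/1e%d with %d digits)\n", d, i, m);
        p = p/(long double)10.0;
        printf(format, p);
    }
    return 0;
}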

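Here we observe 485 digits of floating-point noise. This was compiled with GCC 4.4.3, which I assume uses 80-bit extended precision. However, 485 decimal digits is far more than 80 bits of information. So my question is: where does this information come from?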

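No additional information is printed. The value printed is exactly the value of p.

After 180 iterations, p is +0x1.A8E90F9908E0CA56p-602, which is 15309010345804195115•2^-665. The IEEE 754 standard defines the value of a floating-point number as the sign (+1 or −1) multiplied by an integer power of two (determined by the number's exponent field) multiplied by the value of its significand (the fraction portion). So each floating-point number has one specific value. The above is the value of p. In decimal, that value is exactly .999999999999999999969819570700939858153376736698732853283605408116087882762948991724868957176649769045358705872354052261113540314114885779914335315639806061208847920179776799404948795506248532485303630811119507604985596684233990126219304092175565232198569923253737561276484626462077772036038845251286782974821021132356946292172207615386395848331484216638644272380029035758729644340836222808959709090969637124943490034914855945331906598229107537684733075890119912190129980449084208984375•10^-181.

That is the value your program produces. So your output formatter has printed exactly the value of p. It is doing a fine job.

In fact, the floating-point arithmetic is doing a fine job in all respects. That value is the long double value closest to 10^-181. It is impossible to get any closer in long double. So, even after hundreds of arithmetic operations, the error has not grown.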

There is no new information here. If we were told the bits that represent p, we could have produced the same hundreds of decimal digits. They do not tell you anything new. But neither are they garbage; they are completely determined by the value of p.

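To add some further information to Eric's excellent answer: the 181st iteration, computed your way, happens to be the nearest long double to 10^-181, but this does not hold for every n.
For example,
1/10.0/10.0/10.0/10.0 != 1/10000.0
when computed in long double.
Using my own floating-point emulation package in Squeak Smalltalk, I can say that among the first 300 values of 10^-n, 77 are the nearest long double and 223 are not: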

(1 to: 300) count: [:n | ((1 to: n) inject: (1 asArbitraryPrecisionFloatNumBits: 64) into: [:p :i | p/10]) ~= ((10 raisedTo: n negated) asArbitraryPrecisionFloatNumBits: 64)]
The difference peaks at 4 ulp for 10^-218:

(1 to: 300) detectMax: [:n | (((1 to: n) inject: (1 asArbitraryPrecisionFloatNumBits: 64) into: [:p :i | p/10]) - ((10 raisedTo: n negated) asArbitraryPrecisionFloatNumBits: 64)) abs / (2 raisedTo: -63+((10 raisedTo: n negated) floorLog: 2))].
Here is the evolution of the error in terms of ulp:

(1 to: 300) collect: [:n | ((((1 to: n) inject: (1 asArbitraryPrecisionFloatNumBits: 64) into: [:p :i | p/10]) - ((10 raisedTo: n negated) asArbitraryPrecisionFloatNumBits: 64)) / (2 raisedTo: -63+((10 raisedTo: n negated) floorLog: 2))) asInteger].
#(0 0 0 -1 -1 -1 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -1 -1 0 -1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 -1 0 0 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 -1 0 -1 -1 -1 -1 -2 -1 -1 -2 -2 -2 -3 -2 -2 -3 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1 -1 -2 -2 -2 -1 -2 -2 -1 -2 -2 -1 -2 -2 -2 -3 -2 -1 -2 -2 -1 -2 -2 -1 -2 -2 -2 -3 -2 -2 -1 -1 -1 -1 -1 -1 -1 -2 -2 -1 -3 -2 -2 -3 -2 -2 -3 -3 -2 -2 -2 -2 -3 -2 -2 -3 -3 -2 -3 -2 -2 -2 -3 -2 -2 -3 -2 -1 -2 -2 -1 -2 -1 -1 -2 -1 -1 -1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 -1 -1 -1 -1 0 0 0 0 -1 -1 -1 -2 -1 0 -1 -1 -1 -1 -1 -2 -1 -1 -1 -1 -2 -2 -2 -2 -2 -2 -3 -3 -2 -4 -3 -2 -3 -2 -2 -3 -2 -2 -2 -2 -1 -3 -2 -2 -3 -3 -2 -1 -2 -2 -1 -2 -2 -1 -3 -2 -2 -3 -3 -2 -3 -2 -1 -1 -1 0 0 0 0 0 -1 0 0 -1 0 0 -1 0 0 0 0 0 -1 -1 0 0 -1 -1 -1 -1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 0 0 0)

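As Software Monkey said, you can't convert floating-point numbers exactly to decimal, so what you're seeing is the equivalent of 1/3 being 0.33333333…

AFAIK, for 80-bit precision you need to compile with -mfpmath=387 (at least on x86-64) in order to use the FP coprocessor. The default is -mfpmath=sse, which I think doesn't support 80-bit precision.

The problem is that those 1208925819614629174706176 values can't be represented concisely in decimal.

@NikosC. "For 80-bit precision you need to compile with -mfpmath=387" No! Using -mfpmath=387 means that all computations are done in 80-bit extended precision rather than in the precision of the operation's type. But even with -mfpmath=sse (i.e. when compi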