Loss of precision in a numpy array after tolist

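I have a numpy array in which every number has been rounded to a specific precision (using around(x, 1)).

I am trying to convert each number to a string so that I can write them into a Word table with python-docx. But the result of tolist() is a complete mess: the precision of the numbers is lost, which gives very long output.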

[['3.0',
  '15294.7001953',
  '32977.6992188',
  '4419.5',
  '978.400024414',
  '504.399993896',
  '123.599998474'],
 ['4.0',
  '14173.7998047',
  '31487.1992188',
  '3853.89990234',
  '967.799987793',
  '410.200012207',
  '107.099998474'],
.......

Besides tolist(), I also tried [[str(e) for e in a] for a in m]. The result is the same, which is annoying. How can I easily convert to strings while keeping the precision? Thanks!

Something is going wrong in your conversion to strings. With numbers only:

>>> import numpy as np
>>> a = np.random.random(10)*30
>>> a
array([ 27.30713434,  10.25895255,  19.65843272,  23.93161555,
        29.08479175,  25.69713898,  11.90236158,   5.41050686,
        18.16481691,  14.12808414])
>>> 
>>> b = np.round(a, decimals=1)
>>> b
array([ 27.3,  10.3,  19.7,  23.9,  29.1,  25.7,  11.9,   5.4,  18.2,  14.1])
>>> b.tolist()
[27.3, 10.3, 19.7, 23.9, 29.1, 25.7, 11.9, 5.4, 18.2, 14.1]

Note that np.round does not work in place; a is unchanged:

>>> a
array([ 27.30713434,  10.25895255,  19.65843272,  23.93161555,
        29.08479175,  25.69713898,  11.90236158,   5.41050686,
        18.16481691,  14.12808414])

If you only need to convert the numbers to strings:

>>> " ".join(str(_) for _ in np.round(a, 1)) 
'27.3 10.3 19.7 23.9 29.1 25.7 11.9 5.4 18.2 14.1'

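EDIT: Apparently np.round does not play well with float32 (the other answers explain why). A simple workaround is to explicitly cast the array to np.float64, or simply float (the old np.float alias was removed in NumPy 1.24):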

>>> # prepare an array of float32 values
>>> a32  = (np.random.random(10) * 30).astype(np.float32)
>>> a32.dtype
dtype('float32')
>>> 
>>> # notice the use of .astype(np.float64)
>>> np.round(a32.astype(np.float64), 1)
array([  5.5,   8.2,  29.8,   8.6,  15.5,  28.3,   2. ,  24.5,  18.4,   8.3])
>>>

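EDIT2: As Warren demonstrated in his answer, string formatting actually rounds correctly (try "%.1f" % (4.79,)), so there is no need to convert between float types. I am leaving my answer here mainly as a reminder that using np.around is not the right thing to do in this case.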
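
To make the point of EDIT2 concrete, here is a minimal sketch that formats float32 values directly, with no dtype conversion and no call to np.around. The sample values and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sample values, stored in single precision as in the question.
values = np.array([15294.7, 978.4, 4.79], dtype=np.float32)

# str() on the converted Python floats exposes the float32 approximation error.
raw = [str(v) for v in values.tolist()]

# A format specifier rounds to one decimal place while building the string.
formatted = ["%.1f" % v for v in values]

print(raw)
print(formatted)
```

The formatted list contains short strings such as '15294.7', while the raw list shows the long tails from the question.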

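Floating point is very good at storing a wide range of values with a well-defined relative precision. For a 32-bit float, that is about 7 significant digits. As you have noticed, when you round, the number you actually get is not exactly the one you hoped for, but it agrees with it to about 7 significant digits.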
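
The relative precision can be checked directly. The sketch below, which assumes a few values taken from the question, compares what float32 actually stores against the intended decimal values and measures the relative error against the float32 machine epsilon.

```python
import numpy as np

# Values from the question, as intended and as stored in single precision.
intended = np.array([15294.7, 978.4, 123.6], dtype=np.float64)
stored = intended.astype(np.float32).astype(np.float64)

# Machine epsilon for float32 is about 1.2e-07, i.e. roughly 7 significant digits.
eps32 = float(np.finfo(np.float32).eps)

# Relative error introduced by storing each value as float32.
rel_err = np.abs(stored - intended) / intended

print(eps32)
print(rel_err)
```

Every relative error is below eps32, even though the absolute errors differ a lot between 15294.7 and 123.6.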

One way to get what you want might be to use decimal.Decimal objects. You can construct a numpy array of these by setting the dtype to that type:

import decimal
import numpy
# original_array is your existing data; numpy stores the result with dtype=object
a = numpy.array(original_array, dtype=decimal.Decimal)

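Note that the resulting array is just an array of Python objects, not a "proper" numpy array, so you may need to use your own rounding function, and some other things may not work either.

It may be better to work only with built-in Python structures to get what you want.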
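
As a rough sketch of the "built-in Python structures" route, the example below builds the Decimal values explicitly from a nested list and rounds them with quantize. The sample data, the names, and the choice of rounding mode are assumptions for illustration, not something the answer above prescribes.

```python
from decimal import Decimal, ROUND_HALF_EVEN

import numpy as np

# Hypothetical float32 data resembling the question's array.
data = np.array([[3.0, 15294.7], [4.0, 14173.8]], dtype=np.float32)

# Quantizing to this exponent keeps exactly one decimal place.
one_place = Decimal("0.1")

# Decimal(float) converts the stored binary value exactly; quantize then rounds it.
rows = [
    [Decimal(v).quantize(one_place, rounding=ROUND_HALF_EVEN) for v in row]
    for row in data.tolist()
]

# Plain nested lists of strings, ready to be written into table cells.
strings = [[str(d) for d in row] for row in rows]
print(strings)
```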

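The precision is not "lost"; you never had that precision in the first place. The value 15294.7 cannot be represented exactly in single precision (i.e. np.float32); the best approximation is 15294.70019...: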

In [1]: x = np.array([15294.7], dtype=np.float32)

In [2]: x
Out[2]: array([ 15294.70019531], dtype=float32)

You get a better approximation with np.float64, but it still cannot represent 15294.7 exactly.
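
One way to see both approximations exactly is to convert the stored binary values to decimal.Decimal, which performs the conversion without any rounding. This is only an illustrative sketch; the variable names are assumptions.

```python
from decimal import Decimal

import numpy as np

target = Decimal("15294.7")

# Exact decimal expansion of the value stored in single and in double precision.
exact32 = Decimal(float(np.float32(15294.7)))
exact64 = Decimal(15294.7)

# Distance of each stored value from the intended decimal number.
err32 = abs(exact32 - target)
err64 = abs(exact64 - target)

print(exact32)
print(exact64)
print(err32, err64)
```

The double-precision error is many orders of magnitude smaller than the single-precision one, but it is not zero.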

If you want the text output formatted with a single decimal digit, use a function designed for formatted text output, such as np.savetxt:

In [56]: x = np.array([[15294.7, 32977.7],[14173.8, 31487.2]], dtype=np.float32) 

In [57]: x
Out[57]: 
array([[ 15294.70019531,  32977.69921875],
       [ 14173.79980469,  31487.19921875]], dtype=float32)

In [58]: np.savetxt("data.txt", x, fmt="%.1f", delimiter=",")

In [59]: !cat data.txt
15294.7,32977.7
14173.8,31487.2

If you really need a numpy array of nicely formatted strings, you could do this:

In [63]: def myfmt(r):
   ....:     return "%.1f" % (r,)
   ....: 

In [64]: vecfmt = np.vectorize(myfmt)

In [65]: vecfmt(x)
Out[65]: 
array([['15294.7', '32977.7'],
       ['14173.8', '31487.2']], 
      dtype='|S64')

If you use either of these methods, there is no need to pass the data through np.around first; the rounding happens as part of the formatting.
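
A possible alternative to the np.vectorize helper is np.char.mod, which applies a % format string element-wise. The sketch below reuses the float32 array from the session above and is offered as an option, not as the method this answer recommends.

```python
import numpy as np

x = np.array([[15294.7, 32977.7], [14173.8, 31487.2]], dtype=np.float32)

# Apply the "%.1f" format to every element; the result is an array of strings.
as_text = np.char.mod("%.1f", x)

# Nested Python lists of strings, e.g. for filling table cells row by row.
rows = as_text.tolist()
print(rows)
```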
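
All the answers correctly discuss floating-point precision and output, but I would like to add that you do not need to convert from np.array to a list with tolist in the first place. In fact, you rarely need to do that, because numpy arrays usually behave the same way, as I illustrate in the example below:
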
import docx
import numpy as np

# Your values from above
raw_data = np.array([[ 3., 15294.7, 32977.7, 4419.5,  978.4, 504.4, 123.6],
                     [ 4., 14173.8, 31487.2, 3853.9,  967.8, 410.2, 107.1],
                     [ 5., 15323.5, 34754.5, 3738.7, 1034.7, 376.1, 105.5],
                     [ 6., 17396.7, 41164.5, 3787.4, 1103.2, 363.9, 109.4],
                     [ 7., 19665.5, 48967.6, 3900.9, 1161.0, 362.1, 115.8],
                     [ 8., 21839.8, 56922.5, 4037.4, 1208.2, 365.9, 123.5],
                     [ 9., 23840.6, 64573.8, 4178.1, 1247.0, 373.2, 131.9],
                     [10., 25659.9, 71800.2, 4314.8, 1279.5, 382.7, 140.5],
                     [11., 27310.3, 78577.7, 4444.3, 1307.1, 393.7, 149.1],
                     [12., 28809.1, 84910.4, 4565.8, 1331.0, 405.5, 157.4]],
                    dtype=np.float32)

# This conversion is just for comparison purposes, both tables will be printed.
pyt_data = raw_data.tolist()

def create_table(document, values, heading):
    """Creates a docx table inside the document.

    This function takes a docx.Document, a two-dimensional data structure, e.g.
    numpy arrays or a list of lists, and fills the table with it.
    The table is also prefixed with a heading.
    """
    document.add_heading(heading)
    table = document.add_table(rows=0, cols=len(values[0]))
    for row in values:
        cells = table.add_row().cells
        for i, value in enumerate(row):
            # Use `str` for any types, but the format string 
            # only if you expect numerical types exclusively
            cells[i].text = str(value)  # f'{value:.1f}'

document = docx.Document()
create_table(document, raw_data, 'Raw table')
create_table(document, pyt_data, 'tolist table')
document.save('table_demo.docx')

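If you change the line cells[i].text = str(value) to cells[i].text = f'{value:.1f}' (or, on Python < 3.6, cells[i].text = '{:.1f}'.format(value)), both tables come out right, because you are formatting the float values with a custom format. If you only use the string representation, the numpy values are already correct.
Note that with np.float64 both versions are correct.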

[Image: the resulting docx rendered using the string representation]

[Image: the resulting docx rendered using format strings / formatted string literals]

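Even if you have no control over the data arriving as a numpy float32 array, you can change the type to a higher precision and then round before calling tolist. In fact, you can even use astype to do the string conversion. For example:
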
>>> import numpy as np
>>> a = np.array([[3.0, 15294.7, 32977.7],
...               [4419.5, 978.4, 504.4]], dtype=np.float32)
>>> a.astype(float).round(1).astype(str).tolist()
[['3.0', '15294.7', '32977.7'], ['4419.5', '978.4', '504.4']]

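Is your array single precision (np.float32)?

Yes, it is float32. Is that a problem?

See my answer, or @Henrygomersall's answer.

Thanks for the answer, but I still can't get it right. I simply used np.around(x, 1), but I get a long tail on every float, like: array([448.3999939, 521.59997559, 581.70001221, 635.40002441, 688.79998779, 746., 808., 872.40002441, 935.90002441, 1996.40002441], dtype=float32)

(+1) I was under the impression that string formatting truncates rather than rounds. Thanks!

Thanks for the explanation. My matrix is merged from several 1-D arrays, each of which may have a different precision requirement. That is why I can not use "%.1f" % (r,) to format them at the final display stage. I have now switched to float64 and it works fine, but I worry that it may need more memory than float32, since my data may be large.

"can not" is usually written as "cannot" :)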
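
For the situation described in the last comments (columns with different precision requirements, and a wish to stay in float32 for memory reasons), one possible approach is to keep the data in float32 and pick the number of decimals per column only at display time. This is a sketch under assumptions: the sample data, the decimals list, and the format_rows helper are hypothetical.

```python
import numpy as np

# Hypothetical data kept in float32; each column has its own display precision.
data = np.array([[3.0, 15294.7, 0.12345],
                 [4.0, 14173.8, 0.54321]], dtype=np.float32)
decimals = [0, 1, 3]  # hypothetical number of decimals for each column


def format_rows(array, decimals):
    """Format each column with its own number of decimal places."""
    return [
        [f"{float(v):.{d}f}" for v, d in zip(row, decimals)]
        for row in array
    ]


table = format_rows(data, decimals)
print(table)

# Keeping float32 halves the memory footprint compared with float64.
print(data.nbytes, data.astype(np.float64).nbytes)
```

Only the strings produced for display depend on the per-column precision; the stored array is never rounded or widened.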