Python NumPy string encoding

python string python-3.x numpy

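The numpy module is an excellent tool for storing Python objects, strings among them, in a memory-efficient way. For ANSI strings in numpy arrays, only 1 byte per character is used.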

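However, there is an inconvenience. The type of the stored objects is no longer str but bytes, which means they have to be decoded before further use in most cases, which in turn makes the code rather bulky: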

>>> import numpy
>>> my_array = numpy.array(['apple', 'pear'], dtype = 'S5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an b'apple' and a b'pear'
>>> print("Mary has an {} and a {}".format(my_array[0].decode('utf-8'),
... my_array[1].decode('utf-8')))
Mary has an apple and a pear
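
As a quick check of the two claims above (one byte per character, and elements coming back as bytes rather than str), here is a minimal sketch. It assumes only that numpy is installed; the variable name first is introduced here for illustration.

```python
import numpy as np

my_array = np.array(['apple', 'pear'], dtype='S5')

# Fixed-width byte strings: 5 bytes per element, one byte per character.
print(my_array.itemsize)   # 5
print(my_array.nbytes)     # 10

# Indexing returns a NumPy scalar that subclasses Python's bytes, not str.
first = my_array[0]
print(isinstance(first, bytes))   # True
print(isinstance(first, str))     # False

# Hence the explicit decode step before the value can be used as text.
print(first.decode('ascii'))      # apple
```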

This inconvenience can be eliminated by using another data type, for example:

>>> my_array = numpy.array(['apple', 'pear'], dtype = 'U5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an apple and a pear

However, this is achieved only at the cost of a 4-fold increase in memory usage:

>>> numpy.info(my_array)
class:  ndarray
shape:  (2,)
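strides:  (20,)
itemsize:  20
aligned:  True
contiguous:  True
fortran:  True
data pointer: 0x1a5b020
byteorder:  little
byteswap:  False
type: <U5

This is not much different from decode, but astype works (and can be applied to the whole array rather than to each string). The longer array will, however, stay around for as long as it is needed.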
In [538]: x=my_array.astype('U');"Mary has an {} and a {}".format(x[0],x[1])
Out[538]: 'Mary has an apple and a pear'
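
To put numbers on the trade-off, the following sketch compares the byte-string array with its astype('U') copy. The names byte_array, text_array and ratio are illustrative and not from the original post.

```python
import numpy as np

byte_array = np.array(['apple', 'pear'], dtype='S5')

# astype('U') builds a new unicode array; the original is left untouched.
text_array = byte_array.astype('U')

print(byte_array.nbytes)   # 10 (2 elements x 5 bytes)
print(text_array.nbytes)   # 40 (2 elements x 5 characters x 4 bytes)

ratio = text_array.nbytes // byte_array.nbytes
print(ratio)               # 4

# Elements of the copy are already text, so no per-element decode is needed.
print("Mary has an {} and a {}".format(text_array[0], text_array[1]))
```

The copy can be dropped once formatting is done, so the extra memory is only held temporarily if the converted array is not kept.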

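I could not find anything in the format syntax that would force formatting without the b prefix.

One approach is to customize the formatter class by overriding its format_field method. I tried something similar with the convert_field method, but the calling syntax is still messy.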
In [561]: import string

In [562]: def makeU(astr):
    return astr.decode('utf-8')
   .....: 

In [563]: class MyFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        if 'q'== conversion:
            return makeU(value)
        else:
            return super(MyFormatter, self).convert_field(value, conversion)
   .....:         

In [564]: MyFormatter().format("Mary has an {!q} and a {!q}",my_array[0],my_array[1])
Out[564]: 'Mary has an apple and a pear'


Two other ways of doing this formatting:
In [642]: "Mary has an {1} and a {0} or {1}".format(*my_array.astype('U'))
Out[642]: 'Mary has an pear and a apple or pear'

This converts the array (on the fly) and passes it to format as a list. It also works if the array is already unicode:
In [643]: "Mary has an {1} and a {0} or {1}".format(*uarray.astype('U'))
Out[643]: 'Mary has an pear and a apple or pear'

np.char has functions that apply string methods to the elements of a character array. With it, decode can be applied to the whole array:
In [644]: "Mary has a {1} and an {0}".format(*np.char.decode(my_array))
Out[644]: 'Mary has a pear and an apple'

(This does not work if the array is already unicode.)

If you work with string arrays a lot, np.char is worth studying.
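
Since np.char.decode fails on an array that is already unicode, one possible workaround is to check the dtype kind first. The helper as_text below is a hypothetical name introduced for this sketch, not part of NumPy.

```python
import numpy as np

def as_text(arr, encoding='utf-8'):
    """Return a unicode array; decode only if arr holds byte strings."""
    arr = np.asarray(arr)
    if arr.dtype.kind == 'S':          # 'S' = fixed-width bytes
        return np.char.decode(arr, encoding)
    return arr                         # already unicode ('U'): pass through

my_array = np.array(['apple', 'pear'], dtype='S5')
uarray = np.array(['apple', 'pear'], dtype='U5')

# Both calls print: Mary has a pear and an apple
print("Mary has a {1} and an {0}".format(*as_text(my_array)))
print("Mary has a {1} and an {0}".format(*as_text(uarray)))

# Other np.char functions work elementwise too, e.g. upper on the bytes array.
print(np.char.upper(my_array).tolist())   # [b'APPLE', b'PEAR']
```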
Given:
>>> my_array = numpy.array(['apple', 'pear'], dtype = 'S5')

You can decode on the fly:
>>> print("Mary has an {} and a {}".format(*map(lambda b: b.decode('utf-8'), my_array)))
Mary has an apple and a pear

Or you can create a specific formatter:
import string
class ByteFormatter(string.Formatter):
    def __init__(self, decoder='utf-8'):
        self.decoder=decoder

    def format_field(self, value, spec):
        if isinstance(value, bytes):
            return value.decode(self.decoder)
        return super(ByteFormatter, self).format_field(value, spec)   

>>> print(ByteFormatter().format("Mary has an {} and a {}", *my_array))
Mary has an apple and a pear
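
One detail of the ByteFormatter above: for byte values it returns the decoded text directly, so a format spec such as {:>7} would be ignored. A possible variant, sketched here under the hypothetical name SpecAwareByteFormatter, decodes first and then hands the result to the normal formatting machinery.

```python
import string
import numpy as np

class SpecAwareByteFormatter(string.Formatter):
    """Hypothetical variant: decode bytes, then apply the format spec."""

    def __init__(self, encoding='utf-8'):
        super().__init__()
        self.encoding = encoding

    def format_field(self, value, format_spec):
        if isinstance(value, bytes):
            value = value.decode(self.encoding)
        # Delegating keeps width, alignment and fill working for decoded text.
        return super().format_field(value, format_spec)

my_array = np.array(['apple', 'pear'], dtype='S5')
fmt = SpecAwareByteFormatter()

print(fmt.format("Mary has an {} and a {}", *my_array))
print(fmt.format("[{:>7}] [{:<7}]", *my_array))   # [  apple] [pear   ]
```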

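This is a Python 3 matter: it displays byte strings with a b prefix.

Thank you for the insightful answers. Since I need not only to format strings but also to pass individual array elements to functions, I settled on writing a helper function, def U(astr): return astr.decode('utf-8'), because it needs the fewest extra symbols. It is also the most obvious solution.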
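
A runnable sketch of that helper might look as follows. The encoding parameter and the pass-through for values that are already str are additions made for this example and are not part of the one-liner quoted above.

```python
import numpy as np

def U(astr, encoding='utf-8'):
    """Decode a bytes element to str; leave str values unchanged (assumption)."""
    return astr.decode(encoding) if isinstance(astr, bytes) else astr

my_array = np.array(['apple', 'pear'], dtype='S5')

# Formatting individual elements.
print("Mary has an {} and a {}".format(U(my_array[0]), U(my_array[1])))

# Passing a single element on to code that expects str.
print(U(my_array[1]).capitalize())   # Pear
```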