Converting a 32-bit integer to an array of four 8-bit integers in Python

python numpy

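How can I efficiently convert a 32-bit integer into an array of four 8-bit integers in Python?

Currently I have the following code, which is very slow: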

import numpy as np

def convert(int32_val):
    # 32-character two's-complement bit string, most significant bit first
    bits = np.binary_repr(int32_val, width=32)
    int8_arr = [int(bits[0:8], 2), int(bits[8:16], 2),
                int(bits[16:24], 2), int(bits[24:32], 2)]
    return int8_arr

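For example:

I need the same behavior for unsigned 32-bit integers as well.

Also, is it possible to vectorize this over a large NumPy array of 32-bit integers?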

In my tests, using just Python's built-in division and modulo gives a 6x speedup:

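def convert(i):
    i = i % 4294967296        # wrap into the unsigned 32-bit range
    n4 = i % 256              # least significant byte
    i = i // 256              # floor division (plain / yields floats in Python 3)
    n3 = i % 256
    i = i // 256
    n2 = i % 256
    n1 = i // 256             # most significant byte
    return (n1, n2, n3, n4)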

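Use a dtype, as described in the NumPy documentation excerpt quoted below.

Compare with x%256, which gives the low byte of each value: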

In [38]: x%256
Out[38]: array([  0, 232, 208, 184, 160, 136, 112,  88,  64,  40,  16, 248])

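More on the tuple form of a dtype, from the documentation:

2) Tuple argument: The only relevant tuple case that applies to record structures is when a structure is mapped to an existing data type. This is done by pairing, in a tuple, the existing data type with a matching dtype definition (using any of the variants described here). As an example (using a definition using a list, so see 3) for further details):

>>> x = np.zeros(3, dtype=('i4', [('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')]))
>>> x
array([0, 0, 0])
>>> x['r']
array([0, 0, 0], dtype=uint8)

In this case, an array is produced that looks and acts like a simple int32 array, but also has definitions for fields that use only one byte of the int32 (a bit like Fortran equivalencing).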

One way to get a 2D array of the 4 bytes is:

In [46]: np.array([x1['f0'],x1['f1'],x1['f2'],x1['f3']])
Out[46]: 
array([[  0, 232, 208, 184, 160, 136, 112,  88,  64,  40,  16, 248],
       [  0,   3,   7,  11,  15,  19,  23,  27,  31,  35,  39,  42],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]], dtype=uint8)

The same idea, but more compact:

In [50]: dt1=np.dtype(('i4', [('bytes','u1',4)]))

In [53]: x2=x.view(dtype=dt1)

In [54]: x2.dtype
Out[54]: dtype([('bytes', 'u1', (4,))])

In [55]: x2['bytes']
Out[55]: 
array([[  0,   0,   0,   0],
       [232,   3,   0,   0],
       [208,   7,   0,   0],
       [184,  11,   0,   0],
       [160,  15,   0,   0],
       [136,  19,   0,   0],
       [112,  23,   0,   0],
       [ 88,  27,   0,   0],
       [ 64,  31,   0,   0],
       [ 40,  35,   0,   0],
       [ 16,  39,   0,   0],
       [248,  42,   0,   0]], dtype=uint8)

In [56]: x2
Out[56]: 
array([    0,  1000,  2000,  3000,  4000,  5000,  6000,  7000,  8000,
        9000, 10000, 11000])
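
The field-based views above can also be written as a plain reinterpretation of the buffer, which is what the comments on this question suggest. The following is a minimal sketch, not the answerer's code. It assumes the input is an array of int32 values (rebuilt here as multiples of 1000, which is only a guess at the input used above) and that the bytes are wanted most significant first, as in the question's convert. The helper name int32_to_bytes is made up for illustration.

```python
import numpy as np

def int32_to_bytes(x):
    """Return an array of shape x.shape + (4,) with the bytes of each int32,
    most significant byte first."""
    x = np.ascontiguousarray(x, dtype=np.int32)
    # '>i4' stores the values big-endian, so the result does not depend on
    # the platform's native byte order (this step copies on little-endian machines).
    be = x.astype('>i4')
    # Reinterpret the same memory as unsigned bytes, 4 per original element.
    return be.view(np.uint8).reshape(x.shape + (4,))

x = np.arange(12, dtype=np.int32) * 1000   # assumed input: 0, 1000, ..., 11000
print(int32_to_bytes(x)[:3])
```

If the native (usually little-endian) order is acceptable, the astype step can be dropped and the view is then taken without copying any data.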

You can use bitwise operations:

def int32_to_int8(n):
    mask = (1 << 8) - 1
    return [(n >> k) & mask for k in range(0, 32, 8)]

>>> int32_to_int8(32768)
[0, 128, 0, 0]
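
Note that int32_to_int8 returns the least significant byte first, whereas the question's convert returns the most significant byte first. The same shift-and-mask idea can be applied to a whole NumPy array through broadcasting. This is a sketch under those assumptions rather than part of the answer above; the function name int32_to_int8_array is hypothetical.

```python
import numpy as np

def int32_to_int8_array(values):
    """Split each 32-bit value into 4 bytes, most significant byte first."""
    # Work in int64 and mask to 32 bits so negative inputs map onto their
    # two's-complement pattern and unsigned inputs up to 2**32 - 1 also fit.
    v = np.asarray(values, dtype=np.int64) & 0xFFFFFFFF
    shifts = np.array([24, 16, 8, 0], dtype=np.int64)
    # Broadcasting: shape (..., 1) >> shape (4,) gives shape (..., 4).
    return ((v[..., np.newaxis] >> shifts) & 0xFF).astype(np.uint8)

print(int32_to_int8_array([32768, 1, -1]))
```

Reversing the shifts to [0, 8, 16, 24] would reproduce the byte order of int32_to_int8.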

Regarding the struct package, one advantage you can exploit is that it can convert many int32 values to int8 very efficiently:

import struct
import numpy.random

# Generate some random int32 numbers
x = numpy.random.randint(0, (1 << 31) - 1, 1000)

# Then you can convert all of them to int8 with just one command
# (buffer() is the Python 2 built-in that exposes the array's raw memory)
x_int8 = struct.unpack('B' * (4*len(x)), buffer(x))

# To verify that the results are valid:
x[0]
Out[29]: 1219620060

int32_to_int8(x[0])
Out[30]: [220, 236, 177, 72]

x_int8[:4]
Out[31]: (220, 236, 177, 72)

# And it's FAST!

%timeit struct.unpack('B' * (4*len(x)), buffer(x))
10000 loops, best of 3: 32 µs per loop

%timeit [int32_to_int8(i) for i in x]
100 loops, best of 3: 6.01 ms per loop
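
The session above relies on the Python 2 built-in buffer(), which does not exist in Python 3. A rough Python 3 equivalent might look like the sketch below. It assumes an explicit int32 dtype, since the default integer width of the random generator depends on the platform, and it uses tobytes() in place of buffer(). It was not used to produce the timings shown here.

```python
import struct
import numpy as np

rng = np.random.default_rng(0)
# Ask for int32 explicitly; otherwise the integers may be 64 bits wide.
x = rng.integers(0, (1 << 31) - 1, size=1000, dtype=np.int32)

# tobytes() copies the array's raw memory (native byte order) into a bytes object.
raw = x.tobytes()
# A repeat count ('4000B') is equivalent to writing 'B' 4000 times.
x_int8 = struct.unpack(f'{4 * len(x)}B', raw)

print(x[0], x_int8[:4])
```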

If you want to do some actual computation:

uint8_type = "B" * len(x) * 4
%timeit sum(struct.unpack(uint8_type, buffer(x)))
10000 loops, best of 3: 52.6 µs per loop

# slow because in order to call sum(), implicitly the view object is converted to
# list.
%timeit sum(x.view(np.int8))
1000 loops, best of 3: 768 µs per loop

# use the numpy.sum() function - without creating Python objects
%timeit np.sum(x.view(np.int8))
100000 loops, best of 3: 8.55 µs per loop # <- FAST!
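
One detail worth checking before comparing these timings: the struct format 'B' unpacks unsigned bytes, while np.int8 is a signed type, so the two sums need not agree. A small sketch with hand-picked values (not taken from the session above) shows the difference; viewing as np.uint8 matches the struct result.

```python
import numpy as np

x = np.array([255, 1000], dtype=np.int32)

# Bytes are 255, 0, 0, 0 and 232, 3, 0, 0 (in some order, depending on endianness).
unsigned_sum = int(np.sum(x.view(np.uint8)))   # 255 + 232 + 3
signed_sum = int(np.sum(x.view(np.int8)))      # -1 + -24 + 3

print(unsigned_sum, signed_sum)
```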

In Python 3.2 and later, there is a new int method, to_bytes, that can also be used:
>>> convert = lambda n : [int(i) for i in n.to_bytes(4, byteorder='big', signed=True)]
>>>
>>> convert(1)
[0, 0, 0, 1]
>>>
>>> convert(-1)
[255, 255, 255, 255]
>>>
>>> convert(-1306918380)
[178, 26, 2, 20]
>>>
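
With signed=True, to_bytes raises OverflowError for values of 2**31 and above, so it does not cover the unsigned case the question asks about. One possible way to handle both, sketched here with hypothetical helper names, is to mask the value to 32 bits first. The second function converts back with int.from_bytes. Both assume plain Python ints rather than NumPy scalars.

```python
def convert_unsigned(n):
    """Bytes of n as an unsigned 32-bit value, most significant byte first."""
    # Masking maps negative inputs onto their two's-complement bit pattern.
    return list((n & 0xFFFFFFFF).to_bytes(4, byteorder='big'))

def convert_back(byte_list, signed=False):
    """Rebuild the integer from four bytes, most significant byte first."""
    return int.from_bytes(bytes(byte_list), byteorder='big', signed=signed)

print(convert_unsigned(4294967295))
print(convert_back([178, 26, 2, 20], signed=True))
```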

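What's wrong with the normal div/mod operations?

dtype allows you to view an array in two different ways. There is an example of doing this: np.dtype((np.int16, {'x': (np.int8, 0), 'y': (np.int8, 1)}))

Do you have a numpy array of 32-bit integers? Give an example of the actual input you want to process.

I was just suggesting this in a comment on @hpaulj's answer. If the array is x, you can use y = x.view(np.uint8).reshape(x.shape + (4,)) on a little-endian platform, which is what almost everyone uses.

@WarrenWeckesser, the view will give him the reverse byte order (1 -> [1, 0, 0, 0]), so [:, ::-1] may be needed after reshaping the view. But yes, a one-liner is definitely the way to go for this.

Using a dtype to create a view is definitely the right approach. If x is a contiguous array of np.int32, it can be as simple as y = x.view(np.uint8).reshape(x.shape + (4,)).

Thanks, this is definitely much faster than my original approach.

It is much slower than the view-based approach though, that's true. The view-based approach doesn't generate any new data; it just reads the same memory through a pointer of a different type. struct.unpack, on the other hand, needs to create a whole bunch of Python objects, which hurts performance.

Thanks to your question, I learned something new.

Works great! Added an optional big/little-endian flag and a function to convert back here from a bytearray to in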

>>> import struct
>>> int32 = struct.pack("I", 32768)
>>> struct.unpack("B" * 4, int32)

(0, 128, 0, 0)

import numpy as np

# this is fast because it only creates the view, without involving any creation
# of objects in Python
%timeit x.view(np.int8)
1000000 loops, best of 3: 570 ns per loop
