NumPy: splitting a cube into smaller cubes

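There is a function np.split() that can split an array along one axis. I was wondering whether there is a multi-axis version, one that could, for example, split along axes (0, 1, 2).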

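Suppose the cube has shape (W, H, D) and you wish to break it up into N little cubes of shape (w, h, d). Since NumPy arrays have axes of fixed length, w must evenly divide W, and similarly for h and d.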

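Then there is a way to reshape the cube of shape (W, H, D) into a new array of shape (N, w, h, d).

For example, if arr = np.arange(4*4*4).reshape(4,4,4) (so (W, H, D) = (4, 4, 4)) and we wish to break it up into cubes of shape (2, 2, 2), then we could use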

In [283]: arr.reshape(2,2,2,2,2,2).transpose(0,2,4,1,3,5).reshape(-1,2,2,2)
Out[283]: 
array([[[[ 0,  1],
         [ 4,  5]],

        [[16, 17],
         [20, 21]]],

...
       [[[42, 43],
         [46, 47]],

        [[58, 59],
         [62, 63]]]])
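As a quick sanity check on the expression above, the following sketch compares its result with blocks cut out by ordinary slicing. The name expected and the loop order are choices made here for illustration; they are not part of the original answer.

```python
import numpy as np

arr = np.arange(4 * 4 * 4).reshape(4, 4, 4)
cubes = arr.reshape(2, 2, 2, 2, 2, 2).transpose(0, 2, 4, 1, 3, 5).reshape(-1, 2, 2, 2)

# Build the same blocks by plain slicing, iterating over block indices (i, j, k)
# with k varying fastest.
expected = np.array([
    arr[2 * i:2 * i + 2, 2 * j:2 * j + 2, 2 * k:2 * k + 2]
    for i in range(2) for j in range(2) for k in range(2)
])

print(cubes.shape)                      # (8, 2, 2, 2)
print(np.array_equal(cubes, expected))  # True
```

The block at position n = 4*i + 2*j + k is the sub-cube starting at (2*i, 2*j, 2*k), which is why the first block printed above starts at element 0 and the last one ends at element 63.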

The idea here is to add extra axes to the array which act as place markers:

 number of repeats act as placemarkers
 o---o---o
 |   |   |
 v   v   v
(2,2,2,2,2,2)
   ^   ^   ^
   |   |   |
   o---o---o
   newshape

Then we can reorder the axes (using transpose) so that the numbers of repeats come first and the newshape comes last:

arr.reshape(2,2,2,2,2,2).transpose(0,2,4,1,3,5)

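Finally, call reshape(-1, w, h, d) to squash all the place-marker axes into a single axis. This produces an array of shape (N, w, h, d), where N is the number of little cubes.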
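The three steps can be traced on a non-cubic example by printing the shape after each one. This is only a sketch; the sizes (4, 6, 8) and (2, 3, 4) and the names step1, step2 and step3 are assumptions made here for illustration.

```python
import numpy as np

W, H, D = 4, 6, 8   # shape of the big cube
w, h, d = 2, 3, 4   # shape of each little cube
arr = np.arange(W * H * D).reshape(W, H, D)

# Step 1: split every axis into (number of repeats, block length).
step1 = arr.reshape(W // w, w, H // h, h, D // d, d)
# Step 2: move the repeat axes to the front and the block axes to the back.
step2 = step1.transpose(0, 2, 4, 1, 3, 5)
# Step 3: merge the repeat axes into a single axis of length N.
step3 = step2.reshape(-1, w, h, d)

N = (W // w) * (H // h) * (D // d)
print(step1.shape)  # (2, 2, 2, 3, 2, 4)
print(step2.shape)  # (2, 2, 2, 2, 3, 4)
print(step3.shape)  # (8, 2, 3, 4)
print(N)            # 8
```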

The idea used above is a generalization to three dimensions. It can be generalized further to arrays with any number of dimensions:

import numpy as np
def cubify(arr, newshape):
    oldshape = np.array(arr.shape)
    repeats = (oldshape / newshape).astype(int)
    tmpshape = np.column_stack([repeats, newshape]).ravel()
    order = np.arange(len(tmpshape))
    order = np.concatenate([order[::2], order[1::2]])
    # newshape must divide oldshape evenly or else ValueError will be raised
    return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

print(cubify(np.arange(4*6*16).reshape(4,6,16), (2,3,4)).shape)
print(cubify(np.arange(8*8*8*8).reshape(8,8,8,8), (2,2,2,2)).shape)

yields new arrays of shapes

(16, 2, 3, 4)
(256, 2, 2, 2, 2)
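The comment in cubify notes that newshape must divide the old shape evenly. If a clearer error message is wanted, one possible variant checks this up front with integer arithmetic. The name cubify_checked and the wording of the messages are hypothetical, not part of the original answer.

```python
import numpy as np

def cubify_checked(arr, newshape):
    """Hypothetical variant of cubify that validates the shapes up front."""
    oldshape = np.array(arr.shape)
    newshape = np.array(newshape)
    if oldshape.shape != newshape.shape:
        raise ValueError("newshape must have one entry per axis of arr")
    # Integer quotient and remainder per axis; a nonzero remainder means
    # the blocks would not tile that axis exactly.
    repeats, remainder = np.divmod(oldshape, newshape)
    if remainder.any():
        raise ValueError(
            f"{newshape.tolist()} does not evenly divide {oldshape.tolist()}")
    # Interleave (repeat, block length) per axis, then move repeats to the front.
    tmpshape = np.column_stack([repeats, newshape]).ravel()
    order = np.arange(len(tmpshape))
    order = np.concatenate([order[::2], order[1::2]])
    return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

print(cubify_checked(np.arange(24).reshape(4, 6), (2, 3)).shape)  # (4, 2, 3)
try:
    cubify_checked(np.arange(24).reshape(4, 6), (3, 3))
except ValueError as exc:
    print("rejected:", exc)
```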

To "uncubify" the arrays:

def uncubify(arr, oldshape):
    N, newshape = arr.shape[0], arr.shape[1:]
    oldshape = np.array(oldshape)    
    repeats = (oldshape / newshape).astype(int)
    tmpshape = np.concatenate([repeats, newshape])
    order = np.arange(len(tmpshape)).reshape(2, -1).ravel(order='F')
    return arr.reshape(tmpshape).transpose(order).reshape(oldshape)

Here is some test code to check that cubify and uncubify are inverses of each other:

import numpy as np
def cubify(arr, newshape):
    oldshape = np.array(arr.shape)
    repeats = (oldshape / newshape).astype(int)
    tmpshape = np.column_stack([repeats, newshape]).ravel()
    order = np.arange(len(tmpshape))
    order = np.concatenate([order[::2], order[1::2]])
    # newshape must divide oldshape evenly or else ValueError will be raised
    return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

def uncubify(arr, oldshape):
    N, newshape = arr.shape[0], arr.shape[1:]
    oldshape = np.array(oldshape)    
    repeats = (oldshape / newshape).astype(int)
    tmpshape = np.concatenate([repeats, newshape])
    order = np.arange(len(tmpshape)).reshape(2, -1).ravel(order='F')
    return arr.reshape(tmpshape).transpose(order).reshape(oldshape)

tests = [[np.arange(4*6*16), (4,6,16), (2,3,4)],
         [np.arange(8*8*8*8), (8,8,8,8), (2,2,2,2)]]

for arr, oldshape, newshape in tests:
    arr = arr.reshape(oldshape)
    assert np.allclose(uncubify(cubify(arr, newshape), oldshape), arr)
    # cuber = Cubify(oldshape,newshape)
    # assert np.allclose(cuber.uncubify(cuber.cubify(arr)), arr)

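I don't think there is a multi-axis version that lets you split along several given axes at once. But you can split it up one dimension at a time. For example: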

def split2(arys, sections, axis=[0, 1]):
    if not isinstance(arys, list):
        arys = [arys]
    for ax in axis:
        arys = [np.split(ary, sections, axis=ax) for ary in arys]
        arys = [ary for aa in arys for ary in aa]  # Flatten
    return arys

It can be used like this:

In [1]: a = np.array(range(100)).reshape(10, 10)
In [2]: split2(a, 2, axis=[0, 1])
Out[2]:
[array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]]),
 array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19],
       [25, 26, 27, 28, 29],
       [35, 36, 37, 38, 39],
       [45, 46, 47, 48, 49]]),
 array([[50, 51, 52, 53, 54],
       [60, 61, 62, 63, 64],
       [70, 71, 72, 73, 74],
       [80, 81, 82, 83, 84],
       [90, 91, 92, 93, 94]]),
 array([[55, 56, 57, 58, 59],
       [65, 66, 67, 68, 69],
       [75, 76, 77, 78, 79],
       [85, 86, 87, 88, 89],
       [95, 96, 97, 98, 99]])]
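For the two-dimensional case, the list returned by split2 appears to contain the same blocks, in the same order, as the reshape and transpose approach from the other answer. The sketch below checks this on the 10 by 10 example and also reassembles the blocks with np.block. The names stacked and rebuilt are introduced here for illustration only.

```python
import numpy as np

def split2(arys, sections, axis=(0, 1)):
    # Split along each requested axis in turn, flattening the nested lists.
    if not isinstance(arys, list):
        arys = [arys]
    for ax in axis:
        arys = [np.split(ary, sections, axis=ax) for ary in arys]
        arys = [ary for aa in arys for ary in aa]
    return arys

def cubify(arr, newshape):
    oldshape = np.array(arr.shape)
    repeats = (oldshape / newshape).astype(int)
    tmpshape = np.column_stack([repeats, newshape]).ravel()
    order = np.arange(len(tmpshape))
    order = np.concatenate([order[::2], order[1::2]])
    return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

a = np.arange(100).reshape(10, 10)
blocks = split2(a, 2, axis=[0, 1])

stacked = np.stack(blocks)  # list of four (5, 5) blocks -> one (4, 5, 5) array
print(stacked.shape)                               # (4, 5, 5)
print(np.array_equal(stacked, cubify(a, (5, 5))))  # True

# Put the four blocks back into their 2 x 2 layout.
rebuilt = np.block([[blocks[0], blocks[1]],
                    [blocks[2], blocks[3]]])
print(np.array_equal(rebuilt, a))                  # True
```

One practical difference is that split2 returns a Python list of arrays, while cubify returns a single array with the blocks along the first axis.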

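As an addition to @unutbu's answer, I think I have the reverse working (in case you want to split a cube into smaller cubes, apply a function to each one, and then combine them back).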
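The code for this answer is not reproduced here, so what follows is only a sketch of the split, apply, recombine workflow it describes, built on the cubify and uncubify functions from @unutbu's answer. The per-cube operation (subtracting each cube's mean) and the example sizes are assumptions chosen for illustration.

```python
import numpy as np

def cubify(arr, newshape):
    oldshape = np.array(arr.shape)
    repeats = (oldshape / newshape).astype(int)
    tmpshape = np.column_stack([repeats, newshape]).ravel()
    order = np.arange(len(tmpshape))
    order = np.concatenate([order[::2], order[1::2]])
    return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

def uncubify(arr, oldshape):
    newshape = arr.shape[1:]
    oldshape = np.array(oldshape)
    repeats = (oldshape / newshape).astype(int)
    tmpshape = np.concatenate([repeats, newshape])
    order = np.arange(len(tmpshape)).reshape(2, -1).ravel(order='F')
    return arr.reshape(tmpshape).transpose(order).reshape(oldshape)

arr = np.arange(4 * 6 * 8, dtype=float).reshape(4, 6, 8)

# Split into 8 cubes of shape (2, 3, 4).
cubes = cubify(arr, (2, 3, 4))
# Apply a function to each cube: here, subtract that cube's own mean.
means = cubes.mean(axis=(1, 2, 3), keepdims=True)
# Combine the processed cubes back into the original layout.
centered = uncubify(cubes - means, arr.shape)

print(centered.shape)                                # (4, 6, 8)
print(np.isclose(centered[:2, :3, :4].mean(), 0.0))  # True
```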

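If I wanted to undo this, what combination of transpose and reshape would I need to bring it back to the big cube?

@mattdns: I've added an uncubify function above. Our definitions of reverseOrder (what I call order in the uncubify function) differ, and this makes a difference for higher-dimensional arrays. Changing the definition to

self.reverseOrder = np.arange(len(self.tmpshape)).reshape(2, -1).ravel(order='F')

will fix your code.

Great answer. May I suggest using

einops.rearrange(x, '(x w) (y h) (z d) -> (x y z) w h d', w=2, h=2, d=2)

and

einops.rearrange(x, '(x y z) w h d -> (x w) (y h) (z d)'

?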