NumPy: split a cube into cubes
There is a function np.split() which can split an array along one axis. I was wondering if there is a multi-axis version, for example one that can split along axes (0, 1, 2).

Suppose the cube has shape (W, H, D) and you wish to break it up into N little cubes of shape (w, h, d). Since NumPy arrays have axes of fixed length, w must evenly divide W, and similarly for h and d.

Then there is a way to reshape the cube of shape (W, H, D) into a new array of shape (N, w, h, d).

For example, if arr = np.arange(4*4*4).reshape(4,4,4) (so (W, H, D) = (4, 4, 4)) and we wish to break it up into cubes of shape (2, 2, 2), then we could use
    In [283]: arr.reshape(2,2,2,2,2,2).transpose(0,2,4,1,3,5).reshape(-1,2,2,2)
    Out[283]:
    array([[[[ 0,  1],
             [ 4,  5]],

            [[16, 17],
             [20, 21]]],

    ...
           [[[42, 43],
             [46, 47]],

            [[58, 59],
             [62, 63]]]])
The idea here is to add extra axes to the array which act as placemarkers:
     number of repeats act as placemarkers
     o---o---o
     |   |   |
     v   v   v
    (2,2,2,2,2,2)
       ^   ^   ^
       |   |   |
       o---o---o
       newshape
We can then reorder the axes (using transpose) so that the repeat counts come first and the newshape axes come last:

    arr.reshape(2,2,2,2,2,2).transpose(0,2,4,1,3,5)
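Finally, call reshape(-1, w, h, d) to squash all the placemarking axes into a single axis. This produces an array of shape (N, w, h, d), where N is the number of little cubes.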
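The three steps can be checked one at a time. The following is a minimal sketch for the 4x4x4 example, assuming blocks of shape (2, 2, 2); the names step1, step2 and cubes are only illustrative. It confirms that cube number n corresponds to the block at index (i, j, k) with n = 4*i + 2*j + k.

```python
import numpy as np

arr = np.arange(4 * 4 * 4).reshape(4, 4, 4)

# Step 1: split each axis of length 4 into (repeats, block size) = (2, 2).
step1 = arr.reshape(2, 2, 2, 2, 2, 2)
# Step 2: move the three "repeats" axes to the front.
step2 = step1.transpose(0, 2, 4, 1, 3, 5)
# Step 3: merge the three "repeats" axes into one axis of length N = 8.
cubes = step2.reshape(-1, 2, 2, 2)

# np.ndindex iterates block indices in C order, which matches the order
# in which reshape(-1, ...) flattens the leading axes.
for n, (i, j, k) in enumerate(np.ndindex(2, 2, 2)):
    block = arr[2*i:2*i + 2, 2*j:2*j + 2, 2*k:2*k + 2]
    assert np.array_equal(cubes[n], block)

print(cubes.shape)  # (8, 2, 2, 2)
```

Because the transpose changes the memory order, the final reshape generally has to copy the data, so the result is usually not a view of arr.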
The idea used above is a generalization to three dimensions. It can be further generalized to ndarrays of any dimension:
    import numpy as np

    def cubify(arr, newshape):
        oldshape = np.array(arr.shape)
        repeats = (oldshape / newshape).astype(int)
        tmpshape = np.column_stack([repeats, newshape]).ravel()
        order = np.arange(len(tmpshape))
        order = np.concatenate([order[::2], order[1::2]])
        # newshape must divide oldshape evenly or else ValueError will be raised
        return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

    print(cubify(np.arange(4*6*16).reshape(4,6,16), (2,3,4)).shape)
    print(cubify(np.arange(8*8*8*8).reshape(8,8,8,8), (2,2,2,2)).shape)
yields new arrays of shapes

    (16, 2, 3, 4)
    (256, 2, 2, 2, 2)
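When newshape does not divide the array's shape evenly, the ValueError comes from reshape and only reports a size mismatch. A rough sketch of an explicit check that could run first is shown below; check_newshape is a hypothetical helper and is not part of the answer's code.

```python
import numpy as np

def check_newshape(oldshape, newshape):
    """Hypothetical helper: return the repeat counts, or raise a descriptive error."""
    oldshape = np.asarray(oldshape)
    newshape = np.asarray(newshape)
    # The block shape needs one entry per axis of the original array.
    if oldshape.shape != newshape.shape:
        raise ValueError(
            f"expected {oldshape.size} block dimensions, got {newshape.size}")
    # Every axis length must be an exact multiple of the block length.
    remainder = oldshape % newshape
    if remainder.any():
        bad_axes = np.flatnonzero(remainder).tolist()
        raise ValueError(
            f"newshape does not evenly divide oldshape along axes {bad_axes}")
    return oldshape // newshape

print(check_newshape((4, 6, 16), (2, 3, 4)))  # [2 2 4]
```

The returned repeat counts are the same values that cubify computes as repeats, here with integer division instead of a float division followed by astype(int).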
To "uncubify" the arrays:
    def uncubify(arr, oldshape):
        N, newshape = arr.shape[0], arr.shape[1:]
        oldshape = np.array(oldshape)
        repeats = (oldshape / newshape).astype(int)
        tmpshape = np.concatenate([repeats, newshape])
        order = np.arange(len(tmpshape)).reshape(2, -1).ravel(order='F')
        return arr.reshape(tmpshape).transpose(order).reshape(oldshape)
Here is some test code to check that cubify and uncubify are inverses:
    import numpy as np

    def cubify(arr, newshape):
        oldshape = np.array(arr.shape)
        repeats = (oldshape / newshape).astype(int)
        tmpshape = np.column_stack([repeats, newshape]).ravel()
        order = np.arange(len(tmpshape))
        order = np.concatenate([order[::2], order[1::2]])
        # newshape must divide oldshape evenly or else ValueError will be raised
        return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

    def uncubify(arr, oldshape):
        N, newshape = arr.shape[0], arr.shape[1:]
        oldshape = np.array(oldshape)
        repeats = (oldshape / newshape).astype(int)
        tmpshape = np.concatenate([repeats, newshape])
        order = np.arange(len(tmpshape)).reshape(2, -1).ravel(order='F')
        return arr.reshape(tmpshape).transpose(order).reshape(oldshape)

    tests = [[np.arange(4*6*16), (4,6,16), (2,3,4)],
             [np.arange(8*8*8*8), (8,8,8,8), (2,2,2,2)]]

    for arr, oldshape, newshape in tests:
        arr = arr.reshape(oldshape)
        assert np.allclose(uncubify(cubify(arr, newshape), oldshape), arr)
        # cuber = Cubify(oldshape,newshape)
        # assert np.allclose(cuber.uncubify(cuber.cubify(arr)), arr)
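I don't think there is a multi-axis version that lets you split along several given axes at once. But you can split the array one dimension at a time. For example: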
    def split2(arys, sections, axis=[0, 1]):
        if not isinstance(arys, list):
            arys = [arys]
        for ax in axis:
            arys = [np.split(ary, sections, axis=ax) for ary in arys]
            arys = [ary for aa in arys for ary in aa]  # Flatten
        return arys
It can be used like this:
    In [1]: a = np.array(range(100)).reshape(10, 10)

    In [2]: split2(a, 2, axis=[0, 1])
    Out[2]:
    [array([[ 0,  1,  2,  3,  4],
            [10, 11, 12, 13, 14],
            [20, 21, 22, 23, 24],
            [30, 31, 32, 33, 34],
            [40, 41, 42, 43, 44]]),
     array([[ 5,  6,  7,  8,  9],
            [15, 16, 17, 18, 19],
            [25, 26, 27, 28, 29],
            [35, 36, 37, 38, 39],
            [45, 46, 47, 48, 49]]),
     array([[50, 51, 52, 53, 54],
            [60, 61, 62, 63, 64],
            [70, 71, 72, 73, 74],
            [80, 81, 82, 83, 84],
            [90, 91, 92, 93, 94]]),
     array([[55, 56, 57, 58, 59],
            [65, 66, 67, 68, 69],
            [75, 76, 77, 78, 79],
            [85, 86, 87, 88, 89],
            [95, 96, 97, 98, 99]])]
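The two approaches appear to return the blocks in the same order. The sketch below compares them on the 10x10 example; it repeats both functions so that it runs on its own, uses a tuple default for axis and integer division in cubify (small deviations from the code above), and the variable names are illustrative. Note that split2 takes the number of sections per axis, whereas cubify takes the block shape.

```python
import numpy as np

def split2(arys, sections, axis=(0, 1)):
    # Same logic as split2 above: split along each axis in turn, then flatten.
    if not isinstance(arys, list):
        arys = [arys]
    for ax in axis:
        arys = [np.split(ary, sections, axis=ax) for ary in arys]
        arys = [ary for aa in arys for ary in aa]
    return arys

def cubify(arr, newshape):
    # Same reshape/transpose/reshape idea as cubify above.
    oldshape = np.array(arr.shape)
    repeats = oldshape // np.array(newshape)
    tmpshape = np.column_stack([repeats, newshape]).ravel()
    order = np.arange(len(tmpshape))
    order = np.concatenate([order[::2], order[1::2]])
    return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

a = np.arange(100).reshape(10, 10)
blocks_list = split2(a, 2, axis=[0, 1])  # 2 sections along each axis
blocks_array = cubify(a, (5, 5))         # blocks of shape (5, 5)

# Stack the list into one array so the two results can be compared directly.
same = np.array_equal(np.stack(blocks_list), blocks_array)
print(len(blocks_list), blocks_array.shape, same)  # 4 (4, 5, 5) True
```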
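As a supplement to @unutbu's answer, I think I have got the reverse to work (in case you want to split a cube into cubes, apply a function to each one, and then merge them back).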
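A minimal sketch of that split, apply, merge workflow follows. It uses the cubify and uncubify functions from the answer above (written with integer division) rather than this answer's own class. The helper apply_per_cube and the example function center are hypothetical names chosen for illustration, and the applied function is assumed to return an array of the same shape as its input.

```python
import numpy as np

def cubify(arr, newshape):
    oldshape = np.array(arr.shape)
    repeats = oldshape // np.array(newshape)
    tmpshape = np.column_stack([repeats, newshape]).ravel()
    order = np.arange(len(tmpshape))
    order = np.concatenate([order[::2], order[1::2]])
    return arr.reshape(tmpshape).transpose(order).reshape(-1, *newshape)

def uncubify(arr, oldshape):
    newshape = arr.shape[1:]
    repeats = np.array(oldshape) // np.array(newshape)
    tmpshape = np.concatenate([repeats, newshape])
    order = np.arange(len(tmpshape)).reshape(2, -1).ravel(order='F')
    return arr.reshape(tmpshape).transpose(order).reshape(oldshape)

def apply_per_cube(arr, newshape, func):
    """Hypothetical helper: split into cubes, apply func to the stack, merge back."""
    cubes = cubify(arr, newshape)
    return uncubify(func(cubes), arr.shape)

def center(cubes):
    # Illustrative per-cube operation: subtract each cube's own mean.
    axes = tuple(range(1, cubes.ndim))
    return cubes - cubes.mean(axis=axes, keepdims=True)

arr = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
result = apply_per_cube(arr, (2, 2, 2), center)
print(result.shape)  # (4, 4, 4)
```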
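If I wanted to undo this, what combination of transpose and reshape would I need to get back to the big cube?

@mattdns: I've added an uncubify function above. We differ on the definition of reverseOrder (or what I call order in the uncubify function). For higher-dimensional arrays this makes a difference. Changing the definition to self.reverseOrder = np.arange(len(self.tmpshape)).reshape(2, -1).ravel(order='F') will fix your code.

Great answer. May I suggest using einops.rearrange(x, '(x w) (y h) (z d) -> (x y z) w h d', w=2, h=2, d=2) and einops.rearrange(x, '(x y z) w h d -> (x w) (y h) (z d)', ...)?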