xarray equivalent of np.reshape

I have a 3D array (10x10x3) which, for some reason, is stored as a 2D xr.DataArray (100x3). It looks a bit like this:

import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(100, 3),
                    dims=('ct', 'x'),
                    coords={'ct': range(100)})

c = [x%10 for x in range(100)]
t = [1234+x//10 for x in range(100)]

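c and t are the coordinates that are bundled together in ct.

In the past, I have solved this by separating the two dimensions like this: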

t_x_c,x = data.shape
nc = 10
data = np.reshape(data.values,(t_x_c//nc,nc, x))

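But this relies on assumptions about the data structure that may not hold in the near future (for example, c and t may not be as regular as in my example).

I have managed to assign c and t as additional coordinates of the array: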

data2 = data.assign_coords(
    coords={"c": ("ct", c),
            "t": ("ct", t),
},)

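But I would like to promote them to dimensions of the array. How would I do that?

You want to use a combination of the .set_index() and .unstack() methods.

Let's break it up.

First, I create the dummy array at the stage where "c" and "t" are already coordinates: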

c, t = [arr.flatten() for arr in np.meshgrid(range(10), range(1234, 1234+10))]

da = xr.DataArray( 
    np.random.randn(100, 3), 
    dims=('ct', 'x'), 
    coords={ 
        'c': ('ct', c), 
        't': ('ct', t) 
    }
)

Then, use set_index() to create a MultiIndex combining the "c" and "t" coordinates:

>>> da.set_index(ct=("c", "t"))
<xarray.DataArray (ct: 100, x: 3)>
[...]
Coordinates:
  * ct       (ct) MultiIndex
  - c        (ct) int64 0 1 2 3 4 5 6 7 8 9 0 1 2 ...
  - t        (ct) int64 1234 1234 1234 1234 1234 ...
Dimensions without coordinates: x

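Next, calling .unstack("ct") turns the "c" and "t" levels of the "ct" MultiIndex into dimensions. Note, however, that .unstack() puts the unstacked dimensions last, so you will probably want to transpose at the end: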

>>> da.set_index(ct=("c", "t")).unstack("ct").transpose("c", "t", "x").dims                      
('c', 't', 'x')
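To check that the steps above really reproduce the np.reshape result from the question, here is a minimal sketch. The helper name to_3d, the shuffled copy, and the (t, c, x) output order are my own choices for illustration, not part of the original answer. The shuffled copy mimics the "less regular" case the question worries about, where rows of ct arrive in arbitrary order.

```python
import numpy as np
import xarray as xr

# Bundled coordinates as in the question: c cycles fastest, t changes every 10 rows.
c = np.tile(np.arange(10), 10)
t = np.repeat(np.arange(1234, 1244), 10)

da = xr.DataArray(
    np.random.randn(100, 3),
    dims=("ct", "x"),
    coords={"c": ("ct", c), "t": ("ct", t)},
)

# Shuffle the rows to mimic data where the (c, t) pairs arrive in no particular order.
rng = np.random.default_rng(0)
shuffled = da.isel(ct=rng.permutation(da.sizes["ct"]))


def to_3d(arr):
    """Hypothetical helper: promote c and t to dimensions, ordered (t, c, x)."""
    return arr.set_index(ct=["c", "t"]).unstack("ct").transpose("t", "c", "x")


regular_3d = to_3d(da)
shuffled_3d = to_3d(shuffled)

# Positional reshape from the question, valid only for the regular layout.
expected = np.reshape(da.values, (10, 10, 3))

print(np.array_equal(regular_3d.values, expected))
print(np.array_equal(shuffled_3d.values, expected))
```

Because the label-based version places each row by its (c, t) labels rather than by position, it should not depend on the row order, which is the assumption the plain np.reshape approach needs.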

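One alternative is to generate "c" and "t" coordinates that span the full length of "ct" up front and build a MultiIndex from them, but this should not be necessary. Providing only the desired coordinate values for "c" and "t" (here, of length 10 and 10 respectively) should be enough. This answer presents two alternatives that are already available in other SO answers and GitHub issues. The relevant code is included here, but for implementation details, please consult the original sources.

An existing SO answer gives an example of reshaping with pure xarray methods, using the following code: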

reshaped_ds = ds.assign_coords(
    c=np.arange(10), t=np.arange(1234, 1244)
).stack(
    aux_dim=("c", "t")
).reset_index(
    "ct", drop=True
).rename(
    ct="aux_dim"
).unstack("aux_dim")

Note that this only works on Datasets, so ds = data.to_dataset(name="aux_name") is needed first. After reshaping, the result can be converted back to a DataArray with ds.aux_name.
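The round trip between DataArray and Dataset mentioned above could look like the following sketch. The variable name aux_name is just the placeholder used in this answer, and the reshaping step itself is left out here.

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.randn(100, 3),
    dims=("ct", "x"),
    coords={"ct": range(100)},
)

# Wrap the DataArray in a Dataset under a placeholder variable name.
ds = data.to_dataset(name="aux_name")

# ... any Dataset-level reshaping would go here ...

# Pull the variable back out; ds["aux_name"] is equivalent to ds.aux_name.
restored = ds["aux_name"]
print(restored.dims)
```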

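The other alternative is to build the MultiIndex with pandas instead of having xarray create it through assign_coords and stack, as proposed in a GitHub issue. This alternative is tailored to DataArrays, and it even integrates the transpose so that the reshaped dimensions keep their original position. For completeness, here is the code proposed in that issue for reshaping DataArrays: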

import pandas as pd


def xr_reshape(A, dim, newdims, coords):
    """ Reshape DataArray A to convert its dimension dim into sub-dimensions given by
    newdims and the corresponding coords.
    Example: Ar = xr_reshape(A, 'time', ['year', 'month'], [(2017, 2018), np.arange(12)]) """


    # Create a pandas MultiIndex from these labels
    ind = pd.MultiIndex.from_product(coords, names=newdims)

    # Replace the time index in the DataArray by this new index,
    A1 = A.copy()

    A1.coords[dim] = ind

    # Convert multiindex to individual dims using DataArray.unstack().
    # This changes dimension order! The new dimensions are at the end.
    A1 = A1.unstack(dim)

    # Permute to restore dimensions
    i = A.dims.index(dim)
    dims = list(A1.dims)

    for d in newdims[::-1]:
        dims.insert(i, d)

    for d in newdims:
        _ = dims.pop(-1)


    return A1.transpose(*dims)
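The core idea of that function, applied to the array from the question, can be sketched on its own as follows. This is a minimal sketch under two assumptions of mine: that row k of "ct" holds t = 1234 + k // 10 and c = k % 10 (so t is the slow level and c the fast one), and that newer xarray releases expose xr.Coordinates.from_pandas_multiindex while older ones accept the MultiIndex directly. The hasattr check is only there to cover both cases.

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.arange(300.0).reshape(100, 3), dims=("ct", "x"))

# Assumed layout: t is the outer (slow) level, c the inner (fast) level.
index = pd.MultiIndex.from_product(
    [np.arange(1234, 1244), np.arange(10)], names=["t", "c"]
)

if hasattr(xr, "Coordinates"):
    # Newer xarray releases prefer an explicit wrapper around the MultiIndex.
    coords = xr.Coordinates.from_pandas_multiindex(index, "ct")
    indexed = da.assign_coords(coords)
else:
    # Older releases accept the MultiIndex directly as a coordinate.
    indexed = da.assign_coords(ct=index)

# unstack() appends the new dimensions at the end, so restore the order explicitly.
reshaped = indexed.unstack("ct").transpose("t", "c", "x")
print(reshaped.shape)
```

With the function above, the equivalent call should be xr_reshape(da, 'ct', ['t', 'c'], [np.arange(1234, 1244), np.arange(10)]). Note that only the 10 + 10 coordinate values are supplied, not 100 per coordinate.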