Python: How do I use groupby and apply to add new variables to an xarray Dataset? (python, netcdf, python-xarray)


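I am having serious difficulty understanding how xarray.groupby works. I am trying to apply a given function "f" to each group of an xarray DatasetGroupBy collection, such that "f" adds new variables to each group of the original xr.Dataset.

Here is a brief introduction: my problem is common in geoscience, remote sensing, and similar fields.

The goal is to apply a given function over an array pixel by pixel (or grid cell by grid cell).

Example: suppose I want to evaluate the wind speed components (u, v) of a given region with respect to a new direction. So I need to evaluate rotated versions of the "u" and "v" components, namely u_rotated and v_rotated.

Suppose this new direction is rotated 30° counter-clockwise at every pixel position in the wind field. The new wind components would then be (u_30_degrees and v_30_degrees).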
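To make the rotation concrete, here is a minimal sketch for a single pixel. It assumes the standard counter-clockwise 2-D rotation matrix and uses the sample values u = 1.0, v = 0.3 that appear later in this thread; the helper name `rotation_matrix` is chosen only for this illustration.

```python
import numpy as np


def rotation_matrix(angle_deg):
    """Counter-clockwise 2-D rotation matrix for an angle given in degrees."""
    theta = np.radians(angle_deg)  # convert degrees to radians
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# (u, v) wind components at one pixel (illustrative values)
wind = np.array([1.0, 0.3])

# u' = c*u - s*v and v' = s*u + c*v
rotated = rotation_matrix(30) @ wind
print(rotated)  # approximately [0.716, 0.760]
```

A rotation changes only the direction of the vector, so the wind speed (the vector norm) should be the same before and after.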

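My first attempt was to stack the x and y coordinates (or longitude and latitude) into a new dimension called pixel, then group by this new dimension ("pixel") and apply a function that performs the vector wind rotation.

Here is a snippet of my initial attempt: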

import numpy as np
import xarray as xr

# First, let's create some functions for vector rotation:

def rotate_2D_vector_per_given_degrees(array2D, angle=30):
    '''
    Rotate a 2D vector counter-clockwise by a given angle.

    Parameters
    ----------
    array2D : numpy array of shape (2,)
        The (u, v) components of the vector.
    angle : float, optional
        Rotation angle in degrees. The default is 30.

    Returns
    -------
    Rotated_2D_Vector : numpy array of shape (2,)
        The rotated vector.
    '''
    R = get_rotation_matrix(rotation=angle)
    Rotated_2D_Vector = np.dot(R, array2D)
    return Rotated_2D_Vector

def get_rotation_matrix(rotation=90):
    '''
    Description:
    
        This function creates a rotation matrix given a defined rotation angle (in degrees)
    
    Parameters:
        rotation: in degrees
    
    Returns:
        rotation matrix
    '''
    
    theta = np.radians(rotation)  # convert degrees to radians
    c, s = np.cos(theta), np.sin(theta)
    R = np.array(((c, -s), (s, c)))
    return R
    


# Then let's create a reproducible dataset for analysis:

u_wind = xr.DataArray(np.ones( shape=(20, 30)),
                     dims=('x', 'y'),
                     coords={'x': np.arange(0, 20),
                             'y': np.arange(0, 30)},
                     name='u')


v_wind = xr.DataArray(np.ones( shape=(20, 30))*0.3,
                     dims=('x', 'y'),
                     coords={'x': np.arange(0, 20),
                             'y': np.arange(0, 30)},
                     name='v')
 
data = xr.merge([u_wind, v_wind])


# Let's create the given function that will be applied per each group in the dataset:



def rotate_wind(array, degrees=30):
    
    # Build a 1-D vector of length 2 holding the u and v wind components.
    # The simplest way I found is to convert the Dataset into a single
    # xr.DataArray by stacking 'u' and 'v' along a new dimension named 'wind'.

    vector = array.to_array(dim='wind').values
    
    # Now, I rotate the wind vector given a rotation angle in degrees

    Rotated = rotate_2D_vector_per_given_degrees(vector, degrees)
    
    # Snap floating-point noise (e.g. cos(90 deg) = 6.1e-17) to exactly zero.
    Rotated = np.where(np.isclose(Rotated, 0.0, atol=1e-15), 0.0, Rotated)
    
    # sanity check for each point position:

    print('Coords: ', array['point'].values, 
          'Wind Speed: ', vector, 
          'Response :', Rotated, 
          end='\n\n'+'-'*20+'\n')
    
    # Take the component names from the group itself rather than the global `data`.
    components = list(array.data_vars)
    
    for dim, value in zip(components, Rotated):
        
        array['{0}_rotated_{1}'.format(dim, degrees)] = value
        
    return array



# Finally, let's stack the dataset per grid point, group by this new dimension, and apply the function
# (recent xarray versions name this method `.map`; `.apply` remains as an alias):

stacked = data.stack(point=['x', 'y'])

stacked = stacked.groupby('point').apply(rotate_wind)

# Let's unstack the data to recover the original grid:

data = stacked.unstack('point')

# Let's check if the function worked correctly
data.to_dataframe().head(30)

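Although the example above apparently works, I am still not sure whether its results are correct, or whether this groupby/apply implementation is efficient (clean, non-redundant, fast, etc.).

Any insights are welcome.

Sincerely,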

You can simply multiply the whole array by the rotation matrix, something like np.dot(R, da).

So, if you have the following Dataset:

>>> dims = ("x", "y")
>>> sizes = (20, 30)

>>> ds = xr.Dataset(
        data_vars=dict(u=(dims, np.ones(sizes)), v=(dims, np.ones(sizes) * 0.3)),
        coords={d: np.arange(s) for d, s in zip(dims, sizes)},
    )
>>> ds
<xarray.Dataset>
Dimensions:  (x: 20, y: 30)
Coordinates:
  * x        (x) int64 0 1 2 3 4 ... 16 17 18 19
  * y        (y) int64 0 1 2 3 4 ... 26 27 28 29
Data variables:
    u        (x, y) float64 1.0 1.0 ... 1.0 1.0
    v        (x, y) float64 0.3 0.3 ... 0.3 0.3
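The step that turns this Dataset into the array called da is not shown in the thread. A minimal sketch, assuming it mirrors the question's approach (stack x and y into a point dimension, then place u and v along a wind dimension), with R built for a 30° counter-clockwise rotation:

```python
import numpy as np
import xarray as xr

dims = ("x", "y")
sizes = (20, 30)
ds = xr.Dataset(
    data_vars=dict(u=(dims, np.ones(sizes)), v=(dims, np.ones(sizes) * 0.3)),
    coords={d: np.arange(n) for d, n in zip(dims, sizes)},
)

# Assumption: same conversion as in the question. Stack the grid into one
# "point" dimension, then put the u and v variables along a new "wind" dimension.
da = ds.stack(point=["x", "y"]).to_array(dim="wind")

# 30 degree counter-clockwise rotation matrix
theta = np.radians(30)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s], [s, c]])

# One matrix product rotates all 600 (u, v) columns at once; the result is a
# plain NumPy array, not an xarray object.
rotated = np.dot(R, da.values)
print(da.dims, rotated.shape)
```

Because "wind" is the leading dimension of da, every column of the (2, 600) array is one (u, v) pair, which is why a single product with the 2×2 matrix suffices.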

Then, thanks to np.dot(R, da), where da is the Dataset stacked over a point dimension and converted to a DataArray along a wind dimension (as in the question) and R is the rotation matrix, you get the rotated values:

>>> np.dot(R, da).shape
(2, 600)

>>> type(np.dot(R, da))
<class 'numpy.ndarray'>

The result is a plain NumPy array, though. To get an xarray DataArray back, you can wrap the product in a small helper:

def rotate(da, dim, angle):

    # Put dim first
    dims_orig = da.dims
    da = da.transpose(dim, ...)

    # Rotate (get_rotation_matrix is the helper defined in the question)
    R = get_rotation_matrix(angle)
    rotated = da.copy(data=np.dot(R, da), deep=True)

    # Rename values of "dim" coord according to rotation
    rotated[dim] = [f"{orig}_rotated_{angle}" for orig in da[dim].values]

    # Transpose back to orig
    return rotated.transpose(*dims_orig)

Then use it like this:

>>> da_rotated = rotate(da, dim="wind", angle=30)
>>> da_rotated
<xarray.DataArray (wind: 2, point: 600)>
array([[0.7160254 , 0.7160254 , 0.7160254 , ..., 0.7160254 , 0.7160254 ,
        0.7160254 ],
       [0.75980762, 0.75980762, 0.75980762, ..., 0.75980762, 0.75980762,
        0.75980762]])
Coordinates:
  * point    (point) MultiIndex
  - x        (point) int64 0 0 0 0 ... 19 19 19 19
  - y        (point) int64 0 1 2 3 ... 26 27 28 29
  * wind     (wind) <U12 'u_rotated_30' 'v_rotated_30'

Finally, you can go back to the original Dataset layout:

>>> ds_rotated = da_rotated.to_dataset(dim="wind").unstack(dim="point")
>>> ds_rotated
<xarray.Dataset>
Dimensions:       (x: 20, y: 30)
Coordinates:
  * x             (x) int64 0 1 2 3 ... 17 18 19
  * y             (y) int64 0 1 2 3 ... 27 28 29
Data variables:
    u_rotated_30  (x, y) float64 0.716 ... 0.716
    v_rotated_30  (x, y) float64 0.7598 ... 0.7598

Thanks for your reply. Very informative. I will follow up on the changes you proposed.
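Since the original goal was to add the rotated components as new variables next to u and v, another option is to skip the stacking altogether and write the rotation as plain arithmetic on the Dataset's variables. This is a sketch of that alternative, not the approach given in the answer above; the function name `add_rotated_wind` and its parameters are hypothetical.

```python
import numpy as np
import xarray as xr


def add_rotated_wind(ds, angle, u="u", v="v"):
    """Return a new Dataset with counter-clockwise rotated wind components added.

    Hypothetical helper: `u` and `v` name the existing component variables.
    """
    theta = np.radians(angle)
    c, s = np.cos(theta), np.sin(theta)
    # Element-wise arithmetic applies the rotation to every grid cell at once.
    return ds.assign(
        {
            f"{u}_rotated_{angle}": c * ds[u] - s * ds[v],
            f"{v}_rotated_{angle}": s * ds[u] + c * ds[v],
        }
    )


dims = ("x", "y")
sizes = (20, 30)
ds = xr.Dataset(
    data_vars=dict(u=(dims, np.ones(sizes)), v=(dims, np.ones(sizes) * 0.3)),
    coords={d: np.arange(n) for d, n in zip(dims, sizes)},
)

out = add_rotated_wind(ds, 30)
print(list(out.data_vars))
```

Dataset.assign returns a new object, so ds itself is left unchanged, and the new variables keep the (x, y) dimensions and coordinates of the inputs without any stack/unstack round trip.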