Subset a NetCDF file in Python and return tuples


python python-3.x


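I have a large (5 GB) netCDF file of temperature data. The file has 4 dimensions: time, pressure level, latitude, and longitude.

The dataset has 31 time points, and I am only interested in 5 pressure levels.

My parameter is temperature, t: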

from netCDF4 import Dataset

# Load the dataset (path points to the netCDF file)
dataset = Dataset(path)
factor = dataset.variables['t']

To extract a "cube" of temperature data from the factor variable around a central cell, I simply subset it like this:

radius = 5
# +1 because slicing excludes the end index
lats_bounds = [nearest_latitude_index - radius, nearest_latitude_index + radius + 1]
lons_bounds = [nearest_longitude_index - radius, nearest_longitude_index + radius + 1]

# all time points
times_bounds = [0, len(times)]

# just the last 5 pressure levels
pressure_level_bounds = [len(levels) - 5, len(levels)]

results = factor[times_bounds[0]:times_bounds[1],
                 pressure_level_bounds[0]:pressure_level_bounds[1],
                 lats_bounds[0]:lats_bounds[1],
                 lons_bounds[0]:lons_bounds[1]]


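The problem is that results is now an ndarray of shape (31, 5, 11, 11) and size 31*5*11*11, where each index holds only a single value.

I need the values in results, but for each value I also need its corresponding time point, pressure level, latitude, and longitude.

Ideally I would subset as before, but my final result would be an array of tuples, something like this: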

# corresponding timestamp, pressure level, latitude, longitude
# and the temperature value extracted.
final = [
(2342342, 1000, 24.532, 53.531, 277),
(2342342, 1000, 74.453, 26.123, 351),
(2342342, 1000, 80.311, 56.345, 131),
...
]


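How can I achieve this?

I would use it for this task. However, since you only have 31 time points and 5 pressure levels, I would first simplify your approach: work out how to handle a single time and pressure level at a single lat, lon. Then work out how to loop through those indices to build the tuples. Something like: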
# Print the value at one lat/lon cell for every time and pressure level
for i in range(len(times)):
    for j in range(len(levels)):
        print(results[i, j, nearest_lat_idx, nearest_lon_idx])

Of course, you could add loops for lat and lon as well, but that gets a bit ugly.
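The nested loops above can also be avoided by broadcasting the coordinate arrays to the shape of the cube. The following is a minimal sketch with NumPy, not the answerer's method; the small arrays times, levels, lats, lons, and results are hypothetical stand-ins for the real coordinate variables and the subset read from the file.

```python
import numpy as np

# Hypothetical stand-ins for the coordinate arrays and the subset cube
times = np.array([1000, 2000])        # 2 time points
levels = np.array([925, 1000])        # 2 pressure levels
lats = np.array([10.0, 10.5, 11.0])   # 3 latitudes
lons = np.array([50.0, 50.5])         # 2 longitudes
results = np.arange(2 * 2 * 3 * 2, dtype=float).reshape(2, 2, 3, 2)

# Broadcast each coordinate to the cube's shape;
# indexing="ij" keeps the (time, level, lat, lon) axis order
t, p, la, lo = np.meshgrid(times, levels, lats, lons, indexing="ij")

# Flatten everything in the same order and zip into tuples of
# (time, pressure level, latitude, longitude, value)
final = list(zip(t.ravel().tolist(), p.ravel().tolist(),
                 la.ravel().tolist(), lo.ravel().tolist(),
                 results.ravel().tolist()))
```

Because every array is flattened in the same order, each tuple pairs a value with the coordinates of the cell it came from. With the real data, the coordinate arrays would need to be sliced with the same bounds as the cube.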
Take a look at xarray. Translated from netCDF4, the syntax would look something like this:

import xarray as xr

ds = xr.open_dataset(path)
factor = ds['t']

# note that levels/lon/lat are the names of dimensions in your Dataset
subset = factor.isel(levels=slice(-5, None),
                     lon=[1, 18, 48, 99], lat=[16, 28, 33, 35])
stacked = subset.stack(points=('time', 'levels', 'lon', 'lat'))

# This subset can be converted to a `pandas.Series`:
data = stacked.to_pandas()

# or it can be converted to a list of plain tuples
df = data.reset_index()
final = list(df.itertuples(index=False, name=None))

Xarray also supports label-based indexers (i.e. lat=[29.3, 42.3]), but for that you should use the sel method instead of isel.
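To illustrate the difference between sel and isel, here is a minimal sketch on a toy DataArray. The data, the dimension names lat and lon, and the coordinate values are all made up for the example and will differ in a real file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy DataArray; names and values are assumptions
lat = np.array([29.3, 35.0, 42.3])
lon = np.array([10.0, 20.0])
da = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="t",
)

by_label = da.sel(lat=[29.3, 42.3])   # select by coordinate value
by_position = da.isel(lat=[0, 2])     # select by integer index

# Both selections pick the first and last latitude rows
same = bool((by_label.values == by_position.values).all())

# sel can also snap to the closest coordinate value
nearest = da.sel(lat=30.0, method="nearest")
```

Exact label lookups require the requested values to be present in the coordinate; method="nearest" is one way to relax that when working with floating-point latitudes and longitudes.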
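Thanks! I noticed that your factor.isel() does not include time. Is that intentional? I need to use all time points, but in the future I may only need a subset. The time points are in Unix timestamp format.

If you want to slice along the time axis, you can add time=slice(start, end), but for the current use case this gives you all time steps.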
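As a rough sketch of slicing along the time axis, the example below builds a toy DataArray with Unix-timestamp coordinates. The timestamps and values are invented for illustration, and time is assumed to be the name of the dimension.

```python
import numpy as np
import xarray as xr

# Hypothetical daily Unix timestamps and matching values
times = np.array([1500000000 + 86400 * i for i in range(5)])
da = xr.DataArray(np.arange(5.0), coords={"time": times},
                  dims=("time",), name="t")

# Position-based: time steps 1, 2 and 3 (the end index is excluded)
by_position = da.isel(time=slice(1, 4))

# Label-based: a label slice in sel includes both endpoints
by_label = da.sel(time=slice(times[1], times[3]))
```

Note the different end-point behavior: isel follows Python slicing and excludes the end index, while sel with a label slice includes the end label.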