Python: subset a NetCDF file and return tuples
I have a large (5 GB) temperature netCDF file. The file has 4 dimensions: time, pressure level, latitude, and longitude. The dataset has 31 time points, and I am only interested in 5 pressure levels. My parameter is the temperature t:
from netCDF4 import Dataset

# Load the dataset (path points to the netCDF file)
dataset = Dataset(path)
factor = dataset.variables['t']
To extract a "cube" of temperature data around a central cell from the factor variable, I simply subset it like this:
radius = 5
# +1 because the slice end index is exclusive
lats_bounds = [nearest_latitude_index - radius, nearest_latitude_index + radius + 1]
lons_bounds = [nearest_longitude_index - radius, nearest_longitude_index + radius + 1]
# all time points
times_bounds = [0, len(times)]
# just the last 5 pressure levels
pressure_level_bounds = [len(levels) - 5, len(levels)]
results = factor[
    times_bounds[0]:times_bounds[1],
    pressure_level_bounds[0]:pressure_level_bounds[1],
    lats_bounds[0]:lats_bounds[1],
    lons_bounds[0]:lons_bounds[1]
]
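The problem is that results will now be of type ndarray, with shape (31, 5, 11, 11) and size 18755 (31*5*11*11), where each index holds only a single value.
I need the values in results, but for each value I also need its corresponding time point, pressure level, latitude, and longitude.
Ideally, I would subset as before, but my final result would be an array of tuples, something like this: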
# corresponding timestamp, pressure level, latitude, longitude
# and the temperature value extracted.
final = [
    (2342342, 1000, 24.532, 53.531, 277),
    (2342342, 1000, 74.453, 26.123, 351),
    (2342342, 1000, 80.311, 56.345, 131),
    ...
]
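How can I achieve this?

I would use [...] for this task. However, since you only have 31 time points and 5 pressure levels, I would first simplify your approach and work out how to handle a single time and pressure level at a single lat, lon. Then work out how to loop over those indices to get the tuples. Something like: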
for i in range(len(times)):
    for j in range(len(levels)):
        # index the full variable; results is already cropped to the 11x11 window
        print(factor[i, j, nearest_latitude_index, nearest_longitude_index])
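Of course, you could also add loops for lat and lon, but that gets a bit ugly.

Take a look at xarray. Translating your syntax from netCDF4 gives something like this: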
import xarray as xr

ds = xr.open_dataset(path)
factor = ds['t']
# note that levels/lon/lat are the names of dimensions in your Dataset
subset = factor.isel(levels=slice(-5, None),
                     lon=[1, 18, 48, 99], lat=[16, 28, 33, 35])
stacked = subset.stack(points=('time', 'levels', 'lon', 'lat'))
# This subset can be converted to a `pandas.Series`:
data = stacked.to_pandas()
# or it can be converted to a list of tuples
df = data.reset_index()
final = [tuple(row[1].values) for row in df.iterrows()]
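To try the stacking step without the 5 GB file, here is a minimal sketch on a small synthetic array. The coordinate values, the array shape, and the dimension order ('time', 'levels', 'lat', 'lon') are assumptions chosen to match the tuple layout asked for in the question; `itertuples` is swapped in for `iterrows` only because it yields plain tuples directly.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds['t']: 3 times, 6 levels, 4 lats, 5 lons.
times = np.array([1000, 2000, 3000])  # made-up unix timestamps
levels = np.array([500, 600, 700, 850, 925, 1000])
lats = np.array([10.0, 20.0, 30.0, 40.0])
lons = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
values = np.arange(3 * 6 * 4 * 5, dtype=float).reshape(3, 6, 4, 5)

factor = xr.DataArray(
    values,
    dims=("time", "levels", "lat", "lon"),
    coords={"time": times, "levels": levels, "lat": lats, "lon": lons},
    name="t",
)

# Last 2 levels, plus a few lat/lon positions (positional indexing).
subset = factor.isel(levels=slice(-2, None), lat=[1, 2], lon=[0, 4])

# Collapse all four dimensions into one "points" dimension.
stacked = subset.stack(points=("time", "levels", "lat", "lon"))

# One row per point: the four coordinates followed by the value.
df = stacked.to_pandas().reset_index()
final = list(df.itertuples(index=False, name=None))

print(len(final))
print(final[0])
```

Each tuple is (time, level, lat, lon, value), so the coordinates travel with every extracted temperature.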
Xarray also supports label-based indexers (i.e. lat=[29.3, 42.3]), but for that you should use the sel method instead of isel.
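The difference between the two methods can be sketched on a small made-up grid. The coordinate values below are assumptions for illustration; `method="nearest"` is an optional argument of `sel` that is useful when the requested labels (such as 29.3) are not exactly present in the coordinate.

```python
import numpy as np
import xarray as xr

# Hypothetical 4 x 3 grid with made-up coordinates.
lats = np.array([10.0, 20.0, 30.0, 40.0])
lons = np.array([0.0, 5.0, 10.0])
data = np.arange(12, dtype=float).reshape(4, 3)
grid = xr.DataArray(data, dims=("lat", "lon"), coords={"lat": lats, "lon": lons})

# isel: integer positions along the dimension.
by_position = grid.isel(lat=[1, 3])

# sel: coordinate labels; these must exist exactly in the coordinate.
by_label = grid.sel(lat=[20.0, 40.0])

# sel with method="nearest": snaps each requested label to the closest one.
nearest = grid.sel(lat=[29.3, 42.3], method="nearest")

print(by_position.equals(by_label))
print(nearest["lat"].values)
```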
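Thanks! I noticed that time is not included in your factor.isel(). Is that intentional? I need to use all the time points, but in the future I may only need a subset. The time points are in unix timestamp format.

If you want to slice along the time axis, you can add time=slice(start, end), but for the current use case this gives you all the time steps.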
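As a small sketch of that time slice, assuming the time coordinate holds sorted integer unix timestamps (the values below are made up): with `isel` the slice bounds are positions and the end is excluded, while with `sel` they are timestamp labels and both ends are included.

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D temperature series indexed by made-up unix timestamps.
times = np.array([1000, 2000, 3000, 4000])
series = xr.DataArray(
    np.array([270.0, 271.0, 272.0, 273.0]),
    dims=("time",),
    coords={"time": times},
)

# Positional slice: positions 0 and 1 (end index excluded).
first_two = series.isel(time=slice(0, 2))

# Label slice: timestamps from 2000 through 3000 (both ends included).
window = series.sel(time=slice(2000, 3000))

print(first_two["time"].values)
print(window["time"].values)
```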