Python: subsetting a NetCDF file and returning tuples

Tags: python, python-3.x, netcdf

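I have a large (5 GB) netCDF file of temperature data. The file has four dimensions: time, pressure level, latitude and longitude.

The dataset has 31 time points, and I am only interested in 5 pressure levels.

My parameter is the temperature, t: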

from netCDF4 import Dataset

# Load the dataset (path points to the netCDF file)
dataset = Dataset(path)
factor = dataset.variables['t']

To extract a "cube" of temperature data from the factor variable around a central cell, I simply subset it as follows:

radius = 5
# +1 because slicing excludes the stop index
lats_bounds = [nearest_latitude_index - radius, nearest_latitude_index + radius + 1]
lons_bounds = [nearest_longitude_index - radius, nearest_longitude_index + radius + 1]

# all time points
times_bounds = [0, len(times)]

# just the last 5 pressure levels
pressure_level_bounds = [len(levels) - 5, len(levels)]

results = factor[times_bounds[0]:times_bounds[1],
                 pressure_level_bounds[0]:pressure_level_bounds[1],
                 lats_bounds[0]:lats_bounds[1],
                 lons_bounds[0]:lons_bounds[1]]
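
The subsetting code takes nearest_latitude_index and nearest_longitude_index as given. Here is a minimal sketch of one way they might be computed, assuming the coordinates are available as 1-D arrays. The helper names nearest_index and window_bounds and the synthetic 1-degree grid are hypothetical, not part of the original post. The sketch also clamps the window, because a start index below zero would be read as counting from the end of the axis rather than raising an error.

```python
import numpy as np

def nearest_index(coords, value):
    """Return the position of the coordinate closest to `value`."""
    return int(np.abs(np.asarray(coords) - value).argmin())

def window_bounds(center, radius, size):
    """Half-open [start, stop) bounds around `center`, clamped to [0, size]."""
    start = max(center - radius, 0)
    stop = min(center + radius + 1, size)
    return start, stop

# Synthetic 1-degree grid standing in for the file's coordinate variables.
lats = np.arange(-90.0, 91.0, 1.0)
lons = np.arange(0.0, 360.0, 1.0)

nearest_latitude_index = nearest_index(lats, 24.4)
nearest_longitude_index = nearest_index(lons, 53.6)

lats_bounds = window_bounds(nearest_latitude_index, 5, len(lats))
lons_bounds = window_bounds(nearest_longitude_index, 5, len(lons))
```

Near the edge of the grid the clamped window is smaller than 11 cells, so any code that relies on a fixed (11, 11) shape would need to handle that case.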
The problem is that results will now be an ndarray of shape (31, 5, 11, 11) and size 18755 (31*5*11*11), where each index holds only a single value.

I need the values in results, but for each value I also need its corresponding time point, pressure level, latitude and longitude.

Ideally I would subset as before, but my final result would be an array of tuples, something like this:

# corresponding timestamp, pressure level, latitude, longitude
# and the temperature value extracted.
final = [
    (2342342, 1000, 24.532, 53.531, 277),
    (2342342, 1000, 74.453, 26.123, 351),
    (2342342, 1000, 80.311, 56.345, 131),
    ...
]
How can I achieve this?

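I would use [...] for this task. However, since you only have 31 time points and 5 pressure levels, I would first simplify your approach: work out how to handle a single time and pressure level at a single lat/lon, then work out how to loop over those indices to build the tuples. Something like: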

# i and j are positions along the time and level axes of the indexed array
for i in range(len(times)):
    for j in range(len(levels)):
        print(results[i, j, nearest_lat_idx, nearest_lon_idx])
Of course, you could add loops over lat and lon as well, but that gets a bit ugly.
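
As an alternative to nesting four loops, the coordinate values can be broadcast to the shape of the data and flattened together. This is only a sketch with NumPy on small synthetic arrays. The coordinate values below are made up, and with real data each coordinate array would need to be sliced with the same bounds as results.

```python
import numpy as np

# Synthetic stand-ins for the coordinate values of the subset (hypothetical).
times = np.array([1000, 2000])      # Unix timestamps
levels = np.array([925, 1000])      # pressure levels
lats = np.array([24.0, 25.0, 26.0])
lons = np.array([53.0, 54.0])

# Stand-in for `results`, shape (time, level, lat, lon).
results = np.arange(2 * 2 * 3 * 2, dtype=float).reshape(2, 2, 3, 2)

# indexing="ij" gives every coordinate grid the same shape as `results`.
tt, ll, la, lo = np.meshgrid(times, levels, lats, lons, indexing="ij")

# Flatten all five arrays in the same (C) order and zip them into tuples.
final = list(zip(tt.ravel().tolist(), ll.ravel().tolist(),
                 la.ravel().tolist(), lo.ravel().tolist(),
                 results.ravel().tolist()))
```

Each element of final is a (time, level, lat, lon, value) tuple, in the same order as results.ravel().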

Take a look at xarray. Translating your syntax from netCDF4 gives something like the following:

import xarray as xr

ds = xr.open_dataset(path)
factor = ds['t']

# note that levels/lon/lat are the names of dimensions in your Dataset
subset = factor.isel(levels=slice(-5, None),
                     lon=[1, 18, 48, 99], lat=[16, 28, 33, 35])
stacked = subset.stack(points=('time', 'levels', 'lon', 'lat'))

# This subset can be converted to a `pandas.Series`:
data = stacked.to_pandas()

# or it can be converted to a list of tuples
df = data.reset_index()
final = [tuple(row[1].values) for row in df.iterrows()]

Xarray also supports label-based indexers (e.g. lat=[29.3, 42.3]), but for those you should use the sel method rather than isel.
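
To make the difference between the two methods concrete, here is a small sketch on a synthetic DataArray. The dimension names and coordinate values are assumptions chosen for illustration. Passing method="nearest" to sel is one way to handle requested labels that do not fall exactly on the grid.

```python
import numpy as np
import xarray as xr

# Synthetic 2-D field; dimension names and values are made up for illustration.
lat = np.array([10.0, 20.0, 30.0, 40.0])
lon = np.array([0.0, 15.0, 30.0])
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# isel takes integer positions along each dimension.
by_position = da.isel(lat=[1, 3])

# sel takes coordinate labels; without `method` they must match exactly.
by_label = da.sel(lat=[20.0, 40.0])

# With method="nearest", each requested label snaps to the closest grid value.
nearest = da.sel(lat=[29.3, 42.3], method="nearest")
```

Here by_position and by_label select the same two rows, while nearest picks the rows at lat 30.0 and 40.0.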

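Thanks! I noticed that you did not include time in your factor.isel() call. Is that intentional? I need to use all the time points, but in the future I may only need a subset. The time points are in Unix timestamp format.

If you want to slice along the time axis, you can add time=slice(start, end), but for your current use case the code as written gives you all the time steps.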
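
Since isel works on integer positions, a timestamp range first has to be turned into start and end positions. One possible sketch, assuming the time coordinate is a sorted array of Unix timestamps in seconds (the daily spacing and the concrete values below are invented for the example):

```python
import numpy as np

# Hypothetical time coordinate: 31 daily Unix timestamps, sorted ascending.
day = 86_400
t0 = 1_500_000_000
times = np.arange(t0, t0 + 31 * day, day)

t_first = t0 + 10 * day   # inclusive lower bound
t_last = t0 + 14 * day    # inclusive upper bound

# searchsorted returns insertion positions; side="right" makes t_last inclusive.
start = int(np.searchsorted(times, t_first, side="left"))
end = int(np.searchsorted(times, t_last, side="right"))

time_slice = slice(start, end)   # could be passed as isel(time=time_slice)
selected = times[time_slice]
```

If the time coordinate is stored with those timestamps as its labels, sel(time=slice(t_first, t_last)) would be the label-based counterpart.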