Reading and manipulating multiple netCDF files in Python

python numpy matplotlib

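I need help reading multiple netCDF files. There are a few examples here, but none of them work properly for me. I am using Python(x,y) version 2.7.5 with these packages: netcdf4 1.0.7-4, matplotlib 1.3.1-4, numpy 1.8, pandas 0.12, basemap 1.0.2.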

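There are a few things I am used to doing with GrADS that I now need to do in Python. I have 2 meter temperature data from ECMWF (four times daily, one file per year). Each file contains 2 meter temperature with Xsize=480, Ysize=241, Zsize (level)=1, and Tsize (time)=1460, or 1464 for leap years. My file names look like this: t2m.1981.nc, t2m.1982.nc, t2m.1983.nc, and so on.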

Based on this page ( ), this is where I am now:

from pylab import *
import netCDF4 as nc
from netCDF4 import *
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

f = nc.MFDataset('d:/data/ecmwf/t2m.????.nc') # as '????' being the years
t2mtr = f.variables['t2m']

ntimes, ny, nx = shape(t2mtr)
temp2m = zeros((ny,nx),dtype=float64)
print ntimes
for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:] #I'm not sure how to slice this, just wanted to get the 00Z values.
      # is it possible to assign to a new array,...
      #... (for eg.) the average values of  00z for January only from 1981-2000? 

#creating a NetCDF file
nco = nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)

temp2m_v = nco.createVariable('t2m', 'i4',  ( 'y', 'x'))
temp2m_v.units='Kelvin'
temp2m_v.long_name='2 meter Temperature'
temp2m_v.grid_mapping = 'Lambert_Conformal' # can it be something else or ..
#... eliminated?).This is straight from the solution on that webpage.

lono = nco.createVariable('longitude','f8')
lato = nco.createVariable('latitude','f8')
xo = nco.createVariable('x','f4',('x')) #not sure if this is important
yo = nco.createVariable('y','f4',('y')) #not sure if this is important
lco = nco.createVariable('Lambert_Conformal','i4') #not sure

#copy all the variable attributes from original file
for var in ['longitude','latitude']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for lon,lat,x and y
lono=f.variables['longitude'][:]
lato=f.variables['latitude'][:]
#xo[:]=f.variables['x']
#yo[:]=f.variables['y']

#  write the temp at 2 m data
temp2m_v[:,:]=temp2m

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6' #not sure what is this.
nco.close()

#attempt to plot the 00zJan mean
file=nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','r')
t2mtr=file.variables['t2m'][:]
lon=file.variables['longitude'][:]
lat=file.variables['latitude'][:]
clevs=np.arange(0,500.,10.)
map =   Basemap(projection='cyl',llcrnrlat=0.,urcrnrlat=10.,llcrnrlon=97.,urcrnrlon=110.,resolution='i')
x,y=map(*np.meshgrid(lon,lat))
cs = map.contourf(x,y,t2mtr,clevs,extend='both')
map.drawcoastlines()
map.drawcountries()
plt.plot(cs)
plt.show()

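The first problem is at

temp2m += t2mtr[i,:,:]

I don't know how to slice the data so that I get only the 00Z values (say, for January only) across all files.

Second, when I run the test, an error is raised at

cs = map.contourf(x,y,t2mtr,clevs,extend='both')

saying "shape does not match that of z: found (1,1) instead of (241,480)". I know there is probably something wrong in the output data, caused by a mistake when writing the values, but I can't figure out what or where.

Thanks for your time. I hope this is not confusing.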

So t2mtr is a 3D array:

ntimes, ny, nx = shape(t2mtr)

This sums all values along the first axis:

for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:]

A better way is:

temp2m = np.sum(t2mtr, axis=0)
temp2m = t2mtr.sum(axis=0) # alternative method form (t2mtr must be a NumPy array)
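
As a quick check that the vectorized form matches the explicit loop, here is a minimal sketch on a small synthetic array standing in for t2mtr. The values and the 8 x 3 x 4 shape are made up for illustration; with a netCDF4 variable you would typically slice it into a NumPy array first.

```python
import numpy as np

# Synthetic stand-in for t2mtr: 8 time steps on a 3 x 4 grid (made-up values).
rng = np.random.RandomState(0)
t2m = rng.uniform(250.0, 310.0, size=(8, 3, 4))

# Explicit accumulation over the time axis, as in the question.
ntimes, ny, nx = t2m.shape
loop_sum = np.zeros((ny, nx), dtype=np.float64)
for i in range(ntimes):
    loop_sum += t2m[i, :, :]

# Vectorized equivalents: reduce along axis 0 (time).
vec_sum = t2m.sum(axis=0)
vec_mean = t2m.mean(axis=0)

print(np.allclose(loop_sum, vec_sum))
print(np.allclose(loop_sum / ntimes, vec_mean))
```

Both reductions return a (ny, nx) array, which is the shape a 2D map plot expects.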

If you want the mean, use np.mean instead of np.sum.

To average over a subset of time steps, jan_times, use an expression like:

jan_avg = np.mean(t2mtr[jan_times,:,:], axis=0)

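This is simplest if you just want a simple range, for example the first 31 time steps. For simplicity I'm assuming the data is daily and years have a fixed length. You can adjust this for the sub-daily (four-times-daily) frequency and leap years.

t2mtr[0:31,:,:]

A simple way to get the January data for several years is to build an index like this:

yr_starts = np.arange(0,3)*365 # can adjust for leap years
jan_times = (yr_starts[:,None]+ np.arange(31)).flatten()
# array([  0,   1,   2, ...  29,  30, 365, ..., 756, 757, 758, 759, 760])

Another option is to reshape t2mtr (this does not work with leap years):

t2mtr.reshape(nyrs, 365, ny, nx)[:,0:31,:,:].mean(axis=1)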
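
The index construction above can be adapted to sub-daily data. The following is a minimal sketch, assuming four samples per day, a record that starts at 00Z on 1 January, and fixed 365-day years (leap years are ignored here). The function name january_indices and its parameters are hypothetical, and the data array is synthetic.

```python
import numpy as np

def january_indices(n_years, steps_per_day=4, days_per_year=365, slot=0):
    """Indices of one daily time slot for every January day, fixed-length years.

    slot=0 picks the first sample of each day (00Z if the record starts at 00Z).
    """
    steps_per_year = days_per_year * steps_per_day
    yr_starts = np.arange(n_years) * steps_per_year       # index of each 1 January
    jan_offsets = np.arange(31) * steps_per_day + slot    # one sample per January day
    return (yr_starts[:, None] + jan_offsets).ravel()

idx = january_indices(n_years=3)

# Synthetic (time, y, x) data: 3 years of four-times-daily fields on a 2 x 3 grid.
data = np.ones((3 * 1460, 2, 3))
jan_avg = data[idx].mean(axis=0)   # average over the selected time steps only
print(idx[:3], jan_avg.shape)
```

Indexing with an integer array keeps only the listed time steps, so the mean is taken over 31 samples per year rather than the whole record.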

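You can test the time sampling with something like:

np.arange(5*365).reshape(5,365)[:,0:31].mean(axis=1)

Doesn't the dataset have a time variable? You could extract the time indexes you need from that. I worked with ECMWF data a few years ago, but I don't remember many of the details.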
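
If the files do carry a time variable, the January 00Z indices can be derived from the timestamps instead of being computed by hand. This is a sketch under assumptions: the time axis is taken to be stored as hours since 1900-01-01 00:00 (check the units attribute of your own variable), the helper name select_month_hour is hypothetical, and the time axis below is generated rather than read from a file. The netCDF4 package also provides num2date, which converts values using the variable's units attribute.

```python
from datetime import datetime, timedelta
import numpy as np

def select_month_hour(time_values, origin, month, hour):
    """Return indices whose timestamp (origin + hours) falls in `month` at `hour` UTC."""
    stamps = [origin + timedelta(hours=float(h)) for h in time_values]
    return np.array(
        [i for i, s in enumerate(stamps) if s.month == month and s.hour == hour],
        dtype=int,
    )

# Made-up time axis: 6-hourly steps covering the leap year 2000 (1464 steps).
origin = datetime(1900, 1, 1)  # assumed reference date of the time units
start = (datetime(2000, 1, 1) - origin).total_seconds() / 3600.0
hours = start + 6.0 * np.arange(1464)

jan_00z = select_month_hour(hours, origin, month=1, hour=0)
print(len(jan_00z), jan_00z[:3])
```

Because the selection is driven by the timestamps, leap years and a different starting hour need no special handling.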

As for your contourf error, I would check the shapes of the three main arguments: x, y, t2mtr. They should match. I haven't used Basemap.
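
A shape check of this kind can be sketched with NumPy alone. In the question's script, longitude and latitude are created without dimensions, and the later assignments to lono and lato rebind the Python names rather than writing into the file, so it is plausible (an inference, not something confirmed in this thread) that the coordinates read back are single values. The coordinate values below are illustrative, and shapes_match is a hypothetical helper.

```python
import numpy as np

# Coordinates sized like the grid in the question (values are illustrative).
lon = np.linspace(0.0, 359.25, 480)
lat = np.linspace(90.0, -90.0, 241)
field = np.zeros((lat.size, lon.size))   # (ny, nx), like the time-averaged field

x, y = np.meshgrid(lon, lat)
print(x.shape, y.shape, field.shape)

# If each coordinate comes back as a single value, the mesh collapses to (1, 1).
bad_x, bad_y = np.meshgrid(np.array([0.0]), np.array([0.0]))
print(bad_x.shape)

def shapes_match(x, y, z):
    """True when the three contourf inputs share one 2D shape."""
    return x.shape == y.shape == z.shape

print(shapes_match(x, y, field), shapes_match(bad_x, bad_y, field))
```

Printing the three shapes just before the contourf call is usually enough to see which input is wrong.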

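What do you mean by 00Z data? I don't understand what format your dataset is in: 3D, with shape = (time, x, y), is what I understand. Where or what is Z?

Each file contains 00z, 06z, 12z and 18z (times, UTC). This is four-times-daily data. So, in one file, say t2m.2000.nc, t=1464, which is 4 times per day for one year. Since the data is at the surface (2 meter temperature), Zsize=1. It is global gridded data.

Thanks @hpaulj for the explanation and the solution. Exactly what I was looking for. Yes, the data could be subset to particular time indexes, but I downloaded it this way to save time (and a bit of space). Thanks again.

Hi @hpaulj, I was wondering: in yr_starts = np.arange(0,3)*365, does (0,3) refer to the time steps (in this case the sub-daily samples of each day) or to some arbitrary number of years? Thanks.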

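np.arange(0,3)*365 is just [0, 365, 2*365], which I am assuming are the indexes of the successive 1 January data points. It needs an extra factor for sub-daily data (*4 for the four-times-daily data described here; *6 would apply to genuinely 4-hourly data), plus offsets for other starting points and for leap years. I would also check the data for nan (or an equivalent missing value). When I used ECMWF, I needed point data such as temperature and period data such as precipitation. One set had a nan at the end of the requested time period, the other at the start.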
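
The leap-year offsets and the nan check mentioned above can be sketched as follows. This assumes the concatenated record starts at 00Z on 1 January of the first year, has four samples per day, and has no gaps. The function name january_slot_indices and the sample values are hypothetical.

```python
import calendar
import numpy as np

def january_slot_indices(first_year, last_year, steps_per_day=4, slot=0):
    """Indices of one daily slot for each January day, accounting for leap years."""
    indices = []
    offset = 0  # index of 1 January of the current year
    for year in range(first_year, last_year + 1):
        indices.append(offset + np.arange(31) * steps_per_day + slot)
        days = 366 if calendar.isleap(year) else 365
        offset += days * steps_per_day
    return np.concatenate(indices)

idx = january_slot_indices(1981, 1985)
print(idx[31], idx[4 * 31])   # first January index of 1982 and of 1985

# Checking for missing values before averaging (made-up sample values).
series = np.array([280.0, 281.5, np.nan])
n_missing = int(np.isnan(series).sum())
safe_mean = np.nanmean(series)   # mean that skips nan entries
print(n_missing, safe_mean)
```

If the files use a fill value or a masked array instead of nan, the check would need to compare against that value or inspect the mask instead.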

，另一组在开始时有一个。谢谢@hpaulj。。我刚刚意识到我也可以在python控制台上检查

numpy.arange

。）