Python: How to read the date and time in an ECMWF file [python, datetime, pandas, time-series, data-analysis]


I have a global dataset in a netCDF file. The time information in the data file is:

<type 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: hours since 1900-01-01 00:00:0.0
    long_name: time
    calendar: gregorian
unlimited dimensions: time
current shape = (5875,)
filling off
My question is: how do I convert this array into a proper date format? [Note: this is a daily dataset, and the numbers in the array are hours since 1900-01-01.]

You can do this:

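from datetime import date, timedelta

hours = [876600, 876624, 876648, 1017528, 1017552, 1017576]
base = date(1900, 1, 1)  # reference date from the units attribute
for hour in hours:
    # adding a timedelta to a date keeps only the whole days
    print(base + timedelta(hours=hour))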

2000-01-02
2000-01-03
2000-01-04
2016-01-30
2016-01-31
2016-02-01
If you need the hour and other time-of-day information, use datetime instead of date.
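To illustrate the point about datetime, here is a minimal sketch that keeps the time of day. The sub-daily hour values are made up for illustration (the dataset in the question is daily), and the name stamps is hypothetical.

```python
from datetime import datetime, timedelta

# Reference instant taken from the units attribute: hours since 1900-01-01 00:00
base = datetime(1900, 1, 1)

# Hypothetical sub-daily offsets: 00:00, 06:00 and 18:00 on the same day
hours = [876600, 876606, 876618]

# A datetime base keeps the hour component that a date base would drop
stamps = [base + timedelta(hours=h) for h in hours]
for stamp in stamps:
    print(stamp)
```

With a date base, all three values would collapse to the same calendar day.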

Or use pd.DataFrame:

import pandas as pd

# reuses hours, base and timedelta from the snippet above
df = pd.DataFrame(hours, columns=['hours'])
df['date'] = df.hours.apply(lambda x: base + timedelta(hours=x))

     hours        date
0   876600  2000-01-02
1   876624  2000-01-03
2   876648  2000-01-04
3  1017528  2016-01-30
4  1017552  2016-01-31
5  1017576  2016-02-01

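The solution using .apply is extremely inefficient, not to mention non-idiomatic and ugly. pandas already has built-in vectorized methods for timedelta conversion: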

In [17]: hours = [ 876600,  876624,  876648, 1017528, 1017552, 1017576]*10000

In [18]: df = pd.DataFrame(hours, columns=['hours'])

In [19]: %timeit df.hours.apply(lambda x: base + timedelta(hours=x))
10 loops, best of 3: 74.2 ms per loop

In [21]: %timeit pd.to_timedelta(df.hours, unit='h') + pd.Timestamp(base)
100 loops, best of 3: 11.3 ms per loop

In [23]: (pd.to_timedelta(df.hours, unit='h') + pd.Timestamp(base)).head()
Out[23]: 
0   2000-01-02
1   2000-01-03
2   2000-01-04
3   2016-01-30
4   2016-01-31
Name: hours, dtype: datetime64[ns]
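
A related vectorized form, offered as a sketch rather than as part of the answer above: pd.to_datetime accepts an origin argument, so the offset and the reference date can be given in one call. This assumes a pandas version that supports origin (0.20 or later); the name dates is hypothetical.

```python
import pandas as pd

# Hours since 1900-01-01, as in the question
hours = pd.Series(
    [876600, 876624, 876648, 1017528, 1017552, 1017576], name="hours"
)

# Interpret the integers as hours counted from the given origin
dates = pd.to_datetime(hours, unit="h", origin=pd.Timestamp("1900-01-01"))
print(dates)
```

Note that this treats the values as proleptic Gregorian offsets, which matches the gregorian calendar in the question only for dates in the modern era.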

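The ideal way to do this is to use netCDF4.num2date (see the snippet that reads ./foo.nc at the end of this page).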


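You mean the date counted from 1900-01-01?
@KHELILI, I mean the dates corresponding to this array.
Thank you, you made my day :) I tried your code, but I got the error "'numpy.ndarray' object has no attribute 'units'".
Now I get it, and it works :) I had put [:] after time, which is why it was not working. Perfect :)
@bikuser, right: the units and calendar attributes are only accessible if you do not include [:] when reading in the variable. Glad this helped!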
import netCDF4

ncfile = netCDF4.Dataset('./foo.nc', 'r')
time = ncfile.variables['time']
dates = netCDF4.num2date(time[:], time.units, time.calendar)
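
For intuition about what the units string encodes, here is a rough standard-library sketch of the same conversion. It is not how netCDF4.num2date is implemented and it handles only the simple case in the question: units of the form "hours since <date> <time>" with a modern Gregorian calendar. The function name hours_since_to_datetimes is hypothetical.

```python
from datetime import datetime, timedelta


def hours_since_to_datetimes(values, units):
    """Convert 'hours since ...' offsets to datetimes (simplified sketch)."""
    unit_name, _, origin_text = units.partition(" since ")
    if unit_name.strip() != "hours":
        raise ValueError("this sketch only handles units given in hours")
    # Parses origins such as '1900-01-01 00:00:0.0'
    origin = datetime.strptime(origin_text.strip(), "%Y-%m-%d %H:%M:%S.%f")
    # int() also converts NumPy integers, which timedelta may reject
    return [origin + timedelta(hours=int(v)) for v in values]


dates = hours_since_to_datetimes(
    [876600, 1017576], "hours since 1900-01-01 00:00:0.0"
)
print(dates)
```

For real files, netCDF4.num2date remains the safer choice because it respects the calendar attribute and other unit names.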