Python: fill missing values with the nearest row's value within a 5-minute range

The dataframe (df1) looks like this:

Date               Vel Dir
2016-07-12 16:15:00 2.8  1.8
2016-07-12 16:16:00 3.9  21.8
2016-07-12 16:17:00 9.8  4.8
2016-07-12 16:18:00 16.9 5.8
2016-07-12 16:19:00 17.0 7.1
2016-07-12 16:20:00 NaN  NaN
2016-07-12 16:21:00 2.8  1.8
2016-07-12 16:22:00 3.9  21.8
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7
Sometimes the dataframe is missing a lot of data, like this:

Date               Vel   Dir
2016-07-12 17:56:00 2.8  1.8
2016-07-12 17:57:00 NaN  NaN
2016-07-12 17:58:00 9.8  4.8
2016-07-12 17:59:00 NaN  NaN
2016-07-12 18:00:00 NaN  NaN
2016-07-12 18:01:00 NaN  NaN
2016-07-12 18:02:00 2.8  1.8
2016-07-12 18:03:00 NaN  NaN
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7
The first goal is to create a new dataframe with a 3-hour step instead of the 1-minute step, using this code:

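import pandas as pd

# Read the 1-minute data into df1. Assumption: wind.txt is laid out like the tables
# above, so the date and the time arrive as two whitespace-separated fields.
df1 = pd.read_csv('wind.txt', sep=r'\s+', skiprows=1, names=['day', 'time', 'Vel', 'Dir'])
df1['Date'] = pd.to_datetime(df1['day'] + ' ' + df1['time'])
df1 = df1[['Date', 'Vel', 'Dir']]

# Build the target 3-hour grid.
df2 = pd.DataFrame({'Date': pd.date_range(start='2016-07-12 18:00:00', end='2017-01-01 00:00:00', freq='3H')})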
Up to here everything works. As expected, this produces a dataframe without Vel and Dir, like this:

Date               
2016-07-12 18:00:00
2016-07-12 21:00:00
2016-07-13 00:00:00
2016-07-13 03:00:00
...        ...
...        ...
2017-01-01 00:00:00
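Now the goal is to fill df2 with the Vel and Dir values from df1 based on Date, but some of the data is missing. Knowing this, I tried a merge with the following code: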

# df1 (1-minute data) and df2 (3-hour grid) are the dataframes built above; both must be sorted by Date.
# Note: fillna('NaN') writes the string 'NaN', not a float NaN.
df3 = pd.merge_asof(df2, df1, on='Date', tolerance=pd.Timedelta("5 minutes")).fillna('NaN')
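It works, but it only fills the missing data from the first preceding row. The goal is to use values from both the preceding and the following rows to fill the missing data. Something like this: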

Date               Vel   Dir
2016-07-12 17:56:00 2.8  1.8
2016-07-12 17:57:00 NaN  NaN
2016-07-12 17:58:00 9.8  4.8
2016-07-12 17:59:00 NaN  NaN
2016-07-12 18:00:00 NaN  NaN
2016-07-12 18:01:00 NaN  NaN
2016-07-12 18:02:00 2.8  1.8
2016-07-12 18:03:00 NaN  NaN
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7
Expected output:

2016-07-12 18:00:00 9.8  4.8
But if the dataframe looks like this:

Date               Vel   Dir
2016-07-12 17:56:00 NaN  NaN
2016-07-12 17:57:00 NaN  NaN
2016-07-12 17:58:00 NaN  NaN
2016-07-12 17:59:00 NaN  NaN
2016-07-12 18:00:00 NaN  NaN
2016-07-12 18:01:00 NaN  NaN
2016-07-12 18:02:00 2.8  1.8
2016-07-12 18:03:00 NaN  NaN
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7
Expected output:

2016-07-12 18:00:00 2.8  1.8

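The goal is to do this for the whole dataframe. If no value exists within 5 minutes before or after, Vel and Dir must be NaN. If anyone can help, it would be much appreciated.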

Let's use pandas version 0.20.1 and pd.merge_asof with the parameter direction='nearest':

df3 = pd.merge_asof(df2,df1, on='Date', tolerance=pd.Timedelta("5 minutes"), direction='nearest').fillna('NaN')
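
A minimal, self-contained sketch of this approach follows. It uses a small made-up df1 in place of wind.txt: the 18:02 reading comes from the question, while the readings at 20:50, 23:58 and 00:04 are invented for illustration. One detail is an assumption on my part rather than something stated above: merge_asof picks the nearest row by Date, not the nearest non-missing value, so the sketch drops rows whose Vel and Dir are NaN before merging. It also leaves out fillna('NaN') so that the gaps stay real NaN values.

```python
import numpy as np
import pandas as pd

# 1-minute grid with every reading missing, like the second example in the question.
dates = pd.date_range('2016-07-12 17:56:00', '2016-07-13 00:04:00', freq='min')
df1 = pd.DataFrame({'Date': dates, 'Vel': np.nan, 'Dir': np.nan})

# A handful of valid readings (all but the first are invented for this sketch).
readings = {
    '2016-07-12 18:02:00': (2.8, 1.8),    # 2 minutes after 18:00
    '2016-07-12 20:50:00': (5.0, 10.0),   # 10 minutes before 21:00, outside the tolerance
    '2016-07-12 23:58:00': (7.1, 3.3),    # 2 minutes before 00:00
    '2016-07-13 00:04:00': (9.9, 9.9),    # 4 minutes after 00:00
}
for stamp, (vel, direction) in readings.items():
    df1.loc[df1['Date'] == pd.Timestamp(stamp), ['Vel', 'Dir']] = [vel, direction]

# Target 3-hour grid: 18:00, 21:00 and 00:00.
df2 = pd.DataFrame({'Date': pd.date_range('2016-07-12 18:00:00', '2016-07-13 00:00:00', freq='3h')})

tolerance = pd.Timedelta('5 minutes')

# Merging against the raw data: 18:00 matches its own (empty) row and stays NaN.
raw = pd.merge_asof(df2, df1, on='Date', tolerance=tolerance, direction='nearest')

# Merging against valid rows only: each grid point takes the closest real reading,
# or NaN when none lies within 5 minutes on either side.
valid = df1.dropna(subset=['Vel', 'Dir']).sort_values('Date')
df3 = pd.merge_asof(df2, valid, on='Date', tolerance=tolerance, direction='nearest')

print(df3)
```

Here 18:00 picks up the 18:02 reading, 21:00 stays NaN because the closest reading is 10 minutes away, and 00:00 takes the 23:58 reading because it is closer than the one at 00:04.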

In pandas version 0.20.1 there is a new option, the direction parameter. Would 'nearest' help?
Actually, it worked :-) Thank you very much. Upvoted =D