Python: filling DatetimeIndex gaps with NaN

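I have two DataFrames, both indexed by a DatetimeIndex. One of them (df1) is missing some of the datetimes, while the other (df2) is complete (regular timestamps, with no gaps in the series) and is full of NaN.

I am trying to match the values from df1 to the index of df2, filling with NaN wherever a datetime does not exist in df1.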

For example:

In  [51]: df1
Out [51]:                       value
          2015-01-01 14:00:00   20
          2015-01-01 15:00:00   29
          2015-01-01 16:00:00   41
          2015-01-01 17:00:00   43
          2015-01-01 18:00:00   26
          2015-01-01 19:00:00   20
          2015-01-01 20:00:00   31
          2015-01-01 21:00:00   35
          2015-01-01 22:00:00   39
          2015-01-01 23:00:00   17
          2015-03-01 00:00:00   6
          2015-03-01 01:00:00   37
          2015-03-01 02:00:00   56
          2015-03-01 03:00:00   12
          2015-03-01 04:00:00   41
          2015-03-01 05:00:00   31
          ...   ...

          2018-12-25 23:00:00   41

          <34843 rows × 1 columns>

In  [52]: df2 = pd.DataFrame(data=None, index=pd.date_range(freq='60Min', start=df1.index.min(), end=df1.index.max()))
          df2['value'] = np.nan
          df2
Out [52]:                       value
          2015-01-01 14:00:00   NaN
          2015-01-01 15:00:00   NaN
          2015-01-01 16:00:00   NaN
          2015-01-01 17:00:00   NaN
          2015-01-01 18:00:00   NaN
          2015-01-01 19:00:00   NaN
          2015-01-01 20:00:00   NaN
          2015-01-01 21:00:00   NaN
          2015-01-01 22:00:00   NaN
          2015-01-01 23:00:00   NaN
          2015-01-02 00:00:00   NaN
          2015-01-02 01:00:00   NaN
          2015-01-02 02:00:00   NaN
          2015-01-02 03:00:00   NaN
          2015-01-02 04:00:00   NaN
          2015-01-02 05:00:00   NaN
          ...                   ...
          2018-12-25 23:00:00   NaN

          <34906 rows × 1 columns>

This is what I hope to get:

Out [53]:                       value
          2015-01-01 14:00:00   20
          2015-01-01 15:00:00   29
          2015-01-01 16:00:00   41
          2015-01-01 17:00:00   43
          2015-01-01 18:00:00   26
          2015-01-01 19:00:00   20
          2015-01-01 20:00:00   31
          2015-01-01 21:00:00   35
          2015-01-01 22:00:00   39
          2015-01-01 23:00:00   17
          2015-01-02 00:00:00   NaN
          2015-01-02 01:00:00   NaN
          2015-01-02 02:00:00   NaN
          2015-01-02 03:00:00   NaN
          2015-01-02 04:00:00   NaN
          2015-01-02 05:00:00   NaN
          ...                   ...
          2018-12-25 23:00:00   41

          <34906 rows × 1 columns>

Can someone explain why this is happening, and how to control the way these values are filled?

IIUC, you need to resample df1, because it has an irregular frequency and you need a regular one:

print(df1.index.freq)
None

print(Result.index.freq)
<60 * Minutes>
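
Moving df1 onto a regular hourly index can be sketched roughly as below. This is only a minimal illustration, assuming a tiny made-up frame in place of the real df1; reindex and asfreq are standard pandas methods, while the names idx, full_index, filled and same are introduced here just for the example.

```python
import pandas as pd

# Hypothetical miniature of df1: hourly data with a three-hour gap.
idx = pd.to_datetime([
    "2015-01-01 22:00", "2015-01-01 23:00",
    "2015-01-02 03:00", "2015-01-02 04:00",
])
df1 = pd.DataFrame({"value": [39, 17, 12, 41]}, index=idx)

# Build the complete hourly index, then align df1 to it.
# Timestamps that are absent from df1 get NaN.
full_index = pd.date_range(df1.index.min(), df1.index.max(), freq="60min")
filled = df1.reindex(full_index)

# asfreq builds the regular index and reindexes in one step.
same = df1.asfreq("60min")

print(filled)
```

Neither call interpolates or forward-fills unless asked to, so if real values show up in rows that were expected to be NaN, the parsed index itself is worth checking.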

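Thanks for the suggestion, @jezrael. I have tried your approach, but I still get the same problem with both asfreq and resample. The gaps that are filled in to make the series regular contain data that should not be there. There are other holes in the index, which may be having some effect. If it helps, I am using pandas 0.14.1 and Python 2.7.10.

I added test data. Do you still have the same problem? If so, it could be your version 0.14.1; I use 0.17.1 and it works fine.

I have updated pandas to 0.17.1, and with your test data I get the same results as you. With my data, the new datetimeindexes are still filled with data. There is a link to a CSV that stores my data; maybe you will run into the same problem with it?

Thanks, @jezrael. You made it clear to me that resample works. That made me look at my dataset more closely, and I found that pandas was interpreting the dates differently from what I expected. The data is in day/month/year hour:minute order, while pandas was reading it as month/day/year hour:minute. I have included dayfirst=True in pd.read_csv, and it now works exactly as I expect. Thanks for your help!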
import pandas as pd

df1 = pd.read_csv('test/GapInTimestamps.csv', sep=",", index_col=[0], parse_dates=[0])
print(df1.head())

#                     value
#Date/Time                 
#2015-01-01 00:00:00     52
#2015-01-01 01:00:00      5
#2015-01-01 02:00:00     12
#2015-01-01 03:00:00     54
#2015-01-01 04:00:00     47
print(df1.info())

#<class 'pandas.core.frame.DataFrame'>
#DatetimeIndex: 34857 entries, 2015-01-01 00:00:00 to 2018-12-25 23:00:00
#Data columns (total 1 columns):
#value    34857 non-null int64
#dtypes: int64(1)
#memory usage: 544.6 KB
#None

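# pandas 0.17 and older aggregated by mean when resample() was called alone;
# from 0.18 on the aggregation has to be spelled out
Result = df1.resample('60min').mean()
print(Result.head())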

#                     value
#Date/Time                 
#2015-01-01 00:00:00     52
#2015-01-01 01:00:00      5
#2015-01-01 02:00:00     12
#2015-01-01 03:00:00     54
#2015-01-01 04:00:00     47
print(Result.info())

#<class 'pandas.core.frame.DataFrame'>
#DatetimeIndex: 34920 entries, 2015-01-01 00:00:00 to 2018-12-25 23:00:00
#Freq: 60T
#Data columns (total 1 columns):
#value    34857 non-null float64
#dtypes: float64(1)
#memory usage: 545.6 KB
#None

# find rows that contain NaN
resultnan = Result[Result.isnull().any(axis=1)]
# temporarily display up to 999 rows and 15 columns
with pd.option_context('display.max_rows', 999, 'display.max_columns', 15):
    print(resultnan)

#                     value
#Date/Time                 
#2015-01-13 19:00:00    NaN
#2015-01-13 20:00:00    NaN
#2015-01-13 21:00:00    NaN
#2015-01-13 22:00:00    NaN
#2015-01-13 23:00:00    NaN
#2015-01-14 00:00:00    NaN
#2015-01-14 01:00:00    NaN
#2015-01-14 02:00:00    NaN
#2015-01-14 03:00:00    NaN
#2015-01-14 04:00:00    NaN
#2015-01-14 05:00:00    NaN
#2015-01-14 06:00:00    NaN
#2015-01-14 07:00:00    NaN
#2015-01-14 08:00:00    NaN
#2015-01-14 09:00:00    NaN
#2015-02-01 00:00:00    NaN
#2015-02-01 01:00:00    NaN
#2015-02-01 02:00:00    NaN
#2015-02-01 03:00:00    NaN
#2015-02-01 04:00:00    NaN
#2015-02-01 05:00:00    NaN
#2015-02-01 06:00:00    NaN
#2015-02-01 07:00:00    NaN
#2015-02-01 08:00:00    NaN
#2015-02-01 09:00:00    NaN
#2015-02-01 10:00:00    NaN
#2015-02-01 11:00:00    NaN
#2015-02-01 12:00:00    NaN
#2015-02-01 13:00:00    NaN
#2015-02-01 14:00:00    NaN
#2015-02-01 15:00:00    NaN
#2015-02-01 16:00:00    NaN
#2015-02-01 17:00:00    NaN
#2015-02-01 18:00:00    NaN
#2015-02-01 19:00:00    NaN
#2015-02-01 20:00:00    NaN
#2015-02-01 21:00:00    NaN
#2015-02-01 22:00:00    NaN
#2015-02-01 23:00:00    NaN
#2015-11-01 00:00:00    NaN
#2015-11-01 01:00:00    NaN
#2015-11-01 02:00:00    NaN
#2015-11-01 03:00:00    NaN
#2015-11-01 04:00:00    NaN
#2015-11-01 05:00:00    NaN
#2015-11-01 06:00:00    NaN
#2015-11-01 07:00:00    NaN
#2015-11-01 08:00:00    NaN
#2015-11-01 09:00:00    NaN
#2015-11-01 10:00:00    NaN
#2015-11-01 11:00:00    NaN
#2015-11-01 12:00:00    NaN
#2015-11-01 13:00:00    NaN
#2015-11-01 14:00:00    NaN
#2015-11-01 15:00:00    NaN
#2015-11-01 16:00:00    NaN
#2015-11-01 17:00:00    NaN
#2015-11-01 18:00:00    NaN
#2015-11-01 19:00:00    NaN
#2015-11-01 20:00:00    NaN
#2015-11-01 21:00:00    NaN
#2015-11-01 22:00:00    NaN
#2015-11-01 23:00:00    NaN
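
The missing timestamps can also be listed without resampling first. This is a small sketch under the assumption of a made-up three-row frame; DatetimeIndex.difference is a standard pandas method, and the names expected and missing are introduced here for illustration.

```python
import pandas as pd

# Hypothetical slice of hourly data with one gap between 18:00 and 10:00 next day.
idx = pd.to_datetime(["2015-01-13 17:00", "2015-01-13 18:00", "2015-01-14 10:00"])
df1 = pd.DataFrame({"value": [5, 12, 54]}, index=idx)

# Every hourly timestamp that should exist between the first and last row.
expected = pd.date_range(df1.index.min(), df1.index.max(), freq="60min")

# Timestamps that are expected but absent from df1.
missing = expected.difference(df1.index)

print(len(missing), missing.min(), missing.max())
```

Comparing this list with the calendar is a quick way to see whether the gaps fall on plausible dates or hint at a date-parsing problem.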