Python: merging DataFrames after resampling (python, pandas, datetime, dataframe, merge)

I have a DataFrame with a datetime index:

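import numpy as np
import pandas as pd
df1=pd.DataFrame(index=pd.date_range('20100201', periods=24, freq='8h3min'),
                data=np.random.rand(24),columns=['Rubbish'])
df1.index=df1.index.to_datetime()  # redundant: date_range already returns a DatetimeIndex; this method is absent from recent pandas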
I want to resample this DataFrame like this:

df1=df1.resample('7D').agg(np.median)
Then I have another DataFrame whose index has a different frequency and starts at a different offset:

df2=pd.DataFrame(index=pd.date_range('20100205', periods=24, freq='6h3min'),
                data=np.random.rand(24),columns=['Rubbish'])
df2.index=df2.index.to_datetime()
df2=df2.resample('7D').agg(np.median)
Each of these operations works fine on its own, but when I try to merge the results using

print(pd.merge(df1,df2,right_index=True,left_index=True,how='outer'))
I get:

            Rubbish_x  Rubbish_y
2010-02-01   0.585986        NaN
2010-02-05        NaN   0.423316
2010-02-08   0.767499        NaN
whereas I would like to resample both with the same offset and get the following result after merging:

            Rubbish_x  Rubbish_y
2010-02-01   AVALUE        AVALUE
2010-02-08   AVALUE        AVALUE
I tried the following, but it only produces NaN:

df2.reindex(df1.index)

print(pd.merge(df1,df2,right_index=True,left_index=True,how='outer'))
I have to stick with pandas 0.20.1.

I have already tried

but it crashes with this traceback:

Traceback (most recent call last):


TypeError: 'NoneType' object is not callable
I think you need merge_asof:

or pass the parameter method='nearest' to reindex:

I think the following code can accomplish your task:

>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64

>>> series.resample('3T').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64

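Sorry, I don't see how this answers the question.
I have edited my post; I have a problem with merge_asof.
@00__00__00 - hmmm, if the index values are identical then merge is enough; merge_asof is not needed.
@00__00__00 Super: in fact, it still causes problems. It works fine as long as the first index of df2 comes after the first index of df1. If that is not the case, I still get inconsistent indexes, as in the question. So what are the rules, and can they be specified?
I have an idea: change method='nearest' to method='ffill' or method='bfill'.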
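The difference between the fill methods mentioned in the last comment is easier to see on a small, fixed example. The sketch below is not from the thread: the frames `right` and `target` and their values are made up to mimic a df2 whose weekly bins start on 2010-02-05 and a df1 whose bins start on 2010-02-01.

```python
import pandas as pd

# Hypothetical stand-in for the resampled df2 (bins start on 2010-02-05).
right = pd.DataFrame(
    {"Rubbish": [1.0, 2.0]},
    index=pd.to_datetime(["2010-02-05", "2010-02-12"]),
)
# Hypothetical stand-in for the index of the resampled df1.
target = pd.to_datetime(["2010-02-01", "2010-02-08"])

# No label in `right` matches a label in `target`, so every row becomes NaN.
exact = right.reindex(target)

# Each method fills from a different neighbouring label of `right`.
nearest = right.reindex(target, method="nearest")  # closest label on either side
ffill = right.reindex(target, method="ffill")      # last label at or before the target
bfill = right.reindex(target, method="bfill")      # first label at or after the target

# Put the four results side by side for comparison.
print(pd.DataFrame({
    "exact": exact["Rubbish"],
    "nearest": nearest["Rubbish"],
    "ffill": ffill["Rubbish"],
    "bfill": bfill["Rubbish"],
}))
```

Note that reindex returns a new object, so the result has to be assigned (as in `df2 = df2.reindex(...)`) before merging. With 'ffill' the first row stays NaN whenever the other frame starts later, and with 'bfill' the last row stays NaN whenever it ends earlier, which may be the inconsistency described in the comments.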
print(pd.merge_asof(df1,df2,right_index=True,left_index=True))
            Rubbish_x  Rubbish_y
2010-02-01   0.446505        NaN
2010-02-08   0.474330   0.606826
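The NaN in the first row of the output above is consistent with merge_asof searching backward by default: a left row only matches a right key that is less than or equal to its own. A minimal sketch with made-up frames (`left`, `right` and their values are assumptions, not the thread's data) shows how the `direction` parameter changes which row is matched:

```python
import pandas as pd

# Hypothetical stand-ins for the two resampled frames.
left = pd.DataFrame(
    {"Rubbish": [10.0, 20.0]},
    index=pd.to_datetime(["2010-02-01", "2010-02-08"]),
)
right = pd.DataFrame(
    {"Rubbish": [1.0, 2.0]},
    index=pd.to_datetime(["2010-02-05", "2010-02-12"]),
)

results = {}
for direction in ("backward", "forward", "nearest"):
    # backward: last right key <= left key (the default)
    # forward:  first right key >= left key
    # nearest:  closest right key on either side
    results[direction] = pd.merge_asof(
        left, right, left_index=True, right_index=True, direction=direction
    )
    print(direction)
    print(results[direction])
```

Both indexes must be sorted for merge_asof. Whether 'forward' or 'nearest' is appropriate depends on which week of df2 is meant to pair with which week of df1; neither makes the bins themselves coincide.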
df2 = df2.reindex(df1.index, method='nearest')
print (df2)
             Rubbish
2010-02-01  0.415248
2010-02-08  0.415248

print(pd.merge(df1,df2,right_index=True,left_index=True,how='outer'))
            Rubbish_x  Rubbish_y
2010-02-01   0.431966   0.415248
2010-02-08   0.279121   0.415248
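A different option, not proposed in the answers above, is to make both frames share the same bin edges before merging, so that a plain merge lines up without any filling. The sketch below assumes an anchored weekly frequency is acceptable: 'W-MON' with closed='left' and label='left' gives bins that run from Monday to the following Monday and are labelled by their start. It reproduces the dates in the question only because 2010-02-01 happens to be a Monday. The helper name `weekly_median` is made up for illustration, and the behaviour on pandas 0.20.1 has not been checked here.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # only to make the random data repeatable

# Same construction as in the question, without the to_datetime() call.
df1 = pd.DataFrame(
    index=pd.date_range("20100201", periods=24, freq="8h3min"),
    data=np.random.rand(24),
    columns=["Rubbish"],
)
df2 = pd.DataFrame(
    index=pd.date_range("20100205", periods=24, freq="6h3min"),
    data=np.random.rand(24),
    columns=["Rubbish"],
)


def weekly_median(df):
    """Median per Monday-to-Monday bin, labelled by the Monday that opens it."""
    return df.resample("W-MON", closed="left", label="left").median()


# Both results are now indexed by the same Mondays, so an index merge aligns.
merged = pd.merge(
    weekly_median(df1),
    weekly_median(df2),
    left_index=True,
    right_index=True,
    how="outer",
)
print(merged)
```

Unlike the reindex approach, each value in Rubbish_y here is computed only from the df2 rows that fall inside the corresponding week, rather than being copied from a neighbouring bin.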