Python: trending a time series DataFrame
I have a dataframe that looks like this:
import pandas as pd

d = {'business': ['FX', 'FX', 'IR', 'IR'],
     'date': ['01/01/2018', '05/01/2018', '01/01/2018', '05/01/2018'],
     'amt': [1, 5, 101, 105]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df
Is there a function that can expand the dataframe above so that it looks like this:
d_out = {'business': ['FX', 'FX', 'FX', 'FX', 'FX', 'IR', 'IR', 'IR', 'IR', 'IR'],
         'date': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018',
                  '01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
         'amt': [1, 2, 3, 4, 5, 101, 102, 103, 104, 105]}
d_out = pd.DataFrame(data=d_out)
d_out
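I am trying to insert rows based on the number of days between two dates, and to fill the amt field using some kind of simple average.
I am just checking what the most efficient and easiest-to-read way of doing this is.
Thanks,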
agg
Turn the df back into list form and then walk through it row by row. You need to watch out for a few things:
import pandas as pd
from datetime import timedelta

d = {'business': ['FX', 'FX', 'IR', 'IR'],
     'date': ['01/01/2018', '05/01/2018', '01/01/2018', '05/01/2018'],
     'amt': [1, 5, 101, 105]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

df_array = []
for orig_row in range(len(df)):
    # always keep the original row
    df_array.append(df.values[orig_row])
    if orig_row < len(df) - 1:
        # negative at a business boundary (FX -> IR), so nothing is inserted there
        gap_days = (df.date[orig_row + 1] - df.date[orig_row]).days
        if gap_days > 1:
            # every missing day gets the average of the two neighbouring amounts
            amt_avg = (df.amt[orig_row] + df.amt[orig_row + 1]) / 2
            for i in range(gap_days - 1):
                df_array.append([df.business[orig_row],
                                 df.date[orig_row] + timedelta(days=i + 1),
                                 amt_avg])
result_df = pd.DataFrame(df_array, columns=['business', 'date', 'amt'])
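Note that the loop fills each gap with a single average (3.0 for FX), whereas the output requested in the question steps linearly (2, 3, 4). As a rough sketch of a linear fill that stays close to this row-building approach, one could use np.interp per business. The names linear_df, x_known and x_all are made up for this illustration, and the sketch assumes each business has at least one row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'business': ['FX', 'FX', 'IR', 'IR'],
    'date': pd.to_datetime(['01/01/2018', '05/01/2018', '01/01/2018', '05/01/2018'],
                           format='%d/%m/%Y'),
    'amt': [1, 5, 101, 105],
})

frames = []
for business, grp in df.groupby('business'):
    grp = grp.sort_values('date')
    start = grp['date'].min()
    # every calendar day between the first and last date of this business
    days = pd.date_range(start, grp['date'].max(), freq='D')
    # np.interp needs numbers, so express the dates as day offsets from the start
    x_known = (grp['date'] - start).dt.days.to_numpy()
    x_all = (days - start).days.to_numpy()
    amt = np.interp(x_all, x_known, grp['amt'].to_numpy())
    frames.append(pd.DataFrame({'business': business, 'date': days, 'amt': amt}))

linear_df = pd.concat(frames, ignore_index=True)
print(linear_df)
```

Because each business is handled separately, the jump from the last FX date back to the first IR date never produces spurious rows.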
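I think it is best to use the date column as a time index, with the amounts for the FX and IR businesses as two columns (for example called IR_amt and FX_amt). You can then call .interpolate on the dataframe and get the solution immediately, with no extra functions to define.
Code example: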
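import numpy as np

# one amount column per business, NaN where the row belongs to another business
for business in set(df['business'].values):
    df['{}_amt'.format(business)] = df.apply(
        lambda row: row['amt'] if row['business'] == business else np.nan, axis=1)
# collapse to one row per date, then upsample to daily frequency and interpolate
df = df.drop(['business', 'amt'], axis=1).groupby('date').mean()
df = df.resample('1D').interpolate()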
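The same wide layout can probably be reached without the loop and the row-wise apply by pivoting. This is only a sketch, and it assumes that each (date, business) pair occurs at most once, since DataFrame.pivot raises an error on duplicates. The name wide is made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'business': ['FX', 'FX', 'IR', 'IR'],
    'date': pd.to_datetime(['01/01/2018', '05/01/2018', '01/01/2018', '05/01/2018'],
                           format='%d/%m/%Y'),
    'amt': [1, 5, 101, 105],
})

# dates become the index, one column per business
wide = df.pivot(index='date', columns='business', values='amt')
# upsample to daily frequency; the new days are filled by linear interpolation
wide = wide.resample('D').interpolate()
# match the column naming suggested in the answer (FX_amt, IR_amt)
wide.columns = ['{}_amt'.format(c) for c in wide.columns]
print(wide)
```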
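I get a NaT when I pass: d={'business':['FX','FX','IR','IR','date':(['01/01/2018','05/01/2018','05/01/2018']),'amt':[1,5101110]} df=pd.DataFrame(data=d) df['date']=pd.to_datetime(df['date'],format='%d/%m/%Y') df
@NumberLogic Try my code with your sample data. If that works, you can then check what differs between the sample data and your real data.
If you are going to take the mean of the 'business' column, you need categorical encoding (only if the dataset has too many categories); otherwise this answer needs some other trick.
Sure, that has been added in this edit. Run it on your side and see if it works.
I think the first loop can be avoided, but this is still cleaner than the other approach because it makes use of resample (to one day) and interpolate.
I avoided the first loop by using a MultiIndex. Thanks for saving me a lot of time, and for an elegant solution!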
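The commenter did not post the MultiIndex code, so the following is only a guess at what a loop-free variant might look like, not their actual solution. It resamples each business on its own date range and returns the long layout asked for in the question. The names pieces and long_df are made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'business': ['FX', 'FX', 'IR', 'IR'],
    'date': pd.to_datetime(['01/01/2018', '05/01/2018', '01/01/2018', '05/01/2018'],
                           format='%d/%m/%Y'),
    'amt': [1, 5, 101, 105],
})

# one daily, linearly interpolated amt series per business
pieces = {
    business: grp.set_index('date')['amt'].resample('D').interpolate()
    for business, grp in df.groupby('business')
}

# concatenating a dict yields a (business, date) MultiIndex
long_df = (
    pd.concat(pieces)
      .rename_axis(['business', 'date'])
      .rename('amt')
      .reset_index()
)
print(long_df)
```

Compared with the wide layout, this keeps each business within its own first and last date, so a business with a shorter history is not padded out to the range of the others.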
Output of the loop-based answer:
  business       date    amt
0       FX 2018-01-01    1.0
1       FX 2018-01-02    3.0
2       FX 2018-01-03    3.0
3       FX 2018-01-04    3.0
4       FX 2018-01-05    5.0
5       IR 2018-01-01  101.0
6       IR 2018-01-02  103.0
7       IR 2018-01-03  103.0
8       IR 2018-01-04  103.0
9       IR 2018-01-05  105.0