Python: applying interpolation to a DataFrame based on another DataFrame
I have a DataFrame and want to add new columns to it based on the values of one particular column, where the result depends on data held in another DataFrame.
More specifically, I have
df_original =
  Crncy  Spread  Duration
0   EUR     100       1.2
1   nan     nan       nan
2           100      3.46
3   CHF     200       2.5
4   USD      50       5.0
...
df_interpolation =
   CRNCY  TENOR  Adj_EUR  Adj_USD
0    EUR      1       10       20
1    EUR      2       20       30
2    EUR      5       30       40
3    EUR      7       40       50
...
10   CHF      1       50       10
11   CHF      2       60       20
12   CHF      5       70       30
...
Now I would like to add the columns Adj_EUR and Adj_USD to df_original for each row, using standard linear interpolation based on the values of Crncy and Duration.
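That is, for each available Crncy, we want to take TENOR and Adj_USD / Adj_EUR from df_interpolation, together with Duration from df_original, and form the interpolation.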
For example, in pseudocode using the optimize package from scipy:
from scipy import optimize

# Do this for both 'Adj_EUR' and 'Adj_USD'
# For 'Adj_EUR'
for curr, df in df_original.groupby('Crncy'):
    x_data = df_interpolation[df_interpolation['CRNCY'] == curr]['TENOR'].to_numpy()
    y_data = df_interpolation[df_interpolation['CRNCY'] == curr]['Adj_EUR'].to_numpy()
    # Linear fit
    z_linear = optimize.curve_fit(lambda t, a, b: a + b * t, x_data, y_data)[0]
    # Somehow add the values back to df_original in a new column
    df['Adj_EUR'] = z_linear[0] + z_linear[1] * df['Duration']
which yields
  Crncy  Spread  Duration  Adj_EUR  Adj_USD
0   EUR     100       1.2       12       22
1   nan     nan       nan      0.0      0.0
...
Any idea how to do this? Much appreciated.
Suppose we have df1 and df2:
>>> df1
  Crncy  Spread  Duration
0   EUR     100       1.2
1   CHF     200       2.5
>>> df2
  CRNCY  TENOR  Adj_EUR  Adj_USD
0   EUR      1       10       20
1   EUR      2       20       30
2   EUR      5       30       40
3   EUR      7       40       50
4   CHF      1       50       10
5   CHF      2       60       20
6   CHF      5       70       30
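For anyone who wants to run the steps that follow, the two frames printed above can be built roughly like this. It is only a minimal sketch that types in the printed values by hand; nothing beyond those values is assumed.

```python
import pandas as pd

# Values copied by hand from the printed df1 and df2
df1 = pd.DataFrame({
    'Crncy': ['EUR', 'CHF'],
    'Spread': [100, 200],
    'Duration': [1.2, 2.5],
})
df2 = pd.DataFrame({
    'CRNCY': ['EUR'] * 4 + ['CHF'] * 3,
    'TENOR': [1, 2, 5, 7, 1, 2, 5],
    'Adj_EUR': [10, 20, 30, 40, 50, 60, 70],
    'Adj_USD': [20, 30, 40, 50, 10, 20, 30],
})
```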
Bring df1 and df2 into the same layout:
import numpy as np
import pandas as pd

df1['Adj_EUR'] = np.nan
df1['Adj_USD'] = np.nan
df1['left'] = 1  # marks the rows that come from df1
>>> df1
  Crncy  Spread  Duration  Adj_EUR  Adj_USD  left
0   EUR     100       1.2      NaN      NaN     1
1   CHF     200       2.5      NaN      NaN     1
df2 = df2.rename(columns={'CRNCY': 'Crncy', 'TENOR': 'Duration'})
df2['Spread'] = np.nan
df2['left'] = 0
>>> df2
  Crncy  Duration  Adj_EUR  Adj_USD  Spread  left
0   EUR         1       10       20     NaN     0
1   EUR         2       20       30     NaN     0
2   EUR         5       30       40     NaN     0
3   EUR         7       40       50     NaN     0
4   CHF         1       50       10     NaN     0
5   CHF         2       60       20     NaN     0
6   CHF         5       70       30     NaN     0
Now concat df1 and df2 row-wise:
df3 = pd.concat([df1, df2], ignore_index=True, sort=False).sort_values(['Crncy', 'Duration'])
>>> df3
  Crncy  Spread  Duration  Adj_EUR  Adj_USD  left
6   CHF     NaN       1.0     50.0     10.0     0
7   CHF     NaN       2.0     60.0     20.0     0
1   CHF   200.0       2.5      NaN      NaN     1
8   CHF     NaN       5.0     70.0     30.0     0
2   EUR     NaN       1.0     10.0     20.0     0
0   EUR   100.0       1.2      NaN      NaN     1
3   EUR     NaN       2.0     20.0     30.0     0
4   EUR     NaN       5.0     30.0     40.0     0
5   EUR     NaN       7.0     40.0     50.0     0
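Then interpolate the NaN values in each adjustment column against Duration, and drop the rows and columns that are no longer needed:
adj_cols = ['Adj_EUR', 'Adj_USD']
parts = []
for _, group in df3.groupby('Crncy'):
    group = group.set_index('Duration')  # interpolate against Duration, not row position
    group[adj_cols] = group[adj_cols].interpolate(method='index')
    parts.append(group.reset_index())
df4 = pd.concat(parts, ignore_index=True)
df4 = df4.loc[df4['left'] == 1, ['Crncy', 'Spread', 'Duration', 'Adj_EUR', 'Adj_USD']].reset_index(drop=True)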
>>> df4
  Crncy  Spread  Duration    Adj_EUR    Adj_USD
0   CHF   200.0       2.5  61.666667  21.666667
1   EUR   100.0       1.2  12.000000  22.000000
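The concat-and-interpolate steps above amount to piecewise-linear interpolation within each currency. As a cross-check, here is a minimal sketch that does the same thing with numpy.interp. The helper name interp_adjustments and its parameters are made up for illustration, and the frames are rebuilt by hand from the printed df1 and df2. One difference to keep in mind: numpy.interp holds the end values flat for durations outside the tenor range.

```python
import numpy as np
import pandas as pd

# Sample frames, typed in from the printed df1 / df2 of this answer
df1 = pd.DataFrame({'Crncy': ['EUR', 'CHF'],
                    'Spread': [100, 200],
                    'Duration': [1.2, 2.5]})
df2 = pd.DataFrame({'CRNCY': ['EUR'] * 4 + ['CHF'] * 3,
                    'TENOR': [1, 2, 5, 7, 1, 2, 5],
                    'Adj_EUR': [10, 20, 30, 40, 50, 60, 70],
                    'Adj_USD': [20, 30, 40, 50, 10, 20, 30]})


def interp_adjustments(targets, curve, value_cols=('Adj_EUR', 'Adj_USD')):
    """Hypothetical helper: per-currency piecewise-linear interpolation."""
    out = targets.copy()
    for col in value_cols:
        out[col] = np.nan  # stays NaN where no curve is available
    for crncy, rows in out.groupby('Crncy'):
        # np.interp needs the x-values (tenors) in increasing order
        pts = curve[curve['CRNCY'] == crncy].sort_values('TENOR')
        if pts.empty:
            continue  # no interpolation data for this currency
        for col in value_cols:
            out.loc[rows.index, col] = np.interp(
                rows['Duration'], pts['TENOR'], pts[col])
    return out


result = interp_adjustments(df1, df2)
print(result)
```

For the sample data this gives 12 and 22 for the EUR row and roughly 61.67 and 21.67 for the CHF row, matching df4 above, while keeping df1's original row order and index.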
Hope this helps.
So, this is closer to what I was after:
import pandas as pd
from scipy import optimize

temp_interpolation_lst = []
for curr, df in df_original.groupby('Crncy'):
    curve = df_interpolation[df_interpolation['CRNCY'] == curr]
    x_data = curve['TENOR'].to_numpy()
    y_data_usd = curve['Adj_USD'].to_numpy()
    y_data_eur = curve['Adj_EUR'].to_numpy()
    # Linear fit; currencies without rows in df_interpolation are skipped
    if x_data.size > 0:
        z_linear_usd = optimize.curve_fit(lambda t, a, b: a + b * t, x_data, y_data_usd)[0]
        z_linear_eur = optimize.curve_fit(lambda t, a, b: a + b * t, x_data, y_data_eur)[0]
        temp_df = df[['Crncy', 'Duration']].copy()
        temp_df['Adj_USD'] = z_linear_usd[0] + z_linear_usd[1] * temp_df['Duration']
        temp_df['Adj_EUR'] = z_linear_eur[0] + z_linear_eur[1] * temp_df['Duration']
        temp_interpolation_lst.append(temp_df)
temp_interpolation_df = pd.concat(temp_interpolation_lst)
temp_interpolation_df.sort_index(axis=0, inplace=True)
# Add back to the original DataFrame: the indices match, so a left join lines the rows up
df_original = df_original.join(other=temp_interpolation_df[['Adj_USD', 'Adj_EUR']], how='left')
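It is not as clean as I had hoped, but it seems to work all the same.
Could you provide more information on how the output values are calculated? It isn't clear at the moment.
@asongtoruin Is it clearer now? Thanks.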
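On the question of how the output values are calculated: the straight-line fit from curve_fit and the piecewise-linear interpolation asked for in the question are not the same computation, and they generally give different numbers. The sketch below compares the two for the EUR rows at a duration of 1.2. It uses np.polyfit with deg=1 as a stand-in for the curve_fit call (both solve the same least-squares line problem); the variable names are only illustrative.

```python
import numpy as np

# EUR rows of the interpolation table from the question
tenor = np.array([1.0, 2.0, 5.0, 7.0])
adj_eur = np.array([10.0, 20.0, 30.0, 40.0])
duration = 1.2

# Least-squares straight line through all four points.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(tenor, adj_eur, deg=1)
fitted = intercept + slope * duration

# Piecewise-linear interpolation between the two neighbouring tenors
interpolated = np.interp(duration, tenor, adj_eur)

print(f"fit: {fitted:.2f}, interpolation: {interpolated:.2f}")
```

Here the fitted line gives about 13.23, while interpolation gives 12, which is the Adj_EUR value shown for the EUR row in the question's expected output.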