Python: applying interpolation to a DataFrame based on another DataFrame


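I have a DataFrame, and I want to add new columns based on the values of one particular column, where the result depends on data contained in another DataFrame.

More specifically, I have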

df_original = 

    Crncy  Spread  Duration
0   EUR    100     1.2
1   nan    nan     nan
2          100     3.46
3   CHF    200     2.5
4   USD    50      5.0
...

df_interpolation = 

    CRNCY  TENOR   Adj_EUR   Adj_USD
0   EUR    1       10        20    
1   EUR    2       20        30  
2   EUR    5       30        40  
3   EUR    7       40        50  
...
10  CHF    1       50        10  
11  CHF    2       60        20  
12  CHF    5       70        30  
...
Now I would like to add the columns Adj_EUR and Adj_USD to each row of df_original, based on the values of Crncy and Duration, using standard linear interpolation.
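To make the goal concrete, here is a minimal sketch of the interpolation for a single row, assuming the EUR rows of df_interpolation shown above and using numpy.interp (my choice for the illustration, not something the question prescribes):

```python
import numpy as np

# EUR rows of df_interpolation, copied from the table above
tenor = np.array([1.0, 2.0, 5.0, 7.0])
adj_eur = np.array([10.0, 20.0, 30.0, 40.0])
adj_usd = np.array([20.0, 30.0, 40.0, 50.0])

duration = 1.2  # Duration of row 0 (EUR) in df_original

# np.interp connects neighbouring tenor points with straight lines
eur_at_duration = float(np.interp(duration, tenor, adj_eur))
usd_at_duration = float(np.interp(duration, tenor, adj_usd))
print(eur_at_duration, usd_at_duration)
```

For Duration 1.2 this gives approximately 12 for Adj_EUR and 22 for Adj_USD, since 1.2 lies one fifth of the way between tenors 1 and 2.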

So, for each available Crncy, we would like to use TENOR together with Adj_USD / Adj_EUR to form the interpolation, and then evaluate it at Duration.

For example, as pseudocode using the optimize package from scipy:

from scipy import optimize

# Do this for both 'Adj_EUR' and 'Adj_USD'

# For 'Adj_EUR'
for curr, df in df_original.groupby('Crncy'):

    # DataFrame.as_matrix was removed from pandas; to_numpy replaces it
    curve = df_interpolation[df_interpolation['CRNCY'] == curr]
    x_data = curve['TENOR'].to_numpy(dtype=float)
    y_data = curve['Adj_EUR'].to_numpy(dtype=float)

    # Linear fit y = a + b * t
    z_linear = optimize.curve_fit(lambda t, a, b: a + b * t, x_data, y_data)[0]
    # Somehow add the values back to df_original in a new column;
    # assigning to the group copy df, as here, does not change df_original
    df['Adj_EUR'] = z_linear[0] + z_linear[1] * df['Duration']
which would yield

    Crncy  Spread  Duration  Adj_EUR  Adj_USD
0   EUR    100     1.2       12       22
1   nan    nan     nan       0.0      0.0
...
Any idea how to do this?

Much appreciated.

Suppose we have df1 and df2:

>>> df1
  Crncy  Spread  Duration
0   EUR     100       1.2
1   CHF     200       2.5


>>> df2
  CRNCY  TENOR  Adj_EUR  Adj_USD
0   EUR      1       10       20
1   EUR      2       20       30
2   EUR      5       30       40
3   EUR      7       40       50
4   CHF      1       50       10
5   CHF      2       60       20
6   CHF      5       70       30
First, convert df1 and df2 into DataFrames with the same columns:

import numpy as np
import pandas as pd

df1['Adj_EUR'] = np.nan
df1['Adj_USD'] = np.nan
df1['left'] = 1

>>> df1
  Crncy  Spread  Duration  Adj_EUR  Adj_USD  left
0   EUR     100       1.2      NaN      NaN     1
1   CHF     200       2.5      NaN      NaN     1

df2 = df2.rename(columns={'CRNCY': 'Crncy', 'TENOR': 'Duration'})
df2['Spread'] = np.nan
df2['left'] = 0

>>> df2
  Crncy  Duration  Adj_EUR  Adj_USD  Spread  left
0   EUR         1       10       20     NaN     0
1   EUR         2       20       30     NaN     0
2   EUR         5       30       40     NaN     0
3   EUR         7       40       50     NaN     0
4   CHF         1       50       10     NaN     0
5   CHF         2       60       20     NaN     0
6   CHF         5       70       30     NaN     0
Now concatenate df1 and df2 row-wise:

df3 = pd.concat([df1, df2], ignore_index=True, sort=False).sort_values(['Crncy', 'Duration'])

>>> df3
  Crncy  Spread  Duration  Adj_EUR  Adj_USD  left
6   CHF     NaN       1.0     50.0     10.0     0
7   CHF     NaN       2.0     60.0     20.0     0
1   CHF   200.0       2.5      NaN      NaN     1
8   CHF     NaN       5.0     70.0     30.0     0
2   EUR     NaN       1.0     10.0     20.0     0
0   EUR   100.0       1.2      NaN      NaN     1
3   EUR     NaN       2.0     20.0     30.0     0
4   EUR     NaN       5.0     30.0     40.0     0
5   EUR     NaN       7.0     40.0     50.0     0
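Then interpolate the NaN values within each currency, using Duration as the index, and drop the rows and columns that are no longer needed: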

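df3 = df3.set_index('Duration')
# Interpolate only the adjustment columns, per currency, against the Duration index
adj_cols = ['Adj_EUR', 'Adj_USD']
df3[adj_cols] = df3.groupby('Crncy')[adj_cols].transform(lambda s: s.interpolate(method='index'))
df4 = df3.reset_index()
df4 = df4[['Crncy', 'Spread', 'Duration', 'Adj_EUR', 'Adj_USD', 'left']]
df4 = df4.loc[df4['left'] == 1].drop('left', axis=1).reset_index(drop=True)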

>>> df4
  Crncy  Spread  Duration    Adj_EUR    Adj_USD
0   CHF   200.0       2.5  61.666667  21.666667
1   EUR   100.0       1.2  12.000000  22.000000

Hope this helps.
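As a possible alternative to the concat-and-interpolate route, the same numbers can be obtained by calling numpy.interp once per currency. This is only a sketch under assumptions: add_interpolated_columns is a hypothetical helper name, and the column names are the ones used in the question. Note that numpy.interp does not extrapolate; durations outside the tenor range are clamped to the end values.

```python
import numpy as np
import pandas as pd


def add_interpolated_columns(df_original, df_interpolation,
                             value_cols=("Adj_EUR", "Adj_USD")):
    """Hypothetical helper: interpolate value_cols at each row's Duration."""
    out = df_original.copy()
    for col in value_cols:
        out[col] = np.nan  # rows without a matching curve stay NaN

    # groupby skips rows whose Crncy is NaN, so those rows also stay NaN
    for curr, rows in out.groupby("Crncy"):
        curve = df_interpolation[df_interpolation["CRNCY"] == curr]
        curve = curve.sort_values("TENOR")  # np.interp needs increasing x
        if curve.empty:
            continue
        x = curve["TENOR"].to_numpy(dtype=float)
        durations = rows["Duration"].to_numpy(dtype=float)
        for col in value_cols:
            y = curve[col].to_numpy(dtype=float)
            out.loc[rows.index, col] = np.interp(durations, x, y)
    return out


df1 = pd.DataFrame({"Crncy": ["EUR", "CHF"],
                    "Spread": [100, 200],
                    "Duration": [1.2, 2.5]})
df2 = pd.DataFrame({"CRNCY": ["EUR"] * 4 + ["CHF"] * 3,
                    "TENOR": [1, 2, 5, 7, 1, 2, 5],
                    "Adj_EUR": [10, 20, 30, 40, 50, 60, 70],
                    "Adj_USD": [20, 30, 40, 50, 10, 20, 30]})

result = add_interpolated_columns(df1, df2)
print(result)
```

Unlike df4 above, this keeps the original row order and index of df1, so no sorting or filtering on a helper column is needed afterwards.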

So, this is closer to what I was looking for:

import pandas as pd
from scipy import optimize

temp_interpolation_lst = []
for curr, df in df_original.groupby('Crncy'):

    # DataFrame.as_matrix was removed from pandas; to_numpy replaces it
    curve = df_interpolation[df_interpolation['CRNCY'] == curr]
    x_data = curve['TENOR'].to_numpy(dtype=float)
    y_data_usd = curve['Adj_USD'].to_numpy(dtype=float)
    y_data_eur = curve['Adj_EUR'].to_numpy(dtype=float)

    # Skip currencies without enough curve points for a straight-line fit
    if x_data.size < 2:
        continue

    # Linear fit y = a + b * t
    z_linear_usd = optimize.curve_fit(lambda t, a, b: a + b * t, x_data, y_data_usd)[0]
    z_linear_eur = optimize.curve_fit(lambda t, a, b: a + b * t, x_data, y_data_eur)[0]

    temp_df = df[['Crncy', 'Duration']].copy()
    temp_df['Adj_USD'] = z_linear_usd[0] + z_linear_usd[1] * temp_df['Duration']
    temp_df['Adj_EUR'] = z_linear_eur[0] + z_linear_eur[1] * temp_df['Duration']

    temp_interpolation_lst.append(temp_df)

temp_interpolation_df = pd.concat(temp_interpolation_lst).sort_index()

# Add back to the original DataFrame; the indices match, so a left join aligns the rows
df_original = df_original.join(temp_interpolation_df[['Adj_USD', 'Adj_EUR']], how='left')

It is not as clean as I had hoped, but it still seems to work...
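One thing worth checking with this approach: a least-squares line through all tenor points is not the same as piecewise linear interpolation between neighbouring points. The sketch below compares the two for the EUR Adj_EUR curve at Duration 1.2. It uses numpy.polyfit in place of curve_fit (an assumption made for brevity; for a straight line both solve the same unweighted least-squares problem).

```python
import numpy as np

# EUR rows of df_interpolation from the question
tenor = np.array([1.0, 2.0, 5.0, 7.0])
adj_eur = np.array([10.0, 20.0, 30.0, 40.0])
duration = 1.2

# Least-squares straight line through all four points: y = a + b * t
b, a = np.polyfit(tenor, adj_eur, deg=1)  # polyfit returns the slope first
fitted = a + b * duration

# Piecewise linear interpolation between the two neighbouring tenors
interpolated = float(np.interp(duration, tenor, adj_eur))

print(round(fitted, 2), round(interpolated, 2))
```

The fitted line gives about 13.23, whereas interpolation gives 12, the value shown in the expected output of the question. Which of the two is wanted depends on whether the curve points should be reproduced exactly.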

Could you give more information on how the output values are calculated? It is not clear at the moment.

@asongtoruin Is it clearer now? Thanks.