Look up a value from a column in a table and interpolate along the row (python-scipy)
I have a problem with a routine that should first look up a date in another DataFrame and then interpolate a value along the matching row.
Problem:
Let the original DataFrames look like this:
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
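The idea is that the program should first find the row in B that matches the date in A, then interpolate using the column names as the x values and the values in that row as the y values.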
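To make the idea concrete, here is a minimal single-row sketch. It assumes plain linear interpolation and uses `numpy.interp` purely as a stand-in for illustration (it is not the `interp1d` approach used further down). For 06/24/2014, the row in B holds 0.9, 1.0, 1.1, 1.2 at x = 1, 3, 5, 7, and the value in A is 2:

```python
import numpy as np

# x values come from B's column names, y values from the row matching 06/24/2014
x_values = np.array([1, 3, 5, 7], dtype=float)
y_values = np.array([0.9, 1.0, 1.1, 1.2])

# A's "value" for that date is the point at which to interpolate
target = 2.0
interp_value = float(np.interp(target, x_values, y_values))
print(round(interp_value, 2))  # 0.95, halfway between 0.9 and 1.0
```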
The output should look like this:
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6], "interp":[0.95,0.25, 0.75]})
My current approach is:
import pandas as pd
from scipy.interpolate import interp1d
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
# Define x as the names of the columns
x_value = (1, 3, 5, 7)
# Define the interpolation function as follows
def interp(row):
    idx = B[B['date'] == row['date']].index.tolist()[0]  # get index of the matching row in B
    z_value = []  # get values from that row in B
    for i in range(1, 5):
        # positional columns 1..4 are "1", "3", "5", "7" (column 0 is "date")
        z_value.append(float(B.iloc[idx, i]))
    f_linear = interp1d(x_value, z_value)  # define interpolation function
    y_il = f_linear(row['value'])
    return y_il
Finally, I apply the function to every row with:
A['interp']=A.apply(interp, axis=1)
I get the following output. Is there a better way to do this?
>>> A
         date  interp  value
0  06/24/2014    0.95      2
1  06/25/2014    0.25      4
2  06/26/2014    0.75      6
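I'm assuming it is no coincidence that the values you want to interpolate are the "missing" values in B. If that is the case, you don't need the A dataframe at all: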
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"],
"1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)
B = B.reindex(range(1,8))
B
So now B is:
date  06/25/2014  06/26/2014  06/24/2014
1            0.1         0.5         0.9
2            NaN         NaN         NaN
3            0.2         0.6         1.0
4            NaN         NaN         NaN
5            0.3         0.7         1.1
6            NaN         NaN         NaN
7            0.4         0.8         1.2
Now we can apply the interpolation directly.
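A minimal sketch of this step, assuming the call in question is pandas' `DataFrame.interpolate` (with `method="index"`, so the integer index is used as the x axis) followed by selecting the rows that were NaN after reindexing. The exact call is an assumption on my part:

```python
import pandas as pd

B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})

# Same reshaping as above: dates become columns, x values become an integer index
B = B.set_index("date").T
B.index = B.index.astype(int)
B = B.reindex(range(1, 8))

# method="index" interpolates linearly, using the numeric index values as x
filled = B.interpolate(method="index")

# Keep only the rows that were missing before interpolation
result = filled.loc[[2, 4, 6]]
print(result)
```

Each column is interpolated independently, so every date gets its own curve.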
Output:
date  06/25/2014  06/26/2014  06/24/2014
2           0.15        0.55        0.95
4           0.25        0.65        1.05
6           0.35        0.75        1.15
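From there, you would have to select the data you need. If you really only need selected values, the following will give them to you. Note that I used the groupby function so that I only need to create one scipy.interpolate.interp1d call per date.
Data munging: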
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"],
"1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)
Then the actual work:
from scipy.interpolate import interp1d
import pandas as pd
def interped(series, targets):
    # x values are the integer index, y values are that date's column of B
    x, y = zip(*series.items())
    f = interp1d(x, y)
    return [(i, f(i)) for i in targets]
def getResults(dfA, dfB):
    grouped = dfA.groupby('date')
    res = []
    for key in grouped.groups:
        targets = grouped.get_group(key)['value'].values
        values = interped(dfB[key], targets)  # one interp1d per date
        res.extend([(key, target, value) for target, value in values])
    return pd.DataFrame(res, columns=["date", "target", "interp"])
getResults(A, B)
Output:
         date  target  interp
0  06/24/2014       2    0.95
1  06/26/2014       6    0.75
2  06/25/2014       4    0.25
If you insist on calling A.apply:
import pandas as pd
from scipy.interpolate import interp1d
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"],
"1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)
def getRowApplyFunc():
    funcs = {}  # cache: one interp1d function per date
    def interped(row):
        date = row['date']
        target = row['value']
        if date in funcs:
            interpFunc = funcs[date]
        else:
            x, y = zip(*B[date].items())
            interpFunc = interp1d(x, y)
            funcs[date] = interpFunc
        return interpFunc(target)
    return interped
A['interpd'] = A.apply(getRowApplyFunc(), axis=1)
A
This also outputs:
         date  value  interpd
0  06/24/2014      2     0.95
1  06/25/2014      4     0.25
2  06/26/2014      6     0.75
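Thanks for your input, I appreciate the help. I think I can do something with this.
Usually, my dataframe A has a few thousand rows. I think your solution would give me a dataframe with a few thousand columns and a few thousand rows.
Thanks, very helpful. Are you suggesting that using apply is not the best approach? I have to admit that my function above is very slow.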
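On the speed concern, one option that avoids a Python-level call per row is to align A with B through a merge and do the linear interpolation with NumPy array operations. This is only a sketch, not something from the answers above. It assumes that every row of B shares the same x grid (1, 3, 5, 7), that each date in A appears exactly once in B, and that plain linear interpolation is what is wanted. The names `x_cols`, `left` and `weight` are made up for illustration:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({"date": ["06/24/2014", "06/25/2014", "06/26/2014"],
                  "value": [2, 4, 6]})
B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})

x_cols = ["1", "3", "5", "7"]                # column names double as x values
x = np.array([float(c) for c in x_cols])

# Align each row of A with its row of B; a left merge keeps A's row order
merged = A.merge(B, on="date", how="left")
y = merged[x_cols].to_numpy()                # shape (n_rows, n_x)
t = merged["value"].to_numpy(dtype=float)    # one target per row

# Index of the x value just left of each target; clipping means targets
# outside [1, 7] are extrapolated here, whereas interp1d would raise
left = np.clip(np.searchsorted(x, t, side="right") - 1, 0, len(x) - 2)
rows = np.arange(len(t))

# Linear interpolation between the two bracketing columns, all rows at once
weight = (t - x[left]) / (x[left + 1] - x[left])
A["interp"] = y[rows, left] + weight * (y[rows, left + 1] - y[rows, left])
print(A)
```

Whether this is actually faster for a few thousand rows is worth timing on the real data. It also does not build the wide frame with one column per date.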