Find a value in a column of a table and interpolate along the row (python-scipy)

Tags: python, pandas, scipy

I have a problem with a program that should first look up a date in another DataFrame and then interpolate a value along the matching row.

Problem: let the original DataFrames look like this:

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})

B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
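The idea is that the program should first find the row in B whose date matches the row in A, then interpolate at A's value, using B's column names as the x values and the values in that row as the y values.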
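
As a worked example of this idea for a single row, here is a minimal sketch. The names x_values and y_values are illustrative only and do not appear in the code further down.

```python
from scipy.interpolate import interp1d

x_values = (1, 3, 5, 7)            # column names of B, read as numbers
y_values = (0.9, 1.0, 1.1, 1.2)    # the row of B for 06/24/2014
f = interp1d(x_values, y_values)   # interp1d is linear by default

# A's value for 06/24/2014 is 2, halfway between the knots at 1 and 3
result = float(f(2))
print(result)
```

The result lies halfway between 0.9 and 1.0, which matches the 0.95 expected for that date.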

The output should look like this:

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6], "interp":[0.95,0.25, 0.75]})

My current approach is:

import pandas as pd
from scipy.interpolate import interp1d

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})

B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})

# Define x as the names of the columns 
x_value = (1,3,5,7)

#Define the interpolation function as follows

def interp(row):
    idx = B[B['date'] == row['date']].index.tolist()[0]  # index label of the matching row in B
    # y values: the four numeric columns of that row, selected by name rather than position
    z_value = B.loc[idx, ['1', '3', '5', '7']].astype(float).tolist()
    f_linear = interp1d(x_value, z_value)  # linear interpolation function for this row
    y_il = float(f_linear(row['value']))   # evaluate at the value from A
    return y_il

Finally, I apply the function to each row with:

A['interp']=A.apply(interp, axis=1)

I get the following output. Is there a better way?

>>> A
         date interp  value
0  06/24/2014   0.95      2
1  06/25/2014   0.25      4
2  06/26/2014   0.75      6

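I assume it is no coincidence that the values you want to interpolate at are exactly the "missing" values in B. If that is the case, you do not need the A DataFrame at all.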

B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], 
                  "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)
B = B.reindex(range(1,8))
B

So now B is:

date    06/25/2014  06/26/2014  06/24/2014
1   0.1     0.5     0.9
2   NaN     NaN     NaN
3   0.2     0.6     1.0
4   NaN     NaN     NaN
5   0.3     0.7     1.1
6   NaN     NaN     NaN
7   0.4     0.8     1.2

Now we can apply the interpolation directly.
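
A minimal sketch of this step, assuming pandas' DataFrame.interpolate with method="index" is what is meant. The original call is not shown here, so treat this as one possible way rather than the answerer's exact code; the names filled and result are illustrative.

```python
import pandas as pd

B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})
B = B.set_index("date").T          # one column per date, one row per x value
B.index = B.index.astype(int)      # "1", "3", ... become 1, 3, ...
B = B.reindex(range(1, 8))         # rows 2, 4, 6 appear as NaN

# Assumption: fill the NaN rows linearly, using the integer index as x
filled = B.interpolate(method="index")

# Keep only the rows that were missing before
result = filled.loc[[2, 4, 6]]
print(result)
```

Using method="index" makes the interpolation respect the actual index values instead of assuming the rows are evenly spaced.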

Output:

date    06/25/2014  06/26/2014  06/24/2014
2   0.15    0.55    0.95
4   0.25    0.65    1.05
6   0.35    0.75    1.15

From there, though, you still have to pick out the values you need.
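
One way to pick out those values is a label lookup per (value, date) pair. This is a minimal sketch that only works under the assumption stated above, namely that every value in A is one of the integers 1 to 7. The name filled is illustrative.

```python
import pandas as pd

A = pd.DataFrame({"date": ["06/24/2014", "06/25/2014", "06/26/2014"],
                  "value": [2, 4, 6]})
B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})
B = B.set_index("date").T
B.index = B.index.astype(int)

# Assumption: interpolate over the integer grid 1..7, as in the step above
filled = B.reindex(range(1, 8)).interpolate(method="index")

# Row label = A's value, column label = A's date
A["interp"] = [filled.at[v, d] for d, v in zip(A["date"], A["value"])]
print(A)
```

If A can contain non-integer values, this lookup no longer applies and a per-date interpolation function is needed instead.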

If you really only need the selected values, the following will give them to you. Note that I use groupby so that I only need one scipy.interpolate.interp1d call per date.

Data preparation:

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], 
                  "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)

Then the actual work:

from scipy.interpolate import interp1d
import pandas as pd

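def interped(series, targets):
    # x: the integer index (1, 3, 5, 7); y: the values of B for one date
    x, y = zip(*series.items())
    f = interp1d(x, y)  # built once per date, reused for every target
    return [(i, float(f(i))) for i in targets]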

def getResults(dfA, dfB):
    grouped = dfA.groupby('date')
    res = []
    for key in grouped.groups:
        targets = grouped.get_group(key)['value'].values
        values = interped(dfB[key], targets)
        res.extend([(key, target, value) for target,value in values])

    return pd.DataFrame(res, columns=["date", "target", "interp"])

getResults(A, B)

Output:

    date    target  interp
0   06/24/2014  2   0.95
1   06/26/2014  6   0.75
2   06/25/2014  4   0.25

If you insist on calling A.apply:

import pandas as pd
from scipy.interpolate import interp1d

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], 
                  "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)


def getRowApplyFunc():
    funcs = {}
    def interped(row):
        date = row['date']
        target = row['value']
        if date in funcs:
            interpFunc = funcs[date]
        else:
            x,y = zip(*B[date].items())
            interpFunc = interp1d(x,y)
            funcs[date] = interpFunc
        return interpFunc(target)
    return interped

A['interpd'] = A.apply(getRowApplyFunc(), axis=1)
A

This also outputs:

    date    value   interpd
0   06/24/2014  2   0.95
1   06/25/2014  4   0.25
2   06/26/2014  6   0.75

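Thanks for your input, I appreciate the help. I think I can do something with this. Usually my DataFrame A has several thousand rows, so I think your solution would give me a DataFrame with several thousand columns and several thousand rows.

Thanks, very helpful. Are you suggesting that using apply is not the best approach? I have to admit that my function above is very slow.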
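
For an A with several thousand rows, a vectorized variant may be worth trying. The following is a minimal sketch, not taken from the answers above: it assumes every date in A occurs exactly once in B, and it computes the linear interpolation with NumPy instead of interp1d. The names merged, j and t are illustrative.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({"date": ["06/24/2014", "06/25/2014", "06/26/2014"],
                  "value": [2, 4, 6]})
B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})

cols = ["1", "3", "5", "7"]
x = np.array([1, 3, 5, 7], dtype=float)      # x values from the column names

# Bring the matching row of B next to each row of A (order of A is kept)
merged = A.merge(B, on="date", how="left")
y = merged[cols].to_numpy(dtype=float)       # shape (n_rows, 4)
v = merged["value"].to_numpy(dtype=float)

# Index of the left knot of the segment that contains each value
j = np.clip(np.searchsorted(x, v, side="right") - 1, 0, len(x) - 2)
rows = np.arange(len(v))

# Linear interpolation inside that segment.
# Note: values outside [1, 7] are extrapolated here, whereas interp1d raises by default.
t = (v - x[j]) / (x[j + 1] - x[j])
merged["interp"] = y[rows, j] + t * (y[rows, j + 1] - y[rows, j])

result = merged[["date", "value", "interp"]]
print(result)
```

This replaces the per-row Python call with one merge and a few array operations, which is usually where the time goes with apply(axis=1).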