Python Pandas apply() custom function using multiple columns as input

Maybe this simple example will help you understand what I am trying to do:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})


def _custom_function(X):    
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series

    Y = sum((X['A'] / X['B']) + (0.2 * X['B']))   
    return Y


# Does not work: Rolling.apply calls the function once per column, passing a Series
df['C'] = df.rolling(2).apply(_custom_function)

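When the custom function is called, X is a Series and contains only the first column of df. Is it possible to pass the whole df through to the apply function?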
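A minimal sketch, assuming a recent pandas version, that records what `Rolling.apply` actually hands to the callback. With `raw=False`, each call receives a Series holding one column's window, which is why `X['A']` fails inside the function above. `inspect_window` is a hypothetical helper used only for this check.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

# Record the type and length of every object Rolling.apply passes in.
seen = []

def inspect_window(x):
    seen.append((type(x).__name__, len(x)))
    return 0.0  # dummy value; only the recorded inputs matter here

out = df.rolling(2).apply(inspect_window, raw=False)

# Every call gets a single-column Series of length 2, never a DataFrame,
# and the result keeps one output column per input column.
print(sorted(set(seen)))
print(list(out.columns))
```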

EDIT: It works with rolling().apply() on a single column:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})


def _custom_function(X):    
    # whatever... just for the purpose of the example
    Y = sum(0.2 * X)    
    return Y


df['C'] = df['A'].rolling(2).apply(_custom_function)
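For a single-column function like this one, passing `raw=True` hands the callback a NumPy array instead of a Series, which is usually faster. The sketch below assumes the same `df` and also cross-checks the result against the built-in `rolling().sum()`; it is an illustration, not the asker's code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

def _custom_function(X):
    # With raw=True, X is a plain NumPy array of the window values
    return float(np.sum(0.2 * X))

# raw=True avoids building a Series for every window
df["C"] = df["A"].rolling(2).apply(_custom_function, raw=True)

# For this particular function, a built-in rolling sum gives the same numbers
expected = 0.2 * df["A"].rolling(2).sum()
print(df["C"].tolist())
```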

SECOND EDIT: The list comprehension over rolling windows does not behave as expected:

for x in df.rolling(3):
    print(x)
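The short first windows are visible if you print only the window lengths. This sketch assumes pandas 1.1 or later (when `Rolling` objects became iterable) and reflects the behavior described in this question, where iteration does not apply `min_periods`.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

# Iterating a Rolling object yields one DataFrame window per row.
window_lengths = [len(w) for w in df.rolling(3)]
# min_periods does not filter the windows produced by iteration.
lengths_min3 = [len(w) for w in df.rolling(3, min_periods=3)]

print(window_lengths)  # [1, 2, 3, 3, 3, 3]
print(lengths_min3)    # [1, 2, 3, 3, 3, 3]
```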

As you can see in the example below, the two approaches do not produce the same output:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})
df['C'] = 0.2


def _custom_function_df(X):    
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series
    Y = sum(X['C'] * X['B'])
    return Y

def _custom_function_series(X):    
    # whatever... just for the purpose of the example
    # here X is a Series (one column), which is enough for this version
    Y = sum(0.2 * X)
    return Y


df['result'] = df['B'].rolling(3).apply(_custom_function_series)

df['result2'] = [x.pipe(_custom_function_df) for x in df.rolling(3, min_periods=3)]

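The list comprehension over rolling windows returns values for the first rows (instead of the expected NaN) and only starts rolling correctly once the window length len(x) reaches 3.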
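When, as in both example functions, the result is just a sum of a per-row expression, one way around the problem is to compute that expression row by row first and then roll over the resulting single Series; the built-in rolling sum then produces the expected leading NaN values. This is a swapped-in technique that only fits sum-like functions, not arbitrary DataFrame-level logic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})
df["C"] = 0.2

# _custom_function_df sums C * B over the window: build the per-row product,
# then let rolling(3).sum() handle the window and the leading NaNs.
result2 = (df["C"] * df["B"]).rolling(3).sum()

# Same idea for the question's first function: sum(A / B + 0.2 * B) per window.
first = (df["A"] / df["B"] + 0.2 * df["B"]).rolling(2).sum()

print(result2.tolist())
print(first.round(6).tolist())
```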

Thanks in advance.

Pass the DataFrame to the function:

df['C'] = _custom_function(df)

Or use:

EDIT: `rolling` works on each column separately, so it cannot be used here.
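A common workaround, sketched here under the assumption that df has a unique index and that the driver column (`A`) contains no NaN values, is to roll over one column with `raw=False` only to obtain each window's index labels, and then pass the matching rows of the full DataFrame to the custom function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

def _custom_function(X):
    # X is a DataFrame holding the rows of the current window
    return sum((X["A"] / X["B"]) + (0.2 * X["B"]))

# Roll over column A only to get each window's index labels (raw=False keeps
# the index), then look up the full rows of df for those labels.
df["C"] = df["A"].rolling(2).apply(lambda s: _custom_function(df.loc[s.index]), raw=False)
print(df)
```

Unlike the list comprehension, this respects `min_periods`, so the first row is NaN rather than a partial-window value.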

Possible solution:

df['C'] = [x.pipe(_custom_function) for x in df.rolling(2)]
print (df)
    A   B          C
0  10  20   4.500000
1  20  30  11.166667
2  30  10  11.666667
3  50  15  11.333333
4  70  20  13.833333
5  40  30  14.833333
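Inside the list comprehension, `x.pipe(_custom_function)` is simply `_custom_function(x)` applied to each window DataFrame. The sketch below checks that equivalence and shows how `DataFrame.pipe` forwards extra keyword arguments; `scaled` is a hypothetical function used only for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

def _custom_function(X):
    return sum((X["A"] / X["B"]) + (0.2 * X["B"]))

# DataFrame.pipe(func) just calls func(df)
via_pipe = df.pipe(_custom_function)
direct = _custom_function(df)

def scaled(X, factor):
    # hypothetical helper: extra pipe arguments are passed through to it
    return factor * _custom_function(X)

via_pipe_args = df.pipe(scaled, factor=2)
print(via_pipe, direct, via_pipe_args)
```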

EDIT: There seems to be a bug: by default, iterating over `rolling` behaves as if `min_periods=1`.

Here is a solution (a hack):

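Thank you very much! I think pipe() is what I was looking for, because my end goal is to do something like df.rolling(n).pipe(_custom_function).

@plonfat - Is there a problem with the solution?

Hi @jezrael, unfortunately yes: 'Rolling' object has no attribute 'pipe'. Rolling does have an 'apply' attribute, but that brings me back to my original question.

@plonfat - That is not possible with rolling apply.

I just edited the question to show my problem.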

import numpy as np

df['result'] = df['B'].rolling(3).apply(_custom_function_series)

# keep only full-length windows; shorter ones become NaN
df['result2'] = [x.pipe(_custom_function_df) if len(x) == 3 else np.nan for x in df.rolling(3)]

print (df)
    A   B    C  result  result2
0  10  20  0.2     NaN      NaN
1  20  30  0.2     NaN      NaN
2  30  10  0.2    12.0     12.0
3  50  15  0.2    11.0     11.0
4  70  20  0.2     9.0      9.0
5  40  30  0.2    13.0     13.0
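The hack can be wrapped in a small reusable function. `rolling_frame_apply` below is a hypothetical helper (not a pandas API) that applies a DataFrame-level function to each rolling window and turns windows shorter than `window` into NaN, mimicking `min_periods=window`. It assumes pandas 1.1 or later for Rolling iteration.

```python
import numpy as np
import pandas as pd

def rolling_frame_apply(frame, window, func):
    """Hypothetical helper: apply func to every full-length rolling window of
    frame and return a float Series aligned with frame.index."""
    values = [func(w) if len(w) == window else np.nan for w in frame.rolling(window)]
    return pd.Series(values, index=frame.index, dtype="float64")

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})
df["C"] = 0.2

def _custom_function_df(X):
    # X is a DataFrame window, so several columns can be combined
    return sum(X["C"] * X["B"])

df["result2"] = rolling_frame_apply(df, 3, _custom_function_df)
print(df)
```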