Python: Is it possible to use the apply function or vectorization for this code logic?


python pandas numpy


I am calculating a closing balance.

Input DataFrame:

    open   inOut    close
0   3      100      0
1   0      300      0
2   0      200      0
3   0      230      0
4   0      150      0

Expected output DataFrame:

    open    inOut   close
0   3       100     103
1   103     300     403
2   403     200     603
3   603     230     833
4   833     150     983

I can achieve this with a rough for loop, and to optimize it I used iterrows().

For loop:

%%timeit
for i in range(len(df.index)):
    if i>0:
        df.iloc[i]['open'] = df.iloc[i-1]['close']
        df.iloc[i]['close'] = df.iloc[i]['open']+df.iloc[i]['inOut']
    else:
        df.iloc[i]['close'] = df.iloc[i]['open']+df.iloc[i]['inOut'] 

1.64 ms ± 51.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

iterrows():

%%timeit
for index,row in dfOg.iterrows():
    if index>0:
        row['open'] = dfOg.iloc[index-1]['close']
        row['close'] = row['open']+row['inOut']
    else:
        row['close'] = row['open']+row['inOut']

627 µs ± 28.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

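Performance improved from 1.64 ms to 627 µs.
So I am trying to work out how to write the above logic using apply() and vectorization.

For vectorization, I tried shifting the columns, but could not get the desired output.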
Edit: I changed things around to match the edits the OP made to the question.
You can do what you want in a vectorized way, without any loops, like this:

import pandas as pd

d = {'open': [3] + [0]*4, 'inOut': [100, 300, 200, 230, 150], 'close': [0]*5}
df = pd.DataFrame(d)

df['close'].values[:] = df['open'].values[0] + df['inOut'].values.cumsum()
df['open'].values[1:] = df['close'].values[:-1]
Timing with %%timeit:

529 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Output:

   close  inOut  open
0    103    100     3
1    403    300   103
2    603    200   403
3    833    230   603
4    983    150   833

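So vectorizing the code this way is indeed somewhat faster. In fact, it is probably about as fast as it can get. You can see this by timing the DataFrame creation code on its own: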

%%timeit
d = {'open': [3] + [0]*4, 'inOut': [100, 300, 200, 230, 150], 'close': [0]*5}
df = pd.DataFrame(d)

Result:

367 µs ± 5.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Subtracting the time needed to create the DataFrame, the vectorized code that fills it in takes only about 160 µs.
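The same cumulative-sum idea can also be written with plain column assignment instead of writing into `.values`. This is a sketch rather than the answer's own code: it assumes the first row's `open` holds the starting balance, and the name `opening` is introduced here for illustration only. Writing into `.values` depends on that array being a writable view of the DataFrame, which may not hold in every pandas version or configuration, so assigning whole columns is the more conservative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'open': [3, 0, 0, 0, 0],
    'inOut': [100, 300, 200, 230, 150],
    'close': [0] * 5,
})

# Starting balance taken from the first row (assumption: only row 0 is meaningful).
opening = df['open'].iloc[0]

# close[i] = opening + inOut[0] + ... + inOut[i]
df['close'] = opening + df['inOut'].cumsum()

# open[i] = close[i-1] for i > 0; row 0 keeps the starting balance.
df['open'] = df['close'].shift(fill_value=opening)

print(df)
```

Passing `fill_value` to `shift` avoids the NaN that would otherwise appear in the first row and turn the column into floats.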
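You can use np.where:

import numpy as np

%%timeit
df['open'] = np.where(df.index==0, df['open'], df['inOut'].shift())
df['close'] = df['open'] + df['inOut']
# 1.07 ms ± 16.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Output:

   open  inOut  close
0   3.0    100  103.0
1 100.0    300  300.0
2 300.0    200  200.0
3 200.0    230  230.0
4 230.0    150  150.0

Comments:

Sorry, I made a silly mistake in the closing balance logic.

apply is not vectorization.

@juanpa.arrivillaga Yes, I agree, but according to the blog I referred to, apply is faster than iterrows().

You should use itertuples; apply won't be faster than that. Note that your iterrows version doesn't work: it doesn't modify the original DataFrame.

Thanks, @juanpa.arrivillaga, I will check the performance of itertuples as well.

This works fine, but it's slow. From the array construction in np.where, I guess?

@tel Yes, it is a bit slower than your answer because of the conditional check in np.where.

Please take another look at the question. By the way, I like the simplicity of this approach, but I doubt it works for calculating the closing balance.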
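The comments suggest itertuples and warn that assigning to the `row` objects yielded by iterrows() may never reach the DataFrame. A minimal sketch of a loop that sidesteps this is below. It is an illustration, not code from the thread: the names `opens`, `closes`, and `prev_close` are hypothetical. The loop collects results in plain lists and assigns them back as whole columns afterwards, so nothing is written into the object being iterated.

```python
import pandas as pd

df = pd.DataFrame({
    'open': [3, 0, 0, 0, 0],
    'inOut': [100, 300, 200, 230, 150],
    'close': [0] * 5,
})

opens, closes = [], []
prev_close = None  # no previous row yet

for row in df.itertuples(index=False):
    # Row 0 uses its own opening balance; later rows open at the previous close.
    open_val = row.open if prev_close is None else prev_close
    close_val = open_val + row.inOut
    opens.append(open_val)
    closes.append(close_val)
    prev_close = close_val

# Write the results back in one step, after iteration has finished.
df['open'] = opens
df['close'] = closes

print(df)
```

This keeps the explicit row-by-row logic of the original loops while still updating the DataFrame; for this particular recurrence, the cumulative-sum approach is likely to remain faster.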