Python: Is it possible to use the apply function or vectorization on this code logic?

Tags: python, pandas, numpy, vectorization, apply

I am calculating a closing balance.

Input dataframe:

    open   inOut    close
0   3      100      0
1   0      300      0
2   0      200      0
3   0      230      0
4   0      150      0

Output dataframe:

    open    inOut   close
0   3       100     103
1   103     300     403
2   403     200     603
3   603     230     833
4   833     150     983  

I can do this with a rough for loop, and to optimize it I used iterrows().

For loop:

%%timeit
for i in range(len(df.index)):
    if i>0:
        df.iloc[i]['open'] = df.iloc[i-1]['close']
        df.iloc[i]['close'] = df.iloc[i]['open']+df.iloc[i]['inOut']
    else:
        df.iloc[i]['close'] = df.iloc[i]['open']+df.iloc[i]['inOut'] 

1.64 ms ± 51.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

iterrows:

%%timeit
for index,row in dfOg.iterrows():
    if index>0:
        row['open'] = dfOg.iloc[index-1]['close']
        row['close'] = row['open']+row['inOut']
    else:
        row['close'] = row['open']+row['inOut']

627 µs ± 28.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
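Performance improved from 1.64 ms to 627 µs.

So I am trying to work out how to write the above logic using apply() and vectorization.

For vectorization, I tried shifting the columns, but could not get the desired output.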

Edit: I changed things around to match the edit the OP made to the question.

You can do what you want in a vectorized way, without any loops, like this:

import pandas as pd

d = {'open': [3] + [0]*4, 'inOut': [100, 300, 200, 230, 150], 'close': [0]*5}
df = pd.DataFrame(d)

df['close'].values[:] = df['open'].values[0] + df['inOut'].values.cumsum()
df['open'].values[1:] = df['close'].values[:-1]

Timing with %%timeit:

529 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Output:

   close  inOut  open
0    103    100     3
1    403    300   103
2    603    200   403
3    833    230   603
4    983    150   833
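
So vectorizing the code this way is indeed somewhat faster. In fact, it is probably about as fast as it can get. You can see this by timing the dataframe creation code on its own: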

%%timeit
d = {'open': [3] + [0]*4, 'inOut': [100, 300, 200, 230, 150], 'close': [0]*5}
df = pd.DataFrame(d)

Result:

367 µs ± 5.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Subtracting the time needed to create the dataframe, the vectorized code that fills it takes only about 160 µs.
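
The snippet above writes into the arrays returned by .values. On recent pandas versions with Copy-on-Write enabled, those arrays are read-only, so that in-place write may not be allowed. Below is a minimal sketch of the same cumulative-sum idea using ordinary column assignment. The helper name fill_balances is hypothetical, and it assumes the first row of open holds the starting balance.

```python
import pandas as pd


def fill_balances(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with running open/close balances filled in.

    Hypothetical helper: assumes df has 'open', 'inOut' and 'close' columns
    and that the first 'open' value is the starting balance.
    """
    opening = df["open"].iloc[0]
    # close_i = opening + inOut_0 + ... + inOut_i
    close = opening + df["inOut"].cumsum()
    # open_i is the previous row's close; the first row keeps the starting balance
    open_ = close.shift(1, fill_value=opening)
    return df.assign(open=open_, close=close)


df = pd.DataFrame({"open": [3, 0, 0, 0, 0],
                   "inOut": [100, 300, 200, 230, 150],
                   "close": [0] * 5})
result = fill_balances(df)
print(result)
```

Because assign returns a new dataframe, the input is left untouched, which also makes it easy to compare against a loop-based result.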

You can use np.where:

%%timeit
# assumes numpy is imported as np and df is the input dataframe from the question
df['open'] = np.where(df.index==0, df['open'], df['inOut'].shift())
df['close'] = df['open'] + df['inOut']
# 1.07 ms ± 16.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Output:

    open    inOut   close
0   3.0     100     103.0
1   100.0   300     300.0
2   300.0   200     200.0
3   200.0   230     230.0
4   230.0   150     150.0
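
A single shift only looks back one row, while a running balance depends on every earlier row. The sketch below, which assumes the sample data from the question, computes the open column both ways so the difference can be inspected. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"open": [3, 0, 0, 0, 0],
                   "inOut": [100, 300, 200, 230, 150],
                   "close": [0] * 5})

# One-step lookback, as in the np.where snippet:
# each open is the previous row's inOut (first row keeps its own open).
one_step_open = np.where(df.index == 0, df["open"], df["inOut"].shift())

# Running balance: each open is the previous row's close,
# which is the starting balance plus a cumulative sum of inOut.
start = df["open"].iloc[0]
running_close = start + df["inOut"].cumsum()
running_open = running_close.shift(1, fill_value=start)

print(one_step_open)
print(running_open.to_numpy())
```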

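Sorry, I made a silly mistake in the closing balance logic.

.apply is not vectorization.

@juanpa.arrivillaga Yes, I agree, but according to the blog I mentioned, apply is faster than iterrows().

You should use itertuples; apply will not be faster than that. Note that your iterrows version does not work: it does not modify the original dataframe.

Thanks, @juanpa.arrivillaga. I will check the performance of itertuples as well.

This works fine, but it is slow. I would guess that is because of the array construction in np.where?

@tel Yes, it is a bit slower than your answer because of the conditional check in np.where.

Please reconsider the question. By the way, I like this simple approach, but I doubt that it works for calculating a closing balance.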
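
For the itertuples suggestion in the comments, here is a minimal sketch, assuming the sample data from the question. It collects the balances in plain lists and assigns whole columns afterwards, since writing to the row objects yielded by the iterator would not update the dataframe. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"open": [3, 0, 0, 0, 0],
                   "inOut": [100, 300, 200, 230, 150],
                   "close": [0] * 5})

opens, closes = [], []
prev_close = None
for row in df.itertuples(index=False):
    # First row keeps its own opening balance; later rows carry the previous close.
    open_balance = row.open if prev_close is None else prev_close
    prev_close = open_balance + row.inOut
    opens.append(open_balance)
    closes.append(prev_close)

# Assign whole columns at once so the original dataframe is actually updated.
df["open"] = opens
df["close"] = closes
print(df)
```

This is still a Python-level loop, so it is mainly useful as a readable reference to check a vectorized version against.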