Python: How to reduce the run time of a for loop and df.loc with multiple if statements in pandas?

I am converting an Excel file into pandas/Python code, as shown below.

      Date Buy Signal Sell Signal
0   13-Apr         No          No
1   12-Apr         No         Yes
2   11-Apr        Yes          No
3   10-Apr        Yes         Yes
4    9-Apr         No          No
5    8-Apr         No         Yes
6    7-Apr        Yes          No
7    6-Apr        Yes         Yes
8    5-Apr         No          No
9    4-Apr         No         Yes
10   3-Apr        Yes          No
11   2-Apr        Yes         Yes
12   1-Apr         No          No
13  31-Mar         No         Yes
14  30-Mar        Yes          No
15  29-Mar        Yes         Yes
16  28-Mar         No          No
17  27-Mar         No         Yes
18  26-Mar        Yes          No
19  25-Mar        Yes         Yes
20  24-Mar         No          No
21  23-Mar         No          No
22  22-Mar         No          No
The expected output looks like this:

      Date Buy Signal Sell Signal Own the Stock   Buy  Sell
0   13-Apr         No          No            No  Skip  Skip
1   12-Apr         No         Yes           Yes  Hold  Sell
2   11-Apr        Yes          No           Yes  Hold  Hold
3   10-Apr        Yes         Yes            No   Buy  Skip
4    9-Apr         No          No            No  Skip  Skip
5    8-Apr         No         Yes           Yes  Hold  Sell
6    7-Apr        Yes          No           Yes  Hold  Hold
7    6-Apr        Yes         Yes            No   Buy  Skip
8    5-Apr         No          No            No  Skip  Skip
9    4-Apr         No         Yes           Yes  Hold  Sell
10   3-Apr        Yes          No           Yes  Hold  Hold
11   2-Apr        Yes         Yes            No   Buy  Skip
12   1-Apr         No          No            No  Skip  Skip
13  31-Mar         No         Yes           Yes  Hold  Sell
14  30-Mar        Yes          No           Yes  Hold  Hold
15  29-Mar        Yes         Yes            No   Buy  Skip
16  28-Mar         No          No            No  Skip  Skip
17  27-Mar         No         Yes           Yes  Hold  Sell
18  26-Mar        Yes          No           Yes  Hold  Hold
19  25-Mar        Yes         Yes            No   Buy  Skip
20  24-Mar         No          No            No  Skip  Skip
21  23-Mar         No          No            No  Skip  Skip
22  22-Mar         No          No            No  Skip  Skip
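Basically, what I am trying to do is run these multiple ifs over a given range.

The "Own the Stock" column is based on the previous day's "Buy" and "Sell" values:

If the previous day's "Buy" column is "Buy", then today's "Own the Stock" column is "Yes".

If the previous day's "Buy" column is "Hold" and its "Sell" column is "Hold", then today's "Own the Stock" column is "Yes".

If the previous day's "Buy" column is "Hold" and its "Sell" column is "Sell", then today's "Own the Stock" column is "No".

If the previous day's "Buy" column is "Skip", then today's "Own the Stock" column is "No".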
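
The four rules above amount to a small lookup from the previous day's "Buy" and "Sell" values to today's "Own the Stock" value. A minimal sketch of that lookup follows; the function name `own_from_previous` is hypothetical and not part of the original code, and raising an error for combinations the rules do not mention is an assumption.

```python
def own_from_previous(prev_buy: str, prev_sell: str) -> str:
    """Return today's 'Own the Stock' from the previous day's 'Buy' and 'Sell'."""
    if prev_buy == "Buy":
        return "Yes"  # bought on the previous day
    if prev_buy == "Hold" and prev_sell == "Hold":
        return "Yes"  # already owned and not sold
    if prev_buy == "Hold" and prev_sell == "Sell":
        return "No"   # owned, then sold on the previous day
    if prev_buy == "Skip":
        return "No"   # not owned and not bought
    # Assumption: any other combination is treated as invalid input.
    raise ValueError(f"unexpected combination: {prev_buy!r}, {prev_sell!r}")
```

Written this way, each rule can be checked in isolation before it is wired into a loop.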

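Similarly, the code below handles the "Buy" and "Sell" columns. The problem arises because the "Own the Stock" column is a function of the previous day's "Buy" and "Sell" values, while the "Buy" and "Sell" columns are a function of the current "Own the Stock" value. This is recursive in nature, which is why I cannot use .shift(-1) and np.select. Or maybe I am applying them incorrectly.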
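
One way to read this recursion: substituting the "Buy"/"Sell" rules into the "Own the Stock" rules leaves a single boolean state that is carried from one day to the next, so one pass over plain Python lists is enough and no per-cell `df.loc` call is needed. The sketch below assumes the column names from the question, rows sorted newest first, and an oldest row that is always No/Skip/Skip, as in the loop in the question. The function name `add_position_columns` is hypothetical.

```python
import pandas as pd


def add_position_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Fill 'Own the Stock', 'Buy' and 'Sell' in one pass over plain lists."""
    buy_sig = (df["Buy Signal"] == "Yes").tolist()
    sell_sig = (df["Sell Signal"] == "Yes").tolist()
    n = len(df)
    # Defaults match the fixed values of the oldest (last) row.
    own_col, buy_col, sell_col = ["No"] * n, ["Skip"] * n, ["Skip"] * n

    own = False  # ownership on the row currently being processed
    for i in range(n - 2, -1, -1):  # second-oldest row up to the newest row
        own_col[i] = "Yes" if own else "No"
        if own:
            buy_col[i] = "Hold"
            sell_col[i] = "Sell" if sell_sig[i] else "Hold"
        else:
            buy_col[i] = "Buy" if buy_sig[i] else "Skip"
            sell_col[i] = "Skip"
        # Owned on the next day: bought today, or held today without selling.
        own = (not own and buy_sig[i]) or (own and not sell_sig[i])

    out = df.copy()
    out["Own the Stock"] = own_col
    out["Buy"] = buy_col
    out["Sell"] = sell_col
    return out
```

The columns are assigned once at the end, so pandas is touched only a handful of times regardless of the number of rows.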

Here is my code:

for i in range(len(df), 0, -1):
    i=i-1
    if i == len(df)-1 :
        df.loc[i,'Own the Stock'] = "No"
        df.loc[i,'Buy'] = "Skip"
        df.loc[i,'Sell'] = "Skip"
    else :
        #'Own the Stock' Column
        if df.loc[i+1,'Buy'] == "Buy" :
            df.loc[i,'Own the Stock'] = "Yes"
        if df.loc[i+1,'Buy'] == "Hold" and df.loc[i+1,'Sell'] == "Hold"  :
            df.loc[i,'Own the Stock'] = "Yes"
        if df.loc[i+1,'Buy'] == "Hold" and df.loc[i+1,'Sell'] == "Sell"  :
            df.loc[i,'Own the Stock'] = "No"
        if df.loc[i+1,'Buy'] == "Skip" :
            df.loc[i,'Own the Stock'] = "No"
        #'Buy' Column
        if df.loc[i,'Buy Signal'] == "No" and df.loc[i,'Own the Stock'] == "No"  :
            df.loc[i,'Buy'] = "Skip"
        if df.loc[i,'Buy Signal'] == "No" and df.loc[i,'Own the Stock'] == "Yes"  :
            df.loc[i,'Buy'] = "Hold"
        if df.loc[i,'Buy Signal'] == "Yes" and df.loc[i,'Own the Stock'] == "No"  :
            df.loc[i,'Buy'] = "Buy"
        if df.loc[i,'Buy Signal'] == "Yes" and df.loc[i,'Own the Stock'] == "Yes"  :
            df.loc[i,'Buy'] = "Hold"           
        #'Sell' Column
        if df.loc[i,'Sell Signal'] == "No" and df.loc[i,'Own the Stock'] == "No"  :
            df.loc[i,'Sell'] = "Skip"
        if df.loc[i,'Sell Signal'] == "No" and df.loc[i,'Own the Stock'] == "Yes"  :
            df.loc[i,'Sell'] = "Hold"
        if df.loc[i,'Sell Signal'] == "Yes" and df.loc[i,'Own the Stock'] == "No"  :
            df.loc[i,'Sell'] = "Skip"
        if df.loc[i,'Sell Signal'] == "Yes" and df.loc[i,'Own the Stock'] == "Yes"  :
            df.loc[i,'Sell'] = "Sell"
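Although it is messy, the code runs correctly. With a few thousand rows it takes about 10-20 seconds. However, when I tried to run it on several hundred thousand rows, it did not finish as expected.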

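Could you recommend a cythonized, vectorized, or simpler pandas version? These are the approaches that the articles I have read suggest for reducing run time, but I have not managed to get any of them working.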
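
Of the three new columns, only "Own the Stock" depends on earlier rows. Once it is known, "Buy" and "Sell" depend only on values in the same row, so that part can be vectorized with `np.select`. The sketch below assumes "Own the Stock" has already been filled in and that rows are sorted newest first; `label_buy_sell` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def label_buy_sell(df: pd.DataFrame) -> pd.DataFrame:
    """Derive 'Buy' and 'Sell' row by row from the signals and 'Own the Stock'."""
    own = (df["Own the Stock"] == "Yes").to_numpy()
    buy_sig = (df["Buy Signal"] == "Yes").to_numpy()
    sell_sig = (df["Sell Signal"] == "Yes").to_numpy()

    # np.select takes the first matching condition for each row.
    buy = np.select([own, buy_sig], ["Hold", "Buy"], default="Skip")
    sell = np.select([~own, sell_sig], ["Skip", "Sell"], default="Hold")
    if len(df):
        # The loop in the question fixes the oldest (last) row to Skip / Skip.
        buy[-1] = "Skip"
        sell[-1] = "Skip"

    out = df.copy()
    out["Buy"] = buy
    out["Sell"] = sell
    return out
```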


Thanks in advance.

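Could you edit your question and include the data frame as text, so that we can copy and paste it? Could you also briefly describe the logic behind creating the new columns?

It looks like you are performing a complex(ish?) operation on every row of the data frame. The apply function would probably work better than what you are doing, and it would certainly be more readable. Depending on the logic of the operation, there may also be a way to vectorize it.

Vectorization comes from thinking about the logic. Apparently this can be done with np.select() and pd.shift(). The data needs to be text, and the logic is better defined in terms of vectors than as absolute references to the current loop counter.

From the similar questions I have read, pd.shift() will not work for recursive calculations, because the "Own the Stock" column is based on the previous values of the "Buy" and "Sell" columns, and the "Buy" and "Sell" columns are based on the current value of the "Own the Stock" column.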
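
For what it is worth, this particular recursion looks like it can be unrolled without a Python loop. Under the rules in the question, each day does one of four things to the ownership state: leaves it alone (no signal), sets it to owned (buy signal only), sets it to not owned (sell signal only), or flips it (both signals). Ownership after any day is then the value set by the latest single-signal day, flipped once for every both-signal day since. The sketch below is one possible NumPy formulation under those assumptions, with boolean inputs ordered newest first and the oldest row fixed to No/Skip/Skip; `own_flags` is a hypothetical name.

```python
import numpy as np


def own_flags(buy_sig: np.ndarray, sell_sig: np.ndarray) -> np.ndarray:
    """Boolean 'Own the Stock' per row for boolean signals ordered newest first."""
    n = len(buy_sig)
    if n == 0:
        return np.zeros(0, dtype=bool)
    # Work oldest first; the oldest row never trades (fixed to Skip / Skip).
    b = np.asarray(buy_sig, dtype=bool)[::-1].copy()
    s = np.asarray(sell_sig, dtype=bool)[::-1].copy()
    b[0] = s[0] = False

    resets = b != s   # exactly one signal: next-day ownership becomes b
    toggles = b & s   # both signals: next-day ownership flips
    pos = np.arange(n)
    # Index of the latest reset at or before each row, -1 if there is none yet.
    last = np.maximum.accumulate(np.where(resets, pos, -1))
    base = np.where(last >= 0, b[last], False)  # b[-1] is read but masked out
    flips = np.cumsum(toggles)
    flips_since = flips - np.where(last >= 0, flips[last], 0)
    after = base ^ (flips_since % 2 == 1)  # ownership on the day after each row

    own = np.zeros(n, dtype=bool)
    own[1:] = after[:-1]  # a row's ownership is the outcome of the previous day
    return own[::-1]      # back to newest-first order
```

Given these flags, the "Buy" and "Sell" labels depend only on values in the same row. It would be worth checking the result against the original loop on a sample of real data before relying on it.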