Python: how to reduce the run time of a for loop with multiple if statements and df.loc in pandas?
I am converting an Excel file into pandas/Python code. The input data looks like this:
    Date    Buy Signal  Sell Signal
0   13-Apr  No          No
1   12-Apr  No          Yes
2   11-Apr  Yes         No
3   10-Apr  Yes         Yes
4   9-Apr   No          No
5   8-Apr   No          Yes
6   7-Apr   Yes         No
7   6-Apr   Yes         Yes
8   5-Apr   No          No
9   4-Apr   No          Yes
10  3-Apr   Yes         No
11  2-Apr   Yes         Yes
12  1-Apr   No          No
13  31-Mar  No          Yes
14  30-Mar  Yes         No
15  29-Mar  Yes         Yes
16  28-Mar  No          No
17  27-Mar  No          Yes
18  26-Mar  Yes         No
19  25-Mar  Yes         Yes
20  24-Mar  No          No
21  23-Mar  No          No
22  22-Mar  No          No
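The commenters below ask for the data as copyable text. One way to make the example reproducible is to build the input DataFrame directly in code. This is a sketch reconstructed from the table above; the compact list multiplication relies on rows 0-19 repeating a 4-day signal pattern and rows 20-22 having no signals.

```python
import pandas as pd

# Input data from the table above. Dates are newest first, so row i + 1
# is the trading day before row i.
dates = ["13-Apr", "12-Apr", "11-Apr", "10-Apr", "9-Apr", "8-Apr", "7-Apr",
         "6-Apr", "5-Apr", "4-Apr", "3-Apr", "2-Apr", "1-Apr", "31-Mar",
         "30-Mar", "29-Mar", "28-Mar", "27-Mar", "26-Mar", "25-Mar",
         "24-Mar", "23-Mar", "22-Mar"]
df = pd.DataFrame({
    "Date": dates,
    # Rows 0-19 repeat a 4-row pattern; rows 20-22 have no signals.
    "Buy Signal": ["No", "No", "Yes", "Yes"] * 5 + ["No"] * 3,
    "Sell Signal": ["No", "Yes", "No", "Yes"] * 5 + ["No"] * 3,
})
```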
The desired output looks like this:
    Date    Buy Signal  Sell Signal  Own the Stock  Buy   Sell
0   13-Apr  No          No           No             Skip  Skip
1   12-Apr  No          Yes          Yes            Hold  Sell
2   11-Apr  Yes         No           Yes            Hold  Hold
3   10-Apr  Yes         Yes          No             Buy   Skip
4   9-Apr   No          No           No             Skip  Skip
5   8-Apr   No          Yes          Yes            Hold  Sell
6   7-Apr   Yes         No           Yes            Hold  Hold
7   6-Apr   Yes         Yes          No             Buy   Skip
8   5-Apr   No          No           No             Skip  Skip
9   4-Apr   No          Yes          Yes            Hold  Sell
10  3-Apr   Yes         No           Yes            Hold  Hold
11  2-Apr   Yes         Yes          No             Buy   Skip
12  1-Apr   No          No           No             Skip  Skip
13  31-Mar  No          Yes          Yes            Hold  Sell
14  30-Mar  Yes         No           Yes            Hold  Hold
15  29-Mar  Yes         Yes          No             Buy   Skip
16  28-Mar  No          No           No             Skip  Skip
17  27-Mar  No          Yes          Yes            Hold  Sell
18  26-Mar  Yes         No           Yes            Hold  Hold
19  25-Mar  Yes         Yes          No             Buy   Skip
20  24-Mar  No          No           No             Skip  Skip
21  23-Mar  No          No           No             Skip  Skip
22  22-Mar  No          No           No             Skip  Skip
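Basically, what I am trying to do is apply these multiple if conditions over the given range of rows.
The 'Own the Stock' column depends on the previous day's 'Buy' and 'Sell' values (rows are ordered newest first, so the previous day is the next row down):
If the previous day's 'Buy' is "Buy", today's 'Own the Stock' is "Yes".
If the previous day's 'Buy' is "Hold" and 'Sell' is "Hold", today's 'Own the Stock' is "Yes".
If the previous day's 'Buy' is "Hold" and 'Sell' is "Sell", today's 'Own the Stock' is "No".
If the previous day's 'Buy' is "Skip", today's 'Own the Stock' is "No".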
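Writing these four rules as code, together with the 'Buy'/'Sell' rules from the loop in the question, suggests that the whole recursion carries only one bit of state. The sketch below uses booleans instead of the "Yes"/"No" strings; all function names are hypothetical, chosen for illustration. It checks exhaustively that the literal rules reduce to "when flat, enter on a buy signal; when holding, exit on a sell signal".

```python
from itertools import product


def buy_col(buy_signal: bool, own: bool) -> str:
    """'Buy' column, following the question's rules."""
    if own:
        return "Hold"
    return "Buy" if buy_signal else "Skip"


def sell_col(sell_signal: bool, own: bool) -> str:
    """'Sell' column, following the question's rules."""
    if not own:
        return "Skip"
    return "Sell" if sell_signal else "Hold"


def own_from_columns(prev_buy: str, prev_sell: str) -> bool:
    """The four 'Own the Stock' rules, written literally."""
    if prev_buy == "Buy":
        return True
    if prev_buy == "Hold":
        return prev_sell == "Hold"
    return False  # prev_buy == "Skip"


def own_from_state(prev_own: bool, prev_buy_sig: bool, prev_sell_sig: bool) -> bool:
    """Reduced rule: flat -> own after a buy signal; holding -> keep unless a sell signal."""
    return (not prev_sell_sig) if prev_own else prev_buy_sig


# Exhaustive check over all 8 combinations of (own, buy signal, sell signal).
for own, b, s in product([False, True], repeat=3):
    assert own_from_columns(buy_col(b, own), sell_col(s, own)) == own_from_state(own, b, s)
```

So the 'Buy' and 'Sell' strings are only an encoding of the previous day's ownership plus its signals. The recursion that has to be resolved concerns 'Own the Stock' alone.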
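The code below handles the 'Buy' and 'Sell' columns in the same way. The problem is that 'Own the Stock' is a function of the previous day's 'Buy' and 'Sell' values, while 'Buy' and 'Sell' are functions of the current 'Own the Stock' value. The calculation is inherently recursive, which is why I can't use .shift(-1) and np.select. Or maybe I am implementing it incorrectly.
Here is my code: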
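Only 'Own the Stock' is recursive. Once that column is known, 'Buy' and 'Sell' depend only on values in the same row, so np.select can fill them without a loop. The sketch below assumes 'Own the Stock' has already been computed; buy_sell_from_own is a hypothetical helper name. It also does not reproduce the question's special case that forces the oldest row to Skip/Skip regardless of its signals.

```python
import numpy as np
import pandas as pd


def buy_sell_from_own(df: pd.DataFrame) -> pd.DataFrame:
    """Fill 'Buy' and 'Sell' from an already-computed 'Own the Stock' column."""
    own = df["Own the Stock"].eq("Yes").to_numpy(dtype=bool)
    buy_sig = df["Buy Signal"].eq("Yes").to_numpy(dtype=bool)
    sell_sig = df["Sell Signal"].eq("Yes").to_numpy(dtype=bool)
    out = df.copy()
    # np.select takes the first condition that matches, so order matters:
    # holding -> "Hold"; flat with a buy signal -> "Buy"; otherwise "Skip".
    out["Buy"] = np.select([own, buy_sig], ["Hold", "Buy"], default="Skip")
    # flat -> "Skip"; holding with a sell signal -> "Sell"; otherwise "Hold".
    out["Sell"] = np.select([~own, sell_sig], ["Skip", "Sell"], default="Hold")
    return out
```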
for i in range(len(df) - 1, -1, -1):  # from the last (oldest) row up to row 0
    if i == len(df) - 1:
        df.loc[i, 'Own the Stock'] = "No"
        df.loc[i, 'Buy'] = "Skip"
        df.loc[i, 'Sell'] = "Skip"
    else:
        # 'Own the Stock' column (row i+1 is the previous day)
        if df.loc[i+1, 'Buy'] == "Buy":
            df.loc[i, 'Own the Stock'] = "Yes"
        if df.loc[i+1, 'Buy'] == "Hold" and df.loc[i+1, 'Sell'] == "Hold":
            df.loc[i, 'Own the Stock'] = "Yes"
        if df.loc[i+1, 'Buy'] == "Hold" and df.loc[i+1, 'Sell'] == "Sell":
            df.loc[i, 'Own the Stock'] = "No"
        if df.loc[i+1, 'Buy'] == "Skip":
            df.loc[i, 'Own the Stock'] = "No"
        # 'Buy' column
        if df.loc[i, 'Buy Signal'] == "No" and df.loc[i, 'Own the Stock'] == "No":
            df.loc[i, 'Buy'] = "Skip"
        if df.loc[i, 'Buy Signal'] == "No" and df.loc[i, 'Own the Stock'] == "Yes":
            df.loc[i, 'Buy'] = "Hold"
        if df.loc[i, 'Buy Signal'] == "Yes" and df.loc[i, 'Own the Stock'] == "No":
            df.loc[i, 'Buy'] = "Buy"
        if df.loc[i, 'Buy Signal'] == "Yes" and df.loc[i, 'Own the Stock'] == "Yes":
            df.loc[i, 'Buy'] = "Hold"
        # 'Sell' column
        if df.loc[i, 'Sell Signal'] == "No" and df.loc[i, 'Own the Stock'] == "No":
            df.loc[i, 'Sell'] = "Skip"
        if df.loc[i, 'Sell Signal'] == "No" and df.loc[i, 'Own the Stock'] == "Yes":
            df.loc[i, 'Sell'] = "Hold"
        if df.loc[i, 'Sell Signal'] == "Yes" and df.loc[i, 'Own the Stock'] == "No":
            df.loc[i, 'Sell'] = "Skip"
        if df.loc[i, 'Sell Signal'] == "Yes" and df.loc[i, 'Own the Stock'] == "Yes":
            df.loc[i, 'Sell'] = "Sell"
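A large part of this loop's cost is likely the per-cell df.loc reads and writes, each of which goes through pandas' label-based indexing. The loop performs many of them per row. One commonly suggested fix, sketched here, keeps exactly the same rules but runs them over plain Python lists and writes the three columns back once at the end. fill_columns_fast is a hypothetical name. The sketch assumes newest-first row order as in the example, and it uses positions rather than index labels.

```python
import pandas as pd


def fill_columns_fast(df: pd.DataFrame) -> pd.DataFrame:
    """Same rules as the df.loc loop, but on plain lists (rows newest first)."""
    buy_sig = df["Buy Signal"].tolist()
    sell_sig = df["Sell Signal"].tolist()
    n = len(df)
    # The oldest row (n - 1) stays No / Skip / Skip, as in the original loop.
    own, buy, sell = ["No"] * n, ["Skip"] * n, ["Skip"] * n
    for i in range(n - 2, -1, -1):  # row i + 1 is the previous trading day
        # 'Own the Stock': Yes after "Buy" or after Hold/Hold, otherwise No.
        if buy[i + 1] == "Buy" or (buy[i + 1] == "Hold" and sell[i + 1] == "Hold"):
            own[i] = "Yes"
        else:
            own[i] = "No"
        # 'Buy' and 'Sell' depend on today's ownership and today's signals.
        if own[i] == "Yes":
            buy[i] = "Hold"
            sell[i] = "Sell" if sell_sig[i] == "Yes" else "Hold"
        else:
            buy[i] = "Buy" if buy_sig[i] == "Yes" else "Skip"
            sell[i] = "Skip"
    out = df.copy()
    out["Own the Stock"] = own
    out["Buy"] = buy
    out["Sell"] = sell
    return out
```

The loop is still sequential, but it no longer calls into pandas per cell. This is usually the easiest first step before reaching for compiled code.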
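The code is messy, but it works. With a few thousand rows it takes about 10-20 seconds. However, when I run it on a few hundred thousand rows, it does not finish in a reasonable time.
Could you recommend a Cythonized, vectorized, or simpler pandas version? From what I have read, these are the suggested ways to reduce the run time, but I have not managed to apply them successfully.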
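For the compiled route, Numba is often a lighter-weight alternative to Cython for a loop like this. The sketch below compiles only the one-bit ownership recursion and derives 'Buy'/'Sell' afterwards with np.select. It assumes numba is installed and falls back to running the same function as plain Python otherwise. The function names are hypothetical. Setting the oldest row's buy signal to False reproduces the original loop's forced Skip on that row.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:  # numba not installed: run the same function as plain Python
    def njit(func):
        return func


@njit
def own_states(buy_sig, sell_sig):
    """Ownership recursion over boolean arrays (rows newest first)."""
    n = buy_sig.shape[0]
    own = np.zeros(n, dtype=np.bool_)  # the oldest row (n - 1) starts flat
    for i in range(n - 2, -1, -1):     # row i + 1 is the previous trading day
        if own[i + 1]:
            own[i] = not sell_sig[i + 1]  # holding: exit on a sell signal
        else:
            own[i] = buy_sig[i + 1]       # flat: enter on a buy signal
    return own


def fill_columns_numba(df: pd.DataFrame) -> pd.DataFrame:
    buy_sig = df["Buy Signal"].eq("Yes").to_numpy(dtype=bool, copy=True)
    sell_sig = df["Sell Signal"].eq("Yes").to_numpy(dtype=bool, copy=True)
    if len(buy_sig):
        buy_sig[-1] = False  # the original loop forces the oldest row to Buy = "Skip"
    own = own_states(buy_sig, sell_sig)
    out = df.copy()
    out["Own the Stock"] = np.where(own, "Yes", "No")
    out["Buy"] = np.select([own, buy_sig], ["Hold", "Buy"], default="Skip")
    out["Sell"] = np.select([~own, sell_sig], ["Skip", "Sell"], default="Hold")
    return out
```

With numba available, the first call includes compilation time. Later calls run the loop as machine code.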
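Thanks in advance.
Could you edit your question and include the DataFrame as text (so that we can copy and paste it)? Could you also briefly describe the logic behind creating the new columns? It looks like you are performing a (somewhat?) complex operation on every row of the DataFrame. An apply function may be better than what you are doing, and would certainly be more readable. Depending on the logic, there may also be a way to vectorize it.
Vectorization comes from thinking through the logic. Apparently this can be done with np.select() and pd.shift(). The data needs to be text, and the logic would be better expressed in terms of vectors rather than absolute references to the current loop counter.
From what I have read in questions similar to this one, pd.shift() won't work for recursive calculations, because the 'Own the Stock' column depends on the previous values of the 'Buy' and 'Sell' columns, while the 'Buy' and 'Sell' columns depend on the current value of 'Own the Stock'.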
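A single .shift() indeed cannot express this recursion. However, this particular recursion reduces to a small state machine in which each day's signals decide the next day's ownership. Buy only sets it to Yes, sell only sets it to No, neither keeps it, and both flips it. That makes a loop-free version possible: find the most recent "set" day with a running maximum, then flip once for every "both" day since then. The sketch below is a NumPy version of that idea. It is not from the original thread, the function name is hypothetical, and it assumes newest-first row order as in the example.

```python
import numpy as np
import pandas as pd


def fill_columns_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Loop-free version; assumes rows are ordered newest first."""
    # Reverse to chronological order: position 0 is the oldest day.
    b = df["Buy Signal"].eq("Yes").to_numpy(dtype=bool)[::-1].copy()
    s = df["Sell Signal"].eq("Yes").to_numpy(dtype=bool)[::-1].copy()
    n = len(b)
    if n:
        b[0] = False  # the original loop forces the oldest row to Buy = "Skip"

    # Day k's signals decide ownership on day k + 1:
    #   buy only -> set Yes, sell only -> set No, neither -> keep, both -> flip
    is_set = b != s
    flips = np.cumsum(b & s)
    # Index of the most recent "set" day at or before k (-1 if none yet).
    last_set = np.maximum.accumulate(np.where(is_set, np.arange(n), -1))
    has_set = last_set >= 0
    base = np.where(has_set, b[last_set], False)  # value written by that day
    flips_since = flips - np.where(has_set, flips[last_set], 0)
    own_next = base ^ (flips_since % 2 == 1)      # ownership on day k + 1
    own = np.concatenate(([False], own_next[:-1]))[:n]  # oldest day owns nothing

    # Back to the original newest-first order.
    own, b, s = own[::-1], b[::-1], s[::-1]
    out = df.copy()
    out["Own the Stock"] = np.where(own, "Yes", "No")
    out["Buy"] = np.select([own, b], ["Hold", "Buy"], default="Skip")
    out["Sell"] = np.select([~own, s], ["Skip", "Sell"], default="Hold")
    return out
```

This trick depends on the rules collapsing to set/keep/flip. With more complicated rules, the compiled loop is the safer general approach.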