Optimizing Python code with NumPy and Pandas


My code is as follows:

import numpy as np
import pandas as pd
colum1 = [0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05]
colum2 = [1,2,3,4,5,6,7,8,9,10,11,12]
colum3 = [0.85,0.80,0.80,0.80,0.85,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
colum4 = [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05, 455.40, 319.03, 190.86 , 97.07, 26.96 , 0.00]
df = pd.DataFrame({
    'colum1' : colum1,
    'colum2' : colum2,
    'colum3' : colum3,
    'colum4' : colum4,
})

df['result'] = 0
for i in range(len(colum2)):
    df['result'] = np.where(
        df['colum2'] <= 5,
        np.where(
            df['colum2'] == 1,
            df['colum4'],
            np.where(
                ( df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3'])) )>0,
                ( df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3'])) ),
                0
            )
        ),
        np.where(
            ( df['colum4'] - (df['result'].shift(1) * df['colum1']) )>0,
            ( df['colum4'] - (df['result'].shift(1) * df['colum1']) ),
            0
        )
    )
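
I need to perform the same operation without resorting to a for loop. That would be very helpful, because I am working with thousands of records and this is very slow.

My expected result is as follows: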

    colum1  colum2  colum3   colum4       result
0     0.05       1    0.85  1743.85  1743.850000
1     0.05       2    0.80  1485.58  1415.826000
2     0.05       3    0.80  1250.07  1193.436960
3     0.05       4    0.80  1021.83   974.092522
4     0.05       5    0.85   818.96   777.561068
5     0.05       6    0.00   628.05   589.171947
6     0.05       7    0.00   455.40   425.941403
7     0.05       8    0.00   319.03   297.732930
8     0.05       9    0.00   190.86   175.973354
9     0.05      10    0.00    97.07    88.271332
10    0.05      11    0.00    26.96    22.546433
11    0.05      12    0.00     0.00     0.000000

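The first step is to remove the loop over the index and replace the greater-than-0 tests with np.maximum. This works because
np.where(a > 0, a, 0)
is, for our purposes, equivalent to
np.maximum(0, a)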
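
A quick way to check that equivalence is the minimal sketch below, which uses a made-up array `a` (not data from the question). Note that the element-wise function is `np.maximum`; `np.max` is a reduction whose second positional argument is an axis, so it is not a drop-in replacement. One caveat worth keeping in mind: the two forms differ on NaN input, since `np.where(a > 0, a, 0)` yields 0 there while `np.maximum` propagates the NaN.

```python
import numpy as np

# Hypothetical sample values, chosen only to cover negative, zero and positive cases.
a = np.array([-2.5, 0.0, 3.0, -0.1, 7.25])

clipped_where = np.where(a > 0, a, 0)   # keep positives, replace the rest with 0
clipped_max = np.maximum(0, a)          # element-wise maximum against 0

# Both forms agree for finite input.
assert np.array_equal(clipped_where, clipped_max)
```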

At the same time, define the longer expressions separately to make the code readable:

s1 = df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3']))
s2 = df['colum4'] - (df['result'].shift(1) * df['colum1'])

df['result'] = np.where(df['colum2'] <= 5,
                        np.where(df['colum2'] == 1, df['colum4'],
                                 np.maximum(0, s1)),
                        np.maximum(0, s2))

This version will be more manageable.

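Comments:

Why are you using a loop in the first place? If you remove it and move the loop body to the outer indentation level, the code seems to work. The biggest problem is the .shift(1) :(

How confident are you in your expected results? I can only reproduce the first few rows, and after that they vary.

@user3483203 Sorry, I just corrected it.

Thanks, this is a more manageable version, but it does not achieve what I want, which is to eliminate the for loop and still get the desired result.

The previous solution has no explicit for loop. Isn't that what you wanted?

I just edited the question to add the desired result, but when I remove the loop the result is different.

... (just) a code-writing service. I am happy to clarify logic, concepts, or performance-type queries (if any). But "getting these numbers" per se is not something that interests me or other users.
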
m1 = df['colum2'] <= 5
m2 = df['colum2'] == 1

conds = [m1 & m2, m1 & ~m2]
choices = [df['colum4'], np.maximum(0, s1)]

df['result'] = np.select(conds, choices, np.maximum(0, s2))
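
As the comments point out, each row of result depends on the previous row's result, so a single vectorized pass with .shift(1) cannot reproduce the expected table: with result initialized to 0, the second row comes out as 1485.58 rather than 1415.826. The original loop works only because every pass propagates the correct value one row further, which costs a full-column update per row. Below is a minimal sketch of one alternative, not the method from the answers above: a single sequential pass over plain NumPy arrays. The helper name `sequential_result` is hypothetical, and the sketch assumes the data starts with a row where colum2 == 1.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'colum1': [0.05] * 12,
    'colum2': list(range(1, 13)),
    'colum3': [0.85, 0.80, 0.80, 0.80, 0.85] + [0.0] * 7,
    'colum4': [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05,
               455.40, 319.03, 190.86, 97.07, 26.96, 0.00],
})

def sequential_result(df):
    """Hypothetical helper: evaluate the recurrence in one pass.

    Assumes the first row has colum2 == 1, as in the question's data.
    """
    c1 = df['colum1'].to_numpy()
    c2 = df['colum2'].to_numpy()
    c3 = df['colum3'].to_numpy()
    c4 = df['colum4'].to_numpy()

    # The multiplier applied to the previous result can be vectorized up front:
    # colum1 * colum3 while colum2 <= 5, plain colum1 afterwards.
    factor = np.where(c2 <= 5, c1 * c3, c1)

    out = np.empty(len(df), dtype=float)
    prev = 0.0
    for i in range(len(df)):
        if c2[i] == 1:
            out[i] = c4[i]                                 # first period: take colum4 as is
        else:
            out[i] = max(0.0, c4[i] - prev * factor[i])    # clip negatives to 0
        prev = out[i]
    return out

df['result'] = sequential_result(df)
print(df)
```

This still contains a for loop, but it touches each row once instead of recomputing the whole column on every iteration, which is likely where most of the time goes with thousands of records. If that is still too slow, compiling the same loop with a tool such as Numba may help, though that is untested here.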